3 packages returned for Tags:"segmenter"
- 9,489 total downloads
- last updated 5/1/2019
- Latest version: 3.9.2
Tokenization of raw text is a standard pre-processing step for many NLP tasks. For English, tokenization usually involves punctuation splitting and separation of some affixes like possessives. Other languages require more extensive token pre-processing, which is usually called segmentation.