Using a rule-based tokenizer for German
Tokenization is a text segmentation process that divides written text into meaningful units such as words, numbers, and punctuation marks. This post introduces two rule-based methods for tokenizing German, English, and other languages.
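To make the idea concrete before we look at the two methods, here is a minimal, illustrative sketch of what "rule-based" means in practice: a handful of regular-expression rules, applied in order of priority. This is plain Python and deliberately simplistic; it is not one of the methods discussed in the post, and real rule sets (abbreviations, dates, URLs, clitics) are far larger.

```python
import re

# A tiny rule-based tokenizer: each alternative is one "rule",
# and earlier alternatives take priority over later ones.
TOKEN_PATTERN = re.compile(
    r"\d+(?:[.,]\d+)*"   # numbers, incl. German formats like 1.000,50
    r"|\w+(?:-\w+)*"     # words, incl. hyphenated compounds like E-Mail
    r"|[^\w\s]",         # any remaining punctuation mark as its own token
    re.UNICODE,
)

def tokenize(text: str) -> list[str]:
    """Return the tokens of `text` in order of appearance."""
    return TOKEN_PATTERN.findall(text)

print(tokenize("Das kostet 1.000,50 Euro, oder?"))
# ['Das', 'kostet', '1.000,50', 'Euro', ',', 'oder', '?']
```

Note how the number rule fires before the word rule, so "1.000,50" survives as a single token instead of being split at the punctuation; ordering rules by specificity is the core design decision in any rule-based tokenizer.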