Text analytics fundamentals
Morphological analysis delivers the core linguistic building blocks that prepare your text for further analysis. These processes include lemmatization, noun phrase extraction, part-of-speech (POS) tagging, and language-specific features such as decompounding and readings for Han script words.
Noun phrase extraction
Lemmas vs. stemming
Most search engines use stemming, which chops characters off the end of a word to approximate its root form. However, stemming tends to increase recall at the cost of precision, because it associates unrelated words such as arsenic/arsenal, which share a stem (arsen).
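To make the precision problem concrete, here is a minimal sketch of suffix stripping. It is a toy illustration only: the suffix list and the `toy_stem` function are hypothetical, and this is neither a real stemming algorithm nor anything a particular search engine is known to ship.

```python
# Toy suffix-stripping stemmer (illustrative only; the suffix list is an assumption).
SUFFIXES = ("ing", "ic", "al", "ed", "s")


def toy_stem(word: str, min_stem_len: int = 3) -> str:
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    lowered = word.lower()
    for suffix in SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= min_stem_len:
            return lowered[: -len(suffix)]
    return lowered


# Two unrelated words collapse to the same stem.
stems = {word: toy_stem(word) for word in ("arsenic", "arsenal")}
print(stems)  # {'arsenic': 'arsen', 'arsenal': 'arsen'}
```

Because both words reduce to the same string, a query for one would also match documents about the other. That is the extra recall and lost precision described above.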
Rosette lemmatization associates semantically related words through the common dictionary form of the word (the lemma). Rosette uses vocabulary, context, and advanced morphological analysis to determine whether “spoke” is a noun or a verb. The result is higher recall and better precision.
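The idea that a lemma depends on part of speech can be sketched with a small lookup table. Everything here is hypothetical: the `LEMMAS` dictionary, the `toy_lemma` function, and the POS labels are assumptions made for illustration. This is not Rosette's API, and a real lemmatizer would infer the part of speech from context rather than take it as an argument.

```python
# Hypothetical lemma dictionary keyed by (surface form, part of speech).
LEMMAS = {
    ("spoke", "VERB"): "speak",   # "She spoke for an hour."
    ("spoken", "VERB"): "speak",
    ("spoke", "NOUN"): "spoke",   # "The wheel has a broken spoke."
    ("spokes", "NOUN"): "spoke",
}


def toy_lemma(word: str, pos: str) -> str:
    """Return the dictionary form for a word given its POS tag.

    Falls back to the lowercased word when the pair is not in the table.
    """
    key = (word.lower(), pos.upper())
    return LEMMAS.get(key, word.lower())


print(toy_lemma("spoke", "VERB"))  # speak
print(toy_lemma("spoke", "NOUN"))  # spoke
```

Unlike suffix stripping, the lookup groups “spoke” with “speak” only when it is tagged as a verb, so the noun sense stays separate.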