World-Leading Text Analytics

Capabilities

Base Linguistics


Text analytics fundamentals to prepare your data for analysis. Language-specific tools for tokenization, part-of-speech tagging, lemmatization, decompounding, and Chinese and Japanese readings for your input.

Categorization


Categorization is the process of arranging, or classifying, content sources such as documents and web pages under a list of topics, or taxonomy. Rosette classification automates this process for your content, allowing you to find the documents most relevant to your needs.

Chat Translation


Convert Arabic text written with the Roman alphabet to Arabic script.

Entity Extraction


Entities are the key actors in your content: the people, organizations, locations, email addresses, products, dates, times, and more that are hidden in your text. Rosette uncovers these entities to help you understand what your content is telling you.

Entity Linking


Rosette uncovers entities such as people, organizations, and locations, and links them back to a knowledge base.

Language Identification


Instantly identify and triage many languages within large volumes of text to prepare for further analysis.

Morphological Analysis


Morphological analysis delivers the core linguistic building blocks that prepare your text for further analysis, including lemmatization, part-of-speech (POS) tagging, and features specific to particular languages like decompounding.

Name Matching


Names are challenging to match when there are misspellings, aliases or nicknames, initials, and titles. Rosette provides the industry’s leading multilingual identity resolution for government intelligence, identity verification, financial compliance, and many other uses.
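
The sketch below illustrates how a few of these difficulties can be handled in plain Python: titles are dropped, nicknames are expanded through a lookup table, an initial matches any token that starts with the same letter, and remaining misspellings are scored with Levenshtein edit distance. The title list, nickname table, and scoring scheme are hypothetical assumptions for illustration; this is not Rosette's matching algorithm.

```python
# Hypothetical normalization tables (assumptions for illustration only).
TITLES = {"mr", "mrs", "ms", "dr", "prof"}
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize(name: str) -> list[str]:
    """Lowercase, strip punctuation, drop titles, expand known nicknames."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    return [NICKNAMES.get(t, t) for t in tokens if t and t not in TITLES]


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def token_similarity(a: str, b: str) -> float:
    """1.0 for identical tokens, lower as the edit distance grows."""
    if len(a) == 1 or len(b) == 1:
        # Treat a single letter as an initial.
        return 1.0 if a[0] == b[0] else 0.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


def name_similarity(name1: str, name2: str) -> float:
    """Average best-match score of each token in the shorter name."""
    t1, t2 = normalize(name1), normalize(name2)
    if not t1 or not t2:
        return 0.0
    shorter, longer = (t1, t2) if len(t1) <= len(t2) else (t2, t1)
    best = [max(token_similarity(s, l) for l in longer) for s in shorter]
    return sum(best) / len(best)


print(name_similarity("J. Smith", "John Smith"))  # → 1.0
print(round(name_similarity("Bob Smith", "Dr. Robert Smyth"), 2))  # → 0.9
```

For simplicity, each token takes its best match independently, so one token could be matched twice; a stricter matcher would enforce a one-to-one alignment.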

Name Translation


Rosette uses its knowledge of language-specific naming conventions to recognize when to transliterate a name based on its spelling and when to translate its meaning, as with a title.

Relationship Extraction


Relationships are the grammatical and semantic connections between two entities in a piece of text. Rosette uses a combination of machine learning and semantic rules to recognize and extract the action that connects entities: their relationship.

Sentence Tagging


Rosette uses machine learning and statistical analysis to discern the context of each punctuation mark and accurately extract only the marks that represent sentence boundaries.
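
Rosette relies on machine learning for this step. The rule-based sketch below only illustrates why the problem is hard: it treats ".", "!" and "?" as candidate boundaries and rejects a candidate when it follows a known abbreviation, is not followed by whitespace (as in "3.50"), or is followed by a lowercase word. The abbreviation list is a hypothetical assumption, and these hand-written rules stand in for the learned context model.

```python
import re

# Hypothetical abbreviation list (lowercase, without the final period).
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "inc", "e.g", "i.e", "etc"}


def split_sentences(text: str) -> list[str]:
    """Decide, for each '.', '!' or '?', whether it ends a sentence."""
    sentences: list[str] = []
    start = 0
    for mark in re.finditer(r"[.!?]", text):
        end = mark.end()
        preceding = text[start:mark.start()].split()
        prev_word = preceding[-1].lower() if preceding else ""
        rest = text[end:]
        following = re.match(r"\s+(\S)", rest)

        # A period after a known abbreviation ("Dr.") is not a boundary.
        if mark.group() == "." and prev_word in ABBREVIATIONS:
            continue
        # No whitespace right after the mark ("3.50") means we are mid-token.
        if rest and following is None:
            continue
        # A lowercase next word suggests the sentence continues.
        if following and following.group(1).islower():
            continue
        sentences.append(text[start:end].strip())
        start = end

    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith paid $3.50 for coffee. He left early! Did he return?"))
# → ['Dr. Smith paid $3.50 for coffee.', 'He left early!', 'Did he return?']
```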

Sentiment Analysis


Sentiment refers to the attitudes, opinions, and emotions a person expresses toward a person, place, or thing, or across an entire body of text in a document. Rosette determines where this subjective sentiment falls on a scale from positive to negative.
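
As a rough illustration of placing text on a positive-to-negative scale, the sketch below averages word polarities from a tiny hypothetical lexicon, flipping a word's polarity when it directly follows a negator such as "not". The score lies between -1 (negative) and 1 (positive), with 0 meaning mixed or no sentiment words found. This lexicon approach is a simple baseline chosen for illustration, not Rosette's method.

```python
import re

# Hypothetical polarity lexicon and negator list (assumptions for illustration).
LEXICON = {
    "good": 1, "great": 1, "love": 1, "excellent": 1,
    "bad": -1, "terrible": -1, "hate": -1, "slow": -1,
}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text: str) -> float:
    """Average polarity of sentiment-bearing words, in the range [-1, 1]."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total, hits = 0, 0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token, 0)
        if polarity == 0:
            continue
        # Simple negation handling: "not good" counts as negative.
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
        hits += 1
    return total / hits if hits else 0.0


print(sentiment_score("I love this phone!"))                               # → 1.0
print(sentiment_score("This is not good."))                                # → -1.0
print(sentiment_score("The food was great but the service was slow."))    # → 0.0
```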

Text Embedding


Compare semantic similarity between words and documents across nine languages.
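
Embeddings represent words and documents as numeric vectors, and a common way to compare two of them is cosine similarity. The sketch below assumes hypothetical 3-dimensional word vectors and builds a document vector by averaging its word vectors, one simple baseline among several. It does not use Rosette's models or API.

```python
import math

# Hypothetical toy vectors; real embeddings come from a trained model
# and have many more dimensions.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.1, 0.2, 0.95],
}


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v: 1.0 means same direction."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def document_vector(words: list[str]) -> list[float]:
    """Mean of the vectors of known words (simple mean pooling)."""
    vectors = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vectors:
        raise ValueError("no known words in document")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["kitten"]))  # close to 1
print(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"]))     # much lower
```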

Tokenization


Tokenization separates text into its most fundamental elements: words. While tokenization is necessary as the basis for text analytics in any language, it is especially important for languages written without spaces between words, such as Chinese and Japanese.
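
A minimal sketch of the contrast: a regular-expression tokenizer works for space-delimited English but returns an unspaced Chinese sentence as a single token, whereas greedy forward maximum matching against a dictionary recovers the words. The toy dictionary is a hypothetical assumption, and maximum matching is a classic baseline shown for illustration, not Rosette's tokenizer.

```python
import re


def whitespace_tokenize(text: str) -> list[str]:
    """Split on runs of word characters; punctuation is discarded."""
    return re.findall(r"\w+", text)


def max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy forward maximum matching for text written without spaces.

    At each position, take the longest dictionary word that starts there;
    if none matches, emit a single character and move on.
    """
    longest = max(1, max((len(word) for word in dictionary), default=1))
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for length in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical toy dictionary; real segmenters use large lexicons and models.
TOY_DICTIONARY = {"我", "喜欢", "自然", "语言", "自然语言", "处理"}

print(whitespace_tokenize("I like natural language processing."))
# → ['I', 'like', 'natural', 'language', 'processing']
print(whitespace_tokenize("我喜欢自然语言处理"))
# → ['我喜欢自然语言处理']  (no spaces, so one "word")
print(max_match("我喜欢自然语言处理", TOY_DICTIONARY))
# → ['我', '喜欢', '自然语言', '处理']
```

Preferring the longest match keeps "自然语言" (natural language) together rather than splitting it into "自然" and "语言"; a greedy strategy like this can still go wrong on ambiguous input, which is why statistical segmenters are used in practice.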

Topic Extraction


Identify keywords and significant phrases in your text data, even when they are not explicitly mentioned.