Text Analytics for Data Unification


Enrich, catalog, connect, and resolve text across languages and alphabets.

Quickly understand multilingual, multicultural unstructured data.

  • Clean, reliable, unified data, regardless of language
  • High-accuracy de-duplication across languages and alphabets
  • Translation of names between foreign languages and English
  • Entity and fact extraction and resolution from unstructured text fields

Enhancing Elasticsearch™ with Text Analytics


Better search, recall, and precision in 40 languages

Whether you’re tackling log analysis, e-commerce, watch list screening, or other applications, names are often the key. Can you find “Abdul Jabbar, Karim” if you search for “Kareem AbdalJabar” or “كريم عبد الجبار”?


Higher recall and precision

Rosette provides advanced tokenization for Chinese, Japanese, and Korean, plus decompounding for languages such as Korean, German, and Dutch. Rosette also performs lemmatization rather than stemming, normalizing each word to its dictionary form (its lemma) based on the word’s root meaning instead of simply chopping off suffixes.
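To make the stemming versus lemmatization distinction concrete, here is a toy sketch in Python. It is not how Rosette works internally: the suffix list and the five-word lexicon are hypothetical stand-ins for a real morphological analyzer. The point it shows is that a dictionary lookup maps “ran” and “running” to the same form, “run”, while suffix stripping leaves them apart and can produce non-words such as “studi”.

```python
# Toy contrast between suffix-stripping stemming and dictionary-based lemmatization.
# SUFFIXES and LEMMAS are hypothetical and tiny; real systems use full
# morphological analyzers and lexicons.

SUFFIXES = ("ing", "es", "ed", "s")  # hypothetical, checked in this order

LEMMAS = {  # hypothetical lexicon: surface form -> dictionary lemma
    "mice": "mouse",
    "better": "good",
    "ran": "run",
    "running": "run",
    "studies": "study",
}


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, with no knowledge of meaning."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are not gutted.
        if w.endswith(suffix) and len(w) > len(suffix) + 2:
            return w[: -len(suffix)]
    return w


def lemmatize(word: str) -> str:
    """Look the word up in the lexicon; fall back to the lowercased word."""
    w = word.lower()
    return LEMMAS.get(w, w)


for word in ["mice", "better", "ran", "running", "studies"]:
    print(f"{word:8} stem={naive_stem(word):8} lemma={lemmatize(word)}")
```

Irregular forms such as “mice” and “better” pass through the stemmer unchanged, while the lexicon maps them to “mouse” and “good”.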

Facet on real-world entities

Rosette entity extraction and linking enable high-quality faceted search by extracting people, locations, organizations, products, and 13 other entity types, and linking them to real-world entities.

Rosette comes pre-trained in 18 languages and can be adapted in the field to domain-specific content for improved accuracy.

Find names, no matter how they’re written

Names are frequently the most significant term in a query. Rosette increases search recall by finding more occurrences of a name, overcoming misspellings, nicknames, missing spaces, the same name written in different languages, and other variations.
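As a rough illustration of why normalization helps names match, the sketch below reorders “Last, First” names, expands nicknames, removes spaces, and then scores similarity with Python’s standard `difflib.SequenceMatcher`. This is a generic fuzzy-matching example, not Rosette’s name-matching algorithm. The `NICKNAMES` table, the `normalize` and `name_similarity` helpers, and the ASCII-only letter filter are hypothetical simplifications; in particular, it does not handle names written in other scripts, such as the Arabic form above, or accented letters.

```python
import difflib
import re

# Hypothetical nickname table; a real system would use far larger resources.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}


def normalize(name: str) -> str:
    """Lowercase, reorder 'Last, First', expand nicknames, drop spaces and punctuation.

    Only ASCII letters a-z are kept, so other scripts and accented letters are
    discarded; this is a deliberate simplification for the sketch.
    """
    name = name.lower().strip()
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    tokens = re.findall(r"[a-z]+", name)
    tokens = [NICKNAMES.get(t, t) for t in tokens]
    return "".join(tokens)  # joining tokens erases missing-space differences


def name_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1] using difflib's ratio on the normalized forms."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


print(name_similarity("Abdul Jabbar, Karim", "Kareem AbdalJabar"))
print(name_similarity("Abdul Jabbar, Karim", "John Smith"))
```

The variant spellings score far higher than an unrelated name, but a character-level ratio like this cannot relate names across alphabets, which is where transliteration-aware matching is needed.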

Open source intelligence (OSINT)

Rosette adds structure to vast quantities of text coming from social media, news, or blog feeds by extracting new or known entities, including people, places, and organizations.

Rosette can also standardize, translate, and link these names to an authority, mitigating the problem of inconsistent and “messy” real-world data.
