Enhancing Elasticsearch™ with Text Analytics

Better search, recall and precision in 40 languages

Whether you’re tackling log analysis, e-commerce, watch list screening or other applications, names are often the key. Can you find “Abdul Jabbar, Karim” if you search for “Kareem AbdalJabar” or “كريم عبد الجبار”?



Higher recall and precision

Rosette provides advanced tokenization for Chinese, Japanese, and Korean, plus decompounding for languages such as Korean, German, and Dutch. Rosette also delivers lemmatization, which normalizes each word to its dictionary form based on meaning, where stemming only trims suffixes.

Facet on real-world entities

Rosette entity extraction and linking enable high-quality faceted search by extracting people, locations, organizations, products, and 13 other entity types, and linking them to real-world entities.

Rosette comes pre-trained in 18 languages and can be adapted in the field to domain-specific content for improved accuracy.

Find names, no matter how they’re written

Names are frequently the most significant terms in a query. Rosette increases search recall by finding more occurrences of a name despite misspellings, nicknames, missing spaces, the same name written in different languages, and other variations.

Open source intelligence (OSINT)

Rosette adds structure to vast quantities of text coming from social media, news, or blog feeds by extracting new or known entities, including people, places, and organizations.

Rosette can also standardize, translate, and link these names to an authority, mitigating the problem of inconsistent and “messy” real-world data.

Elasticsearch Enhanced With a Simple Plugin

100M names

50 matches/second

The Rosette plugin contains a custom mapper which does all the work behind the scenes:

  • The Rosette plugin’s name data type indexes keys for different phenomena (types of name variations) in separate subfields.
  • The plugin generates analogous keys for a custom Lucene query that finds good candidates for re-ranking.
  • Rosette uses a rescore query to score names in the best candidate documents and reorder accordingly.
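The three steps above follow a general two-pass pattern: index cheap keys per variation type, use them to gather candidates, then rescore only those candidates with a more expensive comparison. The sketch below is a toy illustration of that pattern in plain Python, not Rosette's implementation. The key functions, the `NameIndex` class, and the `difflib`-based similarity score are all assumptions chosen for brevity.

```python
from collections import defaultdict
from difflib import SequenceMatcher
import unicodedata


def normalize(name):
    """Lowercase, strip accents, and turn punctuation into spaces (toy normalization)."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c if c.isalnum() or c.isspace() else " " for c in stripped.lower())


def consonant_key(token):
    """Crude phonetic-style key (an assumption): first letter plus later consonants, no repeats."""
    key = token[:1]
    for c in token[1:]:
        if c not in "aeiouyhw" and c != key[-1]:
            key += c
    return key


def keys_for(name):
    """One set of keys per 'subfield', each covering a different kind of variation."""
    tokens = normalize(name).split()
    return {
        "token": set(tokens),                              # exact token matches
        "phonetic": {consonant_key(t) for t in tokens},    # spelling variants
    }


def similarity(a, b):
    """Second-pass score: order-insensitive and space-insensitive string similarity."""
    def flat(s):
        return "".join(sorted(normalize(s).split()))
    return SequenceMatcher(None, flat(a), flat(b)).ratio()


class NameIndex:
    def __init__(self):
        self.names = []
        # subfield name -> key -> set of document ids
        self.subfields = defaultdict(lambda: defaultdict(set))

    def add(self, name):
        doc_id = len(self.names)
        self.names.append(name)
        for field, keys in keys_for(name).items():
            for key in keys:
                self.subfields[field][key].add(doc_id)

    def candidates(self, query):
        """First pass: any document sharing at least one key with the query."""
        ids = set()
        for field, keys in keys_for(query).items():
            for key in keys:
                ids |= self.subfields[field].get(key, set())
        return ids

    def search(self, query, top_n=5):
        """Second pass: rescore the candidates and return (score, name) pairs, best first."""
        scored = [(similarity(query, self.names[i]), self.names[i])
                  for i in self.candidates(query)]
        return sorted(scored, reverse=True)[:top_n]


index = NameIndex()
for n in ["Abdul Jabbar, Karim", "Karim Benzema", "Karen Abbott", "John Smith"]:
    index.add(n)

results = index.search("Kareem AbdalJabar")
```

In this toy data, both "Abdul Jabbar, Karim" and "Karim Benzema" share a phonetic key with the query and become candidates, and the rescoring pass ranks "Abdul Jabbar, Karim" first. The value of the split is that the expensive comparison runs only on the small candidate set, not on every indexed name.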

Fuzzy Name Matching For Elasticsearch

Fuzzy name matching is a common problem for both government and commercial users, and there are few reliable solutions.

Names connect data points in financial compliance, anti-fraud, government intelligence, law enforcement, and identity verification applications. The challenge is connecting the dots despite wide variation in how a name is written: misspellings, nicknames, initials, and titles, to list but a few.

International databases add further complexity, because a single name may appear in many languages.

Currently, performing name matching in Elasticsearch requires a complex query against multiple fields, and can produce a large number of unwanted results.
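To give a sense of what such a hand-built query can look like, the sketch below assembles a `bool` query in the standard Elasticsearch Query DSL as a Python dictionary. The field names `name`, `name.phonetic`, and `name.ngram` are hypothetical: they assume an index where the name field has been mapped with extra subfields and analyzers set up by hand. The boost values are arbitrary placeholders.

```python
import json


def build_name_query(name):
    """Build a multi-field fuzzy name query (field names and boosts are assumptions)."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Edit-distance tolerance on the plain text field
                    {"match": {"name": {"query": name, "fuzziness": "AUTO", "boost": 2.0}}},
                    # Hypothetical subfield analyzed with a phonetic filter
                    {"match": {"name.phonetic": {"query": name, "boost": 1.5}}},
                    # Hypothetical subfield analyzed into character n-grams
                    {"match": {"name.ngram": {"query": name}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


body = build_name_query("Kareem AbdalJabar")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Each `should` clause widens recall in a different way, and because any single clause is enough to match, loosely related documents can also be returned. That is the source of the unwanted results described above.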

The Rosette plugin for fuzzy name matching can identify name variations, nicknames, phonetic spellings, cross-script variations, and more.