Rosette for Elasticsearch
with Text Analytics
Better search, recall and precision in 40 languages
Whether you’re tackling log analysis, e-commerce, watch list screening or other applications, names are often the key. Can you find “Abdul Jabbar, Karim” if you search for “Kareem AbdalJabar” or “كريم عبد الجبار”?
Higher recall and precision
Rosette provides advanced tokenization for Chinese, Japanese, and Korean, plus decompounding for languages such as Korean, German, and Dutch. Rosette also delivers lemmatization (as opposed to stemming), which normalizes each word to its dictionary form rather than simply trimming suffixes.
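To make the difference between stemming and lemmatization concrete, here is a toy sketch. It is not how Rosette works internally: the `LEMMAS` dictionary and both functions are hypothetical, hand-written for illustration only.

```python
# Toy illustration only: neither function reflects Rosette's implementation.

# Hypothetical, hand-written lemma dictionary for a few English word forms.
LEMMAS = {
    "mice": "mouse",
    "ran": "run",
    "running": "run",
    "studies": "study",
}


def naive_stem(word: str) -> str:
    """Strip a few common suffixes, the way a crude stemmer might."""
    for suffix in ("ing", "ies", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)


for w in ("running", "studies", "mice", "ran"):
    print(f"{w}: stem={naive_stem(w)!r}, lemma={lemmatize(w)!r}")
```

The stemmer turns "running" into "runn" and leaves "mice" and "ran" untouched, while the dictionary lookup maps them to "run", "mouse", and "run". Irregular forms like these are where lemmatization tends to help recall.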
Facet on real-world entities
Rosette entity extraction and linking enable high-quality faceted search by extracting people, locations, organizations, products, and 13 other entity types, and linking them to real-world entities.
Rosette comes pre-trained in 18 languages and can be adapted in the field to domain-specific content for improved accuracy.
Find names, no matter how they’re written
Names are frequently the most significant terms in a query. Rosette increases search recall by finding more occurrences of a name, overcoming misspellings, nicknames, missing spaces, the same name written in different languages, and other variations.
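The short sketch below shows why these variations defeat generic string comparison. It uses only Python's standard library. The `normalize` and `similarity` helpers are hypothetical and unrelated to Rosette's matching; they simply lowercase the name, keep Latin letters, sort the tokens, and compare the results.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, keep only runs of a-z, and sort tokens to ignore word order."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return " ".join(sorted(tokens))


def similarity(a: str, b: str) -> float:
    """Generic character-level similarity between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


indexed = "Abdul Jabbar, Karim"
latin_query = "Kareem AbdalJabar"
arabic_query = "كريم عبد الجبار"

print(indexed == latin_query)             # exact comparison fails
print(similarity(indexed, latin_query))   # partial overlap, but no knowledge of names
print(similarity(indexed, arabic_query))  # no shared characters across scripts
```

Under this approach, the missing space in "AbdalJabar" lowers the score, and the Arabic spelling scores zero because the two strings share no characters. Handling such cases requires knowledge of how names vary, which plain string similarity lacks.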
Open source intelligence (OSINT)
Rosette adds structure to vast quantities of text coming from social media, news, or blog feeds by extracting new or known entities, including people, places, and organizations.
Rosette can also standardize, translate, and link these names to an authority, mitigating the problem of inconsistent and “messy” real-world data.
Fuzzy name matching for Elasticsearch
Fuzzy name matching is a common problem for both government and commercial users, and there are few reliable solutions.
Names connect data points in financial compliance, anti-fraud, government intelligence, law enforcement, and identity verification applications. The challenge is connecting the dots despite the many ways a name can vary: misspellings, nicknames, initials, and titles, to name but a few.
International databases add further complexity, because a single name may appear in many languages and scripts.
Currently, performing name matching in Elasticsearch requires a complex query against multiple fields, and can produce a large number of unwanted results.
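As a rough sketch of what such a multi-field query can look like, the Python snippet below builds a request body in the Elasticsearch Query DSL. The field names `name`, `name.phonetic`, and `name.ngram` are hypothetical; they assume the index mapping defines sub-fields with a phonetic analyzer and an n-gram analyzer, which is not described in this document.

```python
import json


def build_name_query(name: str) -> dict:
    """Build a bool query that tries several matching strategies at once.

    Field names are hypothetical and assume suitably analyzed sub-fields.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # Match on the analyzed name, weighted highest.
                    {"match": {"name": {"query": name, "boost": 3.0}}},
                    # Edit-distance tolerance for misspellings.
                    {"match": {"name": {"query": name, "fuzziness": "AUTO"}}},
                    # Sound-alike match (assumes a phonetic analyzer).
                    {"match": {"name.phonetic": {"query": name}}},
                    # Partial match for missing spaces (assumes an n-gram analyzer).
                    {"match": {"name.ngram": {"query": name}}},
                ],
                # A document qualifies if any one strategy matches.
                "minimum_should_match": 1,
            }
        }
    }


print(json.dumps(build_name_query("Kareem AbdalJabar"), indent=2))
```

Each clause widens the net in a different way, so any one of them can admit unrelated documents, and none addresses names written in another script.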
The Rosette plug-in for fuzzy name matching can identify name variations, nicknames, phonetic spellings, cross-script variations, and more.