Explainable AI to accurately verify names of people and organizations against vast databases
Overview
Industry-leading accuracy, explainable matches
AML/KYC systems for financial compliance, government intelligence, and law enforcement agencies worldwide adopt Rosette® because it misses fewer matches and reduces the number of false positives. Matching names of people, locations, and organizations is complicated by misspellings, aliases, nicknames, initials, and names in different languages.
Rosette recognizes 15+ name phenomena (see all of them in the Tech Specs section). Here are two examples:
Same name in multiple languages | Mao Zedong ↔ Мао Цзэдун ↔ 毛泽东 |
Semantically similar names | PennyLuck Pharmaceuticals, Inc. ↔ PennyLuck Drugs |
Our name indexer solves these challenges by blending machine learning with traditional name matching techniques, such as name lists, common key, and rules, to determine a match score. This score can also consider fuzzy matches in other fields (including address and date of birth). At the same time, Rosette explains the reasons for the match score, such as gender mismatch, missing or out-of-order name component, and missing or added space.
Eminently tunable for your data and needs
Multiple configuration knobs let you tune Rosette name matching and optimize the results using Rosette Match Studio, a GUI that shows you how configuration changes affect the matches. For example, if the “date of birth” field in your data is more reliable than the other fields, you can give it greater weight in the final match score, and Match Studio will show how that changes the scores.
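To make the idea of field weighting concrete, here is a minimal sketch of a weighted average over per-field similarity scores. It is an illustration only, not Rosette's actual scoring formula: the function name `combine_scores`, the field names, the scores, and the weights are all hypothetical.

```python
def combine_scores(field_scores, weights):
    """Weighted average of per-field similarity scores, each in [0, 1]."""
    # Only fields with a positive weight contribute to the final score.
    used = [f for f in field_scores if weights.get(f, 0) > 0]
    total_weight = sum(weights[f] for f in used)
    if total_weight == 0:
        raise ValueError("no weighted fields to combine")
    return sum(field_scores[f] * weights[f] for f in used) / total_weight


# Hypothetical per-field scores for one candidate record.
scores = {"name": 0.80, "date_of_birth": 1.00, "address": 0.60}

# Equal weights versus trusting the date of birth three times as much.
equal = combine_scores(scores, {"name": 1, "date_of_birth": 1, "address": 1})
dob_heavy = combine_scores(scores, {"name": 1, "date_of_birth": 3, "address": 1})

print(round(equal, 3), round(dob_heavy, 3))
```

With these made-up numbers, raising the weight of the exact date-of-birth match pulls the combined score up, which is the kind of effect a tuning tool would let you inspect before committing to a configuration.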
Product highlights
- Supports multiple languages and 12+ variation types, including cross-lingual
- Matches names of people, locations, and organizations
- Reduces false positives and false negatives
- Ranks results by match score
- Easily integrates with existing systems, Elasticsearch, and Solr
- Deploys in the cloud or on-premise
- Is fast and scalable against massive lists
- Is actively developed, with at least six releases per year
- Offers industrial-grade support
How It Works
The industry leader in names
Our technology uses machine learning, rather than generated lists of name variations, to perform fuzzy name matching. Our approach matches never-before-seen names. It also avoids the problem of an exponentially growing list. Even a three-element name (first, middle, last) with 12 variations for each element would add 1,728 variations to a list.
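The arithmetic behind that figure can be checked with a few lines of Python. The variant labels below are placeholders, not real name data.

```python
from itertools import product

variants_per_element = 12
elements = 3  # first, middle, last

# Every combination of one variant per element is a separate list entry.
count = variants_per_element ** elements
print(count)  # 1728

# The same count by explicit enumeration, using placeholder variant labels.
variants = [
    [f"{part}{i}" for i in range(variants_per_element)]
    for part in ("first", "middle", "last")
]
enumerated = sum(1 for _ in product(*variants))
assert enumerated == count
```

Adding a fourth name element with the same number of variants would multiply the list by 12 again, which is why list-based approaches grow so quickly.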
Unlike expensive and less accurate legacy solutions driven by thousands of spelling variants, our software has a smaller footprint and analyzes the intrinsic structure of each name component to perform an intelligent comparison using advanced linguistic algorithms. Under the hood, the name indexer uses cutting-edge NLP techniques, including neural networks, hidden Markov models, transliteration rules, and word embedding vectors.
Special algorithms for organization names
For matching organizational names in English, Chinese, and Japanese, Rosette also compares the semantic similarity of words in the name using text embeddings — one of the most powerful results of current deep learning research. Organization names frequently contain common nouns that may be swapped with words with similar meanings. Rosette matches these organizations based on word meaning, not just phonetics. For example, a search for “Eagle Drugs, Inc.” will fuzzy match “Eagle Pharmaceuticals, Inc.” because “drugs” and “pharmaceuticals” are close in meaning.
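As a rough illustration of how embeddings capture closeness in meaning, the sketch below compares toy word vectors with cosine similarity. The three-dimensional vectors are invented for the example; real embeddings have hundreds of dimensions and are learned from text, and this is not a description of Rosette's internal model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hand-made toy vectors (assumed values, for illustration only).
toy_embeddings = {
    "drugs": [0.90, 0.80, 0.10],
    "pharmaceuticals": [0.85, 0.90, 0.05],
    "airlines": [0.10, 0.05, 0.95],
}

sim_close = cosine(toy_embeddings["drugs"], toy_embeddings["pharmaceuticals"])
sim_far = cosine(toy_embeddings["drugs"], toy_embeddings["airlines"])

print(f"drugs vs pharmaceuticals: {sim_close:.2f}")
print(f"drugs vs airlines:        {sim_far:.2f}")
```

Words with similar meanings point in similar directions, so their cosine similarity is close to 1, while unrelated words score much lower.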
Transparent tuning and customization
Our text analytics tools are unique in their adaptability. With the user-friendly GUI of Rosette Match Studio, you can see how changing configurations affects the matching behavior of Rosette. The on-premise options, Rosette Java (an SDK) and Rosette Server, let you:
- Set the minimum threshold of the similarity score to manage the precision and recall of search results
- Create a list of “stopwords” to ignore when calculating matching scores (e.g., titles, honorifics)
- Preset two names to always match with a given score (e.g., “Elizabeth” and “Lisbeth” always match at 90%)
- Consider any number of other identity attributes (including address and date of birth) in calculating the match score
- Fuzzy match address components that typically contain names (such as street and city)
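The first three options in the list above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Rosette configuration format or API: `STOPWORDS`, `OVERRIDES`, `THRESHOLD`, and the helper functions are hypothetical, and the standard library's `difflib.SequenceMatcher` stands in for a real name-matching score.

```python
from difflib import SequenceMatcher

STOPWORDS = {"dr", "mr", "mrs", "ms", "phd"}             # hypothetical stopword list
OVERRIDES = {frozenset({"elizabeth", "lisbeth"}): 0.90}  # preset pair -> fixed score
THRESHOLD = 0.80                                         # minimum score to report


def normalize(name):
    """Lowercase, drop punctuation, and remove stopword tokens."""
    tokens = [t.strip(".,").lower().replace(".", "") for t in name.split()]
    return " ".join(t for t in tokens if t and t not in STOPWORDS)


def score(a, b):
    """Return the preset score for an override pair, else a fuzzy ratio."""
    a, b = normalize(a), normalize(b)
    # For simplicity, overrides apply to whole normalized names only.
    override = OVERRIDES.get(frozenset({a, b}))
    if override is not None:
        return override
    return SequenceMatcher(None, a, b).ratio()


def search(query, candidates, threshold=THRESHOLD):
    """Keep candidates at or above the threshold, best score first."""
    scored = [(c, score(query, c)) for c in candidates]
    hits = [(c, s) for c, s in scored if s >= threshold]
    return sorted(hits, key=lambda cs: cs[1], reverse=True)


print(score("Dr. Elizabeth", "Lisbeth"))
print(search("John Smith", ["Jon Smith", "Jane Doe"]))
```

Raising the threshold trades recall for precision: fewer candidates are returned, but those that remain are closer matches.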
Unlike other solutions that have been retrofitted to become scalable, our name indexer was designed for customers with tens of millions of data entries in large, complex databases, and for use cases that cannot afford lags in performance and accuracy.
Tech Specs
Supported languages
Supported writing systems and transliteration standards are listed further below.
Arabic | French | Italian | Persian | Turkish |
Burmese | German | Japanese | Portuguese | Urdu |
Chinese, Simplified | Greek | Khmer | Russian | Vietnamese |
Chinese, Traditional | Hebrew | Korean | Spanish | |
English | Hungarian | Pashto | Thai | |
The many ways Rosette matches names
Phonetic similarity | Jesus ↔ Heyzeus ↔ Haezoos |
Transliteration spelling differences | Abdul Rasheed ↔ Abd al-Rashid |
Nicknames | William ↔ Will ↔ Bill ↔ Billy |
Missing spaces or hyphens | MaryEllen ↔ Mary Ellen ↔ Mary-Ellen |
Titles and honorifics | Dr. ↔ Mr. ↔ Ph.D. |
Gender | Jon Smith ↔ John Smith (but not Joan Smith) |
Truncated name components | Blankenship ↔ Blankensh |
Missing name components | Phillip Charles Carr ↔ Phillip Carr |
Out-of-order name components | Diaz, Carlos Alfonzo ↔ Carlos Alfonzo Diaz |
Initials | J. E. Smith ↔ James Earl Smith |
Names split inconsistently across database fields | [Rip] [Van Winkle] ↔ [Rip Van] [Winkle] |
Same name in multiple languages | Mao Zedong ↔ Мао Цзэдун ↔ 毛泽东 ↔ 毛澤東 |
Semantically similar names | PennyLuck Pharmaceuticals, Inc. ↔ PennyLuck Drugs, Co. |
Semantically similar names across languages | Nippon Telegraph and Telephone Corporation ↔ 日本電信電話株式会社 |
Organizational Aliases | IBM ↔ Big Blue |
Supported writing systems and transliteration standards
Language | Script | Sample | Supported transliteration standards |
---|---|---|---|
Arabic | Arabic | محمد أنور السادات | IC, SATTS, BGN, Basis, Buckwalter, and others |
Persian (Dari/Farsi) | Arabic | عذرا جعفری (Dari), شيرين عبادى (Farsi) | BGN, IC, MELTS |
Pashto | Arabic | حامد کرزی | BGN, JDEC-Afghanistan |
Urdu | Arabic | عبد السلام | BGN, IC |
Burmese | Burmese | မြန်မာ | Folk (Basis), MLCTS |
Chinese | Hanzi | 刘晓波 | Hanyu Pinyin, Wade-Giles |
Korean | Hanja, Hangul | 金大中, 김대중 | BGN, Korda, McCune-Reischauer, Revised Romanization of Korean |
Hebrew | Hebrew | עִברִית | ISO 259-2:1994, Folk (Basis), ICU |
Japanese | Kanji, Hiragana, Katakana | 鈴木章, かづさ, スズキ | Hepburn, Kunrei |
Russian | Cyrillic | Михаи́л Серге́евич Горбачёв | BGN, IC |
Greek | Greek | Ἀριστοτέλης | ISO 843:1997, ICU |
Thai | Thai | พระพุทธยอดฟ้าจุฬาโลก | ICU, ISO :11940-2, ISO 11940-2:2007 |
Sample input and output:
{
  "name1": {
    "text": "Влади́мир Влади́мирович Пу́тин",
    "language": "rus",
    "entityType": "PERSON"
  },
  "name2": {
    "text": "Vladimir Putin",
    "language": "eng",
    "entityType": "PERSON"
  }
}
{
  "result": {
    "score": 0.9486632809417912
  }
}
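A client would typically parse the returned JSON and compare the score with its own cutoff. The sketch below does this with Python's standard `json` module, using the result object from the sample above as a literal string. The `MIN_SCORE` value is a hypothetical cutoff chosen for illustration, and no network call is made.

```python
import json

# The result object from the sample above, as it would arrive over the wire.
response_text = '{ "result": { "score": 0.9486632809417912 } }'

match_score = json.loads(response_text)["result"]["score"]

MIN_SCORE = 0.80  # hypothetical cutoff; tune it for your own data
is_match = match_score >= MIN_SCORE

print(f"{match_score:.3f}", is_match)  # 0.949 True
```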
Deployment
Rosette Cloud
Sign up today for a free 30-day trial
The SaaS version of Rosette is quick to implement, low-maintenance, and ideal for users who wish to pay based on monthly call volume. Numerous language bindings are supported through a RESTful API.
Rosette Server Edition
This on-premise private cloud deployment puts all the functionality of Rosette Cloud behind your secure firewall, and enables advanced user settings, access to custom profiles (user-specific configuration setups), and deployment of custom models.
Rosette Java Edition
For on-premise systems that need the low-latency, high-speed integration of an SDK, Rosette Java is the way to go. It has been deployed in the most demanding, high-transaction environments, including web search engines, financial compliance, and border security.
Rosette Plugins
Just plug in Rosette for instant high-accuracy multilingual search and fuzzy name search for Elasticsearch or Apache Solr.
Quality documentation and support
Our support team responds to customers in less than a business day, and is committed to a satisfactory resolution. Users have access to in-depth documentation describing all the features, with code examples and a searchable knowledge base.
Visit our GitHub for bindings and documentation.
