Name Indexer

Explainable AI to accurately verify names of people and organizations against vast databases

Overview

Industry-leading accuracy, explainable matches

AML/KYC systems for financial compliance, government intelligence, and law enforcement agencies worldwide adopt Rosette® because it misses fewer matches and reduces the number of false positives. Matching names of people, locations, and organizations is complicated by misspellings, aliases, nicknames, initials, and names in different languages.

Rosette recognizes 15+ name phenomena (see all of them in the Tech Specs section). Here are two examples:

Same name in multiple languages: Mao Zedong ↔ Мао Цзэдун ↔ 泽东
Semantically similar names: PennyLuck Pharmaceuticals, Inc. ↔ PennyLuck Drugs

Our name indexer solves these challenges by blending machine learning with traditional name matching techniques, such as name lists, common key, and rules, to determine a match score. This score can also consider fuzzy matches in other fields (including address and date of birth). At the same time, Rosette explains the reasons for the match score, such as gender mismatch, missing or out-of-order name component, and missing or added space.

Eminently tunable for your data and needs

Multiple configuration knobs let you tune Rosette name matching and optimize the results using the graphical interface of Rosette Match Studio, which shows you how configuration changes affect the matches. For example, if the “date of birth” field is more reliable in your data, you can give it greater weight in the final match score, and Match Studio will show how that changes the scores.
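
To make the weighting idea concrete, here is a minimal sketch of how giving one field more weight shifts a combined score. It is not Rosette's scoring formula: the weighted-average rule, the function name `weighted_score`, and the per-field scores are all assumptions chosen for illustration.

```python
def weighted_score(field_scores, weights):
    """Weighted average of per-field similarity scores (each in 0..1)."""
    total_weight = sum(weights[f] for f in field_scores)
    return sum(field_scores[f] * weights[f] for f in field_scores) / total_weight

# Hypothetical per-field scores for one candidate record.
scores = {"name": 0.80, "date_of_birth": 1.00}

# Equal weights versus trusting the date of birth three times as much.
equal = weighted_score(scores, {"name": 1.0, "date_of_birth": 1.0})
dob_heavy = weighted_score(scores, {"name": 1.0, "date_of_birth": 3.0})
print(round(equal, 2), round(dob_heavy, 2))  # 0.9 0.95
```

With the heavier date-of-birth weight, the exact date match pulls the combined score up even though the name score is unchanged.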

Product highlights

  • Supports multiple languages and 12+ variation types, including cross-lingual
  • Matches names of people, locations, and organizations
  • Reduces false positives and false negatives
  • Ranks results by match score
  • Easily integrates with existing systems, Elasticsearch, and Solr
  • Deploys in the cloud or on-premise
  • Is fast and scalable against massive lists
  • Is actively developed, with at least six releases per year
  • Offers industrial-grade support

How It Works

The industry leader in names

Our technology uses machine learning, rather than generated lists of name variations, to perform fuzzy name matching. Our approach matches never-before-seen names. It also avoids the problem of an exponentially growing list. Even a three-element name (first, middle, last) with 12 variations for each element would add 1,728 variations to a list.
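
The figure above follows from multiplying the number of variations across name elements. The short sketch below checks the arithmetic; the function name `variant_count` is a hypothetical helper for illustration and not part of Rosette.

```python
def variant_count(variations_per_element: int, elements: int) -> int:
    """Number of full-name spellings a precomputed list would need to hold."""
    return variations_per_element ** elements

# First, middle, and last name with 12 spellings each, as in the example above.
print(variant_count(12, 3))  # 1728
# A fourth name element multiplies the list by another factor of 12.
print(variant_count(12, 4))  # 20736
```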

Unlike expensive and less accurate legacy solutions driven by thousands of spelling variants, our software has a smaller footprint and analyzes the intrinsic structure of each name component to perform an intelligent comparison using advanced linguistic algorithms. Under the hood, the name indexer uses cutting-edge NLP techniques, including neural networks, hidden Markov models, transliteration rules, and word embedding vectors.

Special algorithms for organization names

For matching organizational names in English, Chinese, and Japanese, Rosette also compares the semantic similarity of words in the name using text embeddings — one of the most powerful results of current deep learning research. Organization names frequently contain common nouns that may be swapped with words with similar meanings. Rosette matches these organizations based on word meaning, not just phonetics. For example, a search for “Eagle Drugs, Inc.” will fuzzy match “Eagle Pharmaceuticals, Inc.” because “drugs” and “pharmaceuticals” are close in meaning.
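
As a rough illustration of comparing words by meaning, the sketch below measures cosine similarity between word vectors. The three-dimensional vectors are invented for this example (real embeddings are learned from text and have far more dimensions), and nothing here reflects Rosette's actual models or API.

```python
import math

# Toy vectors, made up for illustration only.
TOY_EMBEDDINGS = {
    "drugs": [0.9, 0.1, 0.0],
    "pharmaceuticals": [0.8, 0.2, 0.1],
    "airlines": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

close = cosine_similarity(TOY_EMBEDDINGS["drugs"], TOY_EMBEDDINGS["pharmaceuticals"])
far = cosine_similarity(TOY_EMBEDDINGS["drugs"], TOY_EMBEDDINGS["airlines"])
print(close > far)  # True
```

Words with related meanings point in similar directions, so their similarity is higher than that of an unrelated pair.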

Transparent tuning and customization

Our text analytics tools are unique in their adaptability. With the user-friendly graphical interface of Rosette Match Studio, you can see how changing configurations affects the matching behavior of Rosette. On-premise options of Rosette Java (an SDK) or Rosette Server let you:

  • Set the minimum threshold of the similarity score to manage the precision and recall of search results
  • Create a list of “stopwords” to ignore when calculating matching scores (e.g., titles, honorifics)
  • Preset two names to always match with a given score (e.g., “Elizabeth” and “Lisbeth” always match at 90%)
  • Consider any number of other identity attributes (including address and date of birth) in calculating the match score
  • Fuzzy match address components that typically contain names (such as street and city)
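
The first three options in the list above can be sketched in a few lines. This is only an illustration of the concepts: it uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, and the names `STOPWORDS`, `OVERRIDES`, `THRESHOLD`, `normalize`, `match_score`, and `is_match` are hypothetical, not Rosette configuration keys or API calls.

```python
from difflib import SequenceMatcher

STOPWORDS = {"dr", "mr", "mrs", "ms"}                    # titles to ignore
OVERRIDES = {frozenset({"elizabeth", "lisbeth"}): 0.90}  # preset pair score
THRESHOLD = 0.80                                         # minimum score to report

def normalize(name):
    """Lowercase, strip trailing punctuation, and drop stopword tokens."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    return " ".join(t for t in tokens if t and t not in STOPWORDS)

def match_score(a, b):
    """Preset score if the pair is listed, else a plain string similarity."""
    a, b = normalize(a), normalize(b)
    override = OVERRIDES.get(frozenset({a, b}))
    if override is not None:
        return override
    return SequenceMatcher(None, a, b).ratio()

def is_match(a, b):
    return match_score(a, b) >= THRESHOLD

print(match_score("Elizabeth", "Lisbeth"))       # 0.9
print(is_match("Dr. John Smith", "John Smith"))  # True
```

Raising `THRESHOLD` favors precision (fewer, surer matches), while lowering it favors recall.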

Unlike other solutions that have been retrofitted to become scalable, our name indexer was designed for customers with tens of millions of data entries in large, complex databases, and for use cases that cannot afford lags in performance and accuracy.

Tech Specs

Availability and platform support

Deployment availability:
Plugins:
Bindings:

Supported languages

Supported writing systems and transliteration standards (further below)

Arabic; Burmese; Chinese, Simplified; Chinese, Traditional; English
French; German; Greek; Hebrew; Hungarian
Italian; Japanese; Khmer; Korean; Pashto
Persian; Portuguese; Russian; Spanish; Thai
Turkish; Urdu; Vietnamese

The many ways Rosette matches names

Phonetic similarity: Jesus ↔ Heyzeus ↔ Haezoos
Transliteration spelling differences: Abdul Rasheed ↔ Abd al-Rashid
Nicknames: William ↔ Will ↔ Bill ↔ Billy
Missing spaces or hyphens: MaryEllen ↔ Mary Ellen ↔ Mary-Ellen
Titles and honorifics: Dr., Mr., Ph.D.
Gender: Jon Smith ↔ John Smith (but not Joan Smith)
Truncated name components: Blankenship ↔ Blankensh
Missing name components: Phillip Charles Carr ↔ Phillip Carr
Out-of-order name components: Diaz, Carlos Alfonzo ↔ Carlos Alfonzo Diaz
Initials: J. E. Smith ↔ James Earl Smith
Names split inconsistently across database fields: Rip Van Winkle ↔ Rip Van Winkle
Same name in multiple languages: Mao Zedong ↔ Мао Цзэдун ↔ 泽东 ↔ 澤東
Semantically similar names: PennyLuck Pharmaceuticals, Inc. ↔ PennyLuck Drugs, Co.
Semantically similar names across languages: Nippon Telegraph and Telephone Corporation ↔ 日本電信電話株式会社
Organizational aliases: IBM ↔ Big Blue
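
Two of the simpler phenomena above, out-of-order components and missing spaces or hyphens, can be illustrated with plain normalization rules. The sketch below is a rule-based stand-in for explanation only and does not represent how Rosette handles these cases; `order_key` and `spacing_key` are hypothetical helper names.

```python
import re

def order_key(name):
    """Sorted lowercase tokens, so component order and commas are ignored."""
    return tuple(sorted(re.findall(r"\w+", name.lower())))

def spacing_key(name):
    """Lowercase letters and digits only, so spaces and hyphens are ignored."""
    return re.sub(r"[\W_]+", "", name.lower())

print(order_key("Diaz, Carlos Alfonzo") == order_key("Carlos Alfonzo Diaz"))  # True
print(spacing_key("MaryEllen") == spacing_key("Mary-Ellen") == spacing_key("Mary Ellen"))  # True
```

Rules like these cover exact rearrangements only; phenomena such as nicknames, transliteration differences, and cross-lingual matches need more than string normalization.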

Supported writing systems and transliteration standards

Language | Script | Sample | Supported transliteration standards
Arabic | Arabic | محمد أنور السادات | IC, SATTS, BGN, Basis, Buckwalter, and others
Persian (Dari/Farsi) | Arabic | عذرا جعفری (Dari), شيرين عبادى (Farsi) | BGN, IC, MELTS
Pashto | Arabic | حامد کرزی | BGN, JDEC-Afghanistan
Urdu | Arabic | عبد السلام | BGN, IC
Burmese | Burmese | မြန်မာ | Folk (Basis), MLCTS
Chinese | Hanzi | 刘晓波 | Hanyu Pinyin, Wade-Giles
Korean | Hanja, Hangul | 金大中, 김대중 | BGN, Korda, McCune-Reischauer, Revised Romanization of Korean
Hebrew | Hebrew | עִברִית | ISO 259-2:1994, Folk (Basis), ICU
Japanese | Kanji, Hiragana, Katakana | 鈴木章, かづさ, スズキ | Hepburn, Kunrei
Russian | Cyrillic | Михаи́л Серге́евич Горбачёв | BGN, IC
Greek | Greek | Ἀριστοτέλης | ISO 843:1997, ICU
Thai | Thai | พระพุทธยอดฟ้าจุฬาโลก | ICU, ISO :11940-2, ISO 11940-2:2007

Sample input (a pair of names to compare) and output (the match score):
{
  "name1": {
    "text": "Влади́мир Влади́мирович Пу́тин",
    "language": "rus",
    "entityType": "PERSON"
  },
  "name2": {
    "text": "Vladimir Putin",
    "language": "eng",
    "entityType": "PERSON"
  }
}

{
  "result": {
    "score": 0.9486632809417912
  }
}
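
A result like the one above can be consumed with any JSON parser. The sketch below reads the score from the sample result and compares it with a cutoff; the `MIN_SCORE` value is a hypothetical threshold chosen for illustration, not a recommended setting.

```python
import json

# The result object from the sample above, as a JSON string.
response_body = '{"result": {"score": 0.9486632809417912}}'

MIN_SCORE = 0.80  # hypothetical cutoff; tune for your data

score = json.loads(response_body)["result"]["score"]
print(f"{score:.2f}", score >= MIN_SCORE)  # 0.95 True
```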

Deployment

Rosette Cloud

Sign up today for a free 30-day trial

The SaaS version of Rosette is rapidly implemented, low maintenance, and ideal for users who wish to pay based on monthly call volume. Numerous bindings are supported through a RESTful API.

Rosette Server Edition

This on-premise private cloud deployment puts all the functionality of Rosette Cloud behind your secure firewall, and enables advanced user settings, access to custom profiles (user-specific configuration setups), and deployment of custom models.

Rosette Java Edition

For on-premise systems that need the low-latency, high-speed integration of an SDK, Rosette Java is the way to go. It has been deployed in the most demanding, high-transaction environments, including web search engines, financial compliance, and border security.

Rosette Plugins

Just plug in Rosette for instant high-accuracy multilingual search and fuzzy name search for Elasticsearch or Apache Solr.

Quality documentation and support

Our support team responds to customers in less than a business day, and is committed to a satisfactory resolution. Users have access to in-depth documentation describing all the features, with code examples and a searchable knowledge base.

Visit our GitHub for bindings and documentation.

Deep Search for Salesforce

AI-driven Search Application for Salesforce

KonaSearch is a best-in-class search application for Salesforce enabling users to search every field, file, and object across multiple orgs and other data sources.