How to improve name matching with Rosette plugin for Elasticsearch
How to improve name matching?
Many organizations operating in finance, national security, law enforcement and other industries want to know how to improve name matching when using search platforms, such as Elasticsearch, for the job.
Elasticsearch is a powerful, well-respected platform, used primarily for searching unstructured data. But it was never built for name matching. Search engine capabilities in this arena lie somewhere between match/no match determinations and fuzzy matching — a computing approach that improves upon binary processes by considering degrees of truth. Returning only exact or near-exact matches, the name-matching capabilities of Elasticsearch and other search platforms are fuzzy enough for general searches, but not expansive enough for optimized name matching.
Using Rosette in conjunction with your search platform can dramatically improve name matching.
The name-match limitations of search platforms
Consider trying to find matches for an American woman named Mary Ann White. Search platforms may find instances of her full name, spelled correctly, in English. They may also return matches for Mary A. White, and perhaps Mary White. And that’s about it. They will not return instances of Maria Ana Blanca (Mary’s name in Spanish) or Maria Anna Bianca (Italian). They cannot find instances of her name rendered in non-Latin scripts (Мэри Энн Уайт in Russian, メアリー・アン・ホワイト in Japanese). Nor will they detect aliases or nicknames such as Mimi White, or misspellings such as Marity White.
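To see why character-level fuzziness alone falls short, here is a minimal Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for a generic string-similarity measure; it is not how Elasticsearch or Rosette scores names, and the `similarity` helper is a hypothetical name introduced only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


query = "Mary Ann White"
variants = [
    "Mary A. White",      # abbreviation
    "Marity White",       # misspelling
    "Maria Ana Blanca",   # Spanish rendering
    "Мэри Энн Уайт",      # Russian rendering (Cyrillic script)
]

# Score every variant against the query name.
scores = {v: similarity(query, v) for v in variants}

for name, score in scores.items():
    print(f"{name!r}: {score:.2f}")
```

The abbreviated form shares most of its characters with the query and scores high. The Cyrillic rendering shares almost nothing but the spaces, so it scores near zero even though it is the same name.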
Because they do not consider the proximity of one part of a name to another the way a human does, search platforms are also likely to return false positives. Consider this example, something that might appear in the weekly newspaper of a very small town.
After graduating with high honors from White Field High School, twin sisters Mary Gonzalez and Ann Gonzalez have been accepted to Rutgers University.
Your search engine may return this article as a match for your query, because it contains the terms “Mary,” “Ann,” and “White,” without considering each term’s proximity to the other.
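The false positive above can be reproduced with a short Python sketch. It contrasts a plain "all terms present" check with a simple proximity window. Both functions are illustrative assumptions, not the logic of any particular search engine, and the window size of 4 tokens is an arbitrary choice.

```python
import re

ARTICLE = (
    "After graduating with high honors from White Field High School, "
    "twin sisters Mary Gonzalez and Ann Gonzalez have been accepted "
    "to Rutgers University."
)


def tokens(text: str) -> list:
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def all_terms_present(query: str, text: str) -> bool:
    """Bag-of-words match: every query term appears somewhere in the text."""
    words = set(tokens(text))
    return all(term in words for term in tokens(query))


def terms_within_window(query: str, text: str, window: int = 4) -> bool:
    """True only if all query terms fall inside one span of `window` tokens."""
    wanted = set(tokens(query))
    doc = tokens(text)
    return any(wanted <= set(doc[i:i + window]) for i in range(len(doc)))


print(all_terms_present("Mary Ann White", ARTICLE))    # terms are all there
print(terms_within_window("Mary Ann White", ARTICLE))  # but never close together
```

The bag-of-words check accepts the article because "Mary," "Ann," and "White" each occur somewhere in it. The window check rejects it because the three terms never appear near one another.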
Matching addresses and corporate names
Search engines face particular struggles in trying to match addresses and corporate names.
Different nations use different conventions for listing addresses. United States ZIP codes, for example, are either five digits, or, when expressed as ZIP + 4s, nine digits. United Kingdom postcodes are six to eight characters, counting the space, and contain both letters and numbers. Because of situations such as these, transposing addresses from nation to nation is a tricky process that gives rise to misspellings, incorrect labeling, inconsistent and misused abbreviations, and other issues that can typically only be resolved with fuzzy matching.
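As a rough illustration of how much these conventions differ, the following Python sketch tells the two formats apart with regular expressions. The patterns are deliberately simplified assumptions: they check shape only, do not validate that a code actually exists, and `classify_postal_code` is a hypothetical helper name.

```python
import re

# Five digits, optionally followed by a hyphen and four more (ZIP + 4).
US_ZIP = re.compile(r"^\d{5}(?:-\d{4})?$")

# Simplified UK shape: outward code, optional space, inward code.
UK_POSTCODE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}$")


def classify_postal_code(code: str) -> str:
    """Return 'US', 'UK', or 'unknown' based on the shape of the code."""
    normalized = code.strip().upper()
    if US_ZIP.match(normalized):
        return "US"
    if UK_POSTCODE.match(normalized):
        return "UK"
    return "unknown"


for sample in ["02139", "02139-4307", "SW1A 1AA", "m1 1ae", "12345678"]:
    print(sample, "->", classify_postal_code(sample))
```

Shape checks like these only sort clean input. They do nothing for the misspellings and inconsistent abbreviations described above, which is where fuzzy matching comes in.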
Corporate names are also often hard for search engines to match. Subsidiaries abound. “Hill Housewares” may be a subsidiary of “Mountain Makeup,” which in turn may be owned by “Pinnacle Products, Inc.” Issues that affect one company may affect the others, yet typical search engines cannot chart relationships among the three.
Nicknames and initialisms can similarly confound search engines. You may regularly dine at your favorite burger place, “WBB,” and pick up your prescriptions at a chain that calls itself “VFP.” Too often, search engines cannot link these initialisms to the companies’ official names, “World’s Best Burgers LLC” and “Very Fine Pharmaceuticals, Inc.”
How much do you trust — or even understand — your match scores?
Many search engines use complicated ranking functions to estimate how well the data aligns with a given query. Their results are often based on factors such as how often, or how infrequently, a search term appears in a document or set of documents. The match score search engines deliver is a ratio calculation based on those frequencies. This scoring system is suboptimal for name matching because, while it gives users an idea of how often search terms appear, it does not clearly indicate how closely search terms match.
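A toy example makes the problem concrete. The Python sketch below implements a simplified TF-IDF-style score. It is an assumption chosen for illustration, not the actual ranking formula of Elasticsearch or any other engine, and the three-document corpus is invented.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list:
    """Lowercase word tokens, punctuation stripped."""
    return re.findall(r"\w+", text.lower())


def tf_idf_score(query: str, doc: str, corpus: list) -> float:
    """Sum of (term frequency in doc) x (smoothed inverse document frequency)."""
    doc_counts = Counter(tokens(doc))
    n_docs = len(corpus)
    score = 0.0
    for term in tokens(query):
        # Number of documents in the corpus that contain the term.
        doc_freq = sum(1 for d in corpus if term in tokens(d))
        idf = math.log((n_docs + 1) / (doc_freq + 1)) + 1.0
        score += doc_counts[term] * idf
    return score


corpus = [
    "Mary White visited White Field. Mary spoke about White Field history.",
    "Marie Whyte opened an account.",
    "Quarterly earnings rose.",
]

scores = [tf_idf_score("Mary White", doc, corpus) for doc in corpus]
for doc, score in zip(corpus, scores):
    print(f"{score:6.2f}  {doc}")
```

The first document scores well because "Mary" and "White" are repeated, even though "White Field" is a place. The second document, which contains the plausible variant "Marie Whyte," scores zero because no term matches exactly. The score reflects frequency, not how closely the names resemble each other.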
How to improve name matching with Rosette
Enhance your search engine-based name matching processes with Rosette.
Rosette is a scalable software solution that automates the matching of personal names, organizational names, and addresses. It employs AI-powered fuzzy matching capabilities to recognize names in all their varieties. Offered as a plug-in for Elasticsearch, it is interoperable with a broad variety of additional search engines and databases. It works atop these search engines and other systems, avoiding the need to rip-and-replace name-matching solutions. Rosette’s language capability enables it to match names from dozens of languages, including languages written in non-Latin scripts such as Arabic, Chinese, Hebrew, Japanese, Korean, and Russian.
Rosette can help you build confidence in your name-matching processes. Rather than just providing a “match/no match” report, Rosette gives you normalized, actionable scores indicating its confidence in each match. Scores run from 0 to 1. For example, a score of 0.5 indicates a 50% degree of confidence in the match, and 0.75 indicates a 75% degree of confidence.
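One way to act on normalized scores is to sort candidates into buckets by threshold. The Python sketch below shows the idea. It does not call Rosette or use its API: the `triage` function, the thresholds, and every score in the sample data are invented for illustration.

```python
def triage(candidates, accept_at=0.75, review_at=0.5):
    """Sort (name, score) pairs into accept / review / reject buckets.

    Scores are assumed to be normalized to the range 0 to 1.
    """
    buckets = {"accept": [], "review": [], "reject": []}
    for name, score in candidates:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {name!r}: {score}")
        if score >= accept_at:
            buckets["accept"].append(name)
        elif score >= review_at:
            buckets["review"].append(name)
        else:
            buckets["reject"].append(name)
    return buckets


# Hypothetical scores for candidates matched against "Mary Ann White".
candidates = [
    ("Mary A. White", 0.92),
    ("Maria Ana Blanca", 0.78),
    ("Mimi White", 0.61),
    ("Mary Gonzalez", 0.22),
]

result = triage(candidates)
for bucket, names in result.items():
    print(bucket, names)
```

Because the scores are on a fixed scale, thresholds like these can be tuned to the risk tolerance of the task: a sanctions screen might send more candidates to review, while a marketing deduplication job might accept more automatically.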
Don’t trust your high-volume, high-stakes name matching solely to search platforms not purpose-built for the task. Instead, deploy the Rosette plug-in for high-speed, scalable, cross-language, cross-script name searches.