Name Matching Algorithms
The basics you need to know about fuzzy name matching
When identification numbers are not available, names are often used as a unique identifier. Yet misspellings, aliases, nicknames, name similarities, and transliteration and translation errors make names hard to match. Each fuzzy name matching algorithm addresses one or several of these challenges in its own way.
Learn the basics of fuzzy name matching techniques and find the one that suits you best.
A Few Things to Know Before Starting
What is fuzzy name matching?
Fuzzy matching assigns a match a probability between 0.0 and 1.0, based on linguistic and statistical methods, instead of simply choosing 1 (true) or 0 (false). As a result, the names Robert and Bob can match with high probability even though they are not identical.
What is exact name matching?
Exact name matching determines whether two names are the same. For instance, Bill and William are not a match in this case because the two names are not exactly the same, even though Bill is a nickname for William.
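The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the example, and the standard library's `difflib.SequenceMatcher` is used as a simple character-level stand-in for a real fuzzy scorer. It has no linguistic knowledge, so it will not rate nickname pairs such as Robert and Bob highly.

```python
from difflib import SequenceMatcher


def exact_match(a: str, b: str) -> bool:
    """Exact matching: True only if the two names are identical (ignoring case)."""
    return a.casefold() == b.casefold()


def fuzzy_score(a: str, b: str) -> float:
    """Fuzzy matching: a similarity score between 0.0 and 1.0.

    Assumption: character overlap is used as the score. A production
    matcher would add linguistic and statistical knowledge on top.
    """
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


print(exact_match("Bill", "William"))          # False: the strings differ
print(fuzzy_score("Jon Smith", "John Smith"))  # close to 1.0
print(fuzzy_score("Bill", "William"))          # noticeably lower
```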
What are the current challenges in matching names?
Name matching is hard. Spelling variations, initials, nicknames, titles, name similarity, and names in different languages and scripts are just some of the most common challenges of name matching we see today.
What do “precision” and “recall” mean?
Precision: The number of correct results divided by the total number of results retrieved. Precision is a measure of quality.*
Recall: The number of correct results retrieved divided by the total number of correct items that exist. Recall is a measure of quantity.*
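As a worked example of the two definitions, the sketch below computes both measures for a single search. The names and the result sets are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one set of search results."""
    correct = retrieved & relevant  # results that are both returned and correct
    precision = len(correct) / len(retrieved) if retrieved else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search for "William Carr".
relevant = {"William Carr", "Bill Carr", "Will Carr", "Billy Carr"}  # all true matches
retrieved = {"William Carr", "Bill Carr", "William Carter"}          # what the search returned

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Two of the three returned names are correct (precision 2/3), and two of the four true matches were found (recall 2/4).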
The Most Common Challenges of Name Matching
Jesus ↔ Heyzeus ↔ Haezoos
Missing Spaces & Hyphens
MaryEllen ↔ Mary Ellen ↔ Mary-Ellen
Phillip Charles Carr ↔ Phillip Carr
Split Database Fields
Dick. Van Dyke ↔ Dick Van. Dyke
Abdul Rasheed ↔ Abd al-Rashid
Titles & Honorifics
Dr. ↔ Mr. ↔ Ph.D.
Diaz, Carlos Alfonzo ↔ Carlos Alfonzo Diaz
Mao Zedong ↔ Мао Цзэдун ↔ 毛泽东
William ↔ Will ↔ Bill ↔ Billy
McDonalds ↔ McDonald ↔ McD
J.J. Smith ↔ James Earl Smith
Eagle Pharmaceuticals, Inc. ↔ Eagle Drugs, Co.
Name Matching Algorithms at a Glance
Common Key Method
Assigns names a key or code based on their English pronunciation such that similar sounding names share the same key. A well-known common key method is Soundex.
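To make the idea concrete, here is a minimal sketch of American Soundex in Python. It is written for illustration and is not taken from any particular product; letters outside the English consonant table are simply treated like vowels.

```python
# Map each consonant to its Soundex digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character Soundex key (a letter plus three digits)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts
    return (key + "000")[:4]  # pad with zeros or truncate to four characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
```

Because similar-sounding names share a key, the key can be stored and indexed, which makes lookups fast. The trade-off is that the key is tied to English pronunciation and keeps the first letter, so names that differ in their first letter never share a key.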
List Method
Generates a list of all possible spelling variations of each name component and then matches names against that list.
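A minimal sketch of this list-based approach follows. The variant groups are a tiny, hand-written, hypothetical table; a real list would be far larger and would have to be maintained over time.

```python
from itertools import product

# Hypothetical variant groups: every name in a group is treated as a
# spelling variation of the others.
GROUPS = [
    {"william", "will", "bill", "billy"},
    {"rasheed", "rashid"},
]
VARIANTS = {name: group for group in GROUPS for name in group}


def expand(name: str) -> set[str]:
    """Generate every combination of known variants for each name component."""
    parts = name.casefold().split()
    options = [sorted(VARIANTS.get(part, {part})) for part in parts]
    return {" ".join(combo) for combo in product(*options)}


def list_match(query: str, candidate: str) -> bool:
    """Match if the candidate appears in the list generated from the query."""
    normalized = " ".join(candidate.casefold().split())
    return normalized in expand(query)


print(sorted(expand("Bill Carr")))
print(list_match("Bill Carr", "William Carr"))  # True
print(list_match("Bill Carr", "Wiliam Carr"))   # False: the typo is not in the list
```

The last call shows the method's main limit: a variation that was never listed cannot be matched.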
Edit Distance Method
Calculates the smallest number of changes it takes to get from one name to the other; different variants of the method count those changes in different ways.
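One widely used edit distance is Levenshtein distance, which counts single-character insertions, deletions, and substitutions. The sketch below is a plain dynamic-programming version; the `similarity` helper that rescales the distance to a 0.0 to 1.0 score is an assumption added for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Smallest number of single-character edits that turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Rescale the distance to a score where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("McDonalds", "McDonald"))  # 1
print(levenshtein("Rasheed", "Rashid"))      # 2
```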
Statistical Similarity Method
Trains a statistical model on thousands of name pairs and uses it to calculate a similarity score between two names.
Word Embedding Method
Turns each word into a numerical vector based on its semantic meaning and calculates the similarity of two words in a multidimensional space. Commonly used for organization names.
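The following sketch shows the mechanics with cosine similarity. The three-dimensional vectors are invented toy values, not the output of a trained model, and averaging the word vectors of a name is just one simple way to represent a multi-word name.

```python
import math

# Toy, hand-made vectors for illustration only. Real embeddings come from a
# trained model and have hundreds of dimensions.
TOY_VECTORS = {
    "eagle": [0.9, 0.1, 0.0],
    "pharmaceuticals": [0.1, 0.9, 0.2],
    "drugs": [0.2, 0.8, 0.3],
    "airlines": [0.0, 0.1, 0.9],
}


def embed(name: str) -> list[float]:
    """Average the vectors of the known words in a name; unknown words are skipped."""
    words = [w.strip(".,").casefold() for w in name.split()]
    vectors = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {name!r}")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


query = embed("Eagle Pharmaceuticals, Inc.")
print(cosine(query, embed("Eagle Drugs, Co.")))  # higher: related meaning
print(cosine(query, embed("Eagle Airlines")))    # lower: shared word, different meaning
```

With these toy values, "Eagle Drugs, Co." scores higher than "Eagle Airlines" even though both share only the word "Eagle" with the query, which is the behavior an edit-distance comparison would miss.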
Hybrid Method
Combines some or all of the name matching methods above.
Dive into the World of Name Matching Algorithms
Learn the pros and cons of each algorithm and understand what’s happening behind the scenes.
Best Practice: The Hybrid Method
The hybrid name matching method combines two or more of these name matching algorithms so that the strengths of one offset the weaknesses of another.
Rosette uses the hybrid method, combining the algorithms that best suit your needs.
Taking advantage of the common key method, Rosette quickly winnows the candidate pool down to a smaller, likely set of matches in the first pass.
In the second pass, Rosette applies a high-precision statistical method to make fine-grained distinctions between the remaining candidates and rank the highest-scoring matches at the top.
Additionally, setting a minimum match threshold gives further control over the quality and quantity of the results returned and allows Rosette to provide the best results for you.
Throughout the process, Rosette does more than combine algorithms. It also handles name phenomena such as missing name components by comparing every available combination and scoring the degree of match for each, giving the end user an appropriate degree of confidence in the match.
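The general shape of such a two-pass pipeline can be sketched as follows. This is not Rosette's implementation: `crude_key` is a simplified stand-in for a common key, `difflib.SequenceMatcher` stands in for the statistical scorer, and the names, the threshold value, and all function names are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def crude_key(name: str) -> str:
    """Stand-in common key: first letter of the last token plus its consonants."""
    tokens = name.casefold().split()
    if not tokens:
        return ""
    last = tokens[-1]
    key = last[0]
    for ch in last[1:]:
        if ch.isalpha() and ch not in "aeiouhwy" and key[-1] != ch:
            key += ch  # skip vowels, H/W/Y, and doubled consonants
    return key


def build_index(names: list[str]) -> dict[str, list[str]]:
    """Group stored names by key so the first pass is a single lookup."""
    index = defaultdict(list)
    for name in names:
        index[crude_key(name)].append(name)
    return index


def search(query: str, index: dict[str, list[str]], threshold: float = 0.7):
    """First pass: key lookup. Second pass: score, filter, and rank."""
    candidates = index.get(crude_key(query), [])
    scored = [(SequenceMatcher(None, query.casefold(), c.casefold()).ratio(), c)
              for c in candidates]
    kept = [(score, name) for score, name in scored if score >= threshold]
    return sorted(kept, reverse=True)  # highest-scoring matches first


index = build_index(["Phillip Carr", "Philip Car", "Phillip Charles Carr",
                     "Dick Van Dyke", "Carlos Diaz"])
for score, name in search("Philip Carr", index):
    print(f"{score:.2f}  {name}")
```

Raising the threshold trades recall for precision, and lowering it does the opposite. The sketch does not attempt the handling of missing name components described above; it only scores whole strings.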
Benefits of Rosette Name Matching
- Matches names of people, locations, and organizations
- Ranks results by relevance, based on a confidence score
- Matches names regardless of how the names are written in 20+ languages
- Leverages cross-script and cross-lingual matching
- Takes advantage of semantic similarity algorithms
- Provides greater accuracy and recall
- Runs faster and more reliably than legacy solutions
- Deploys on-premises or in the cloud
Rosette understands the linguistic complexities of names across 20+ languages. Contact us today to learn more about the sophistication of Rosette’s name matching algorithm and what difference it could make in your business.
[*] Source: Wikipedia