Name Matching Algorithms

The basics you need to know about fuzzy name matching

When identification numbers are not available, names are often used as a unique identifier. Yet misspellings, aliases, nicknames, name similarities, and transliteration and translation errors make names hard to match. Each fuzzy name matching algorithm addresses one or several of these challenges in its own way.

Learn the basics of fuzzy name matching techniques and find the one that suits you best.

A Few Things to Know Before Starting

What is fuzzy name matching?

Fuzzy matching assigns a probability between 0.0 and 1.0 to a match, based on linguistic and statistical methods, instead of simply choosing 1 (true) or 0 (false). As a result, the names Robert and Bob can be a match with high probability even though they are not identical.

What is exact name matching?

Exact name matching determines whether two names are identical. For instance, Bill and William are not a match in this case, because the two names are not exactly the same, even though Bill is a nickname for William.

What are the current challenges in matching names?

Name matching is hard. Spelling variations, initials, nicknames, titles, name similarity, and names in different languages and scripts are just some of the most common challenges of name matching we see today.

What do “precision” and “recall” mean?

Precision: The number of correct results over the total number of results retrieved. Precision is a measure of quality.*

Recall: The number of correct results retrieved over the total number of correct items that exist. Recall is a measure of quantity.*
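
As a worked illustration of the two definitions, here is a minimal Python sketch. The function name `precision_recall` and the sample names are hypothetical, invented only for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved results.

    retrieved: the results the matcher returned
    relevant:  the results that are actually correct matches
    """
    retrieved, relevant = set(retrieved), set(relevant)
    correct = retrieved & relevant  # correct results that were retrieved
    precision = len(correct) / len(retrieved) if retrieved else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: the matcher returned 4 names, 3 of them correct,
# while 5 correct names exist in the data.
retrieved = {"William Smith", "Bill Smith", "Will Smyth", "Wilma Smith"}
relevant = {"William Smith", "Bill Smith", "Will Smyth", "Billy Smith", "W. Smith"}

precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 3 of 4 retrieved are correct; 3 of 5 correct were found
```

Returning fewer, safer results tends to raise precision and lower recall; returning more results tends to do the opposite.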

The Most Common Challenges of Name Matching

Phonetic Similarity

Jesus ↔ Heyzeus ↔ Haezoos

Missing Spaces & Hyphens

MaryEllen ↔ Mary Ellen ↔ Mary-Ellen

Missing Components

Phillip Charles Carr ↔ Phillip Carr

Split Database Fields

Dick. Van Dyke ↔ Dick Van. Dyke

Spelling Differences

Abdul Rasheed ↔ Abd al-Rashid

Titles & Honorifics

Dr. ↔ Mr. ↔ Ph.D.

Out-of-Order Components

Diaz, Carlos Alfonzo ↔ Carlos Alfonzo Diaz

Multiple Languages

Mao Zedong ↔ Мао Цзэдун ↔ 毛泽东

Nicknames

William ↔ Will ↔ Bill ↔ Billy

Truncated Components

McDonalds ↔ McDonald ↔ McD

Initials

J.J. Smith ↔ James Earl Smith

Similar Names

Eagle Pharmaceuticals, Inc. ↔ Eagle Drugs, Co.

Name Matching Algorithms at a Glance

Common Key Method

Assigns names a key or code based on their English pronunciation such that similar sounding names share the same key. A well-known common key method is Soundex.
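
To make the idea of a shared key concrete, here is a minimal Python sketch of the common American Soundex rules (keep the first letter, encode the following consonants as digits, pad to four characters). It is an illustration only and is not meant to represent how any particular product implements its keys.

```python
# Consonant groups used by American Soundex; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex key for a name (ASCII letters only)."""
    letters = [c for c in name.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (key + "000")[:4]


print(soundex("Smith"), soundex("Smyth"))    # S530 S530
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Jesus"), soundex("Heyzeus"))  # J220 H220
```

The last line shows a known limit of this kind of key: because the first letter is kept as written, Jesus and Heyzeus receive different keys even though they sound alike.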

List Method

Generates a list of all possible spelling variations of each name component and then matches names against that list.
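
A minimal Python sketch of the list idea follows. The variant groups are a tiny hand-written table made up for illustration; a real system would need a far larger, curated list, and the names `VARIANT_GROUPS`, `variants`, and `list_match` are hypothetical.

```python
from itertools import product

# Hypothetical, hand-made variant groups for a few name components.
VARIANT_GROUPS = [
    {"william", "will", "bill", "billy"},
    {"robert", "rob", "bob", "bobby"},
    {"phillip", "philip", "phil"},
]
# Map every variant to the group it belongs to.
LOOKUP = {variant: group for group in VARIANT_GROUPS for variant in group}


def variants(name):
    """Expand a name into every combination of its components' variants."""
    parts = name.lower().split()
    options = [sorted(LOOKUP.get(part, {part})) for part in parts]
    return {" ".join(combo) for combo in product(*options)}


def list_match(a, b):
    """True if b is one of the generated spelling variations of a."""
    return " ".join(b.lower().split()) in variants(a)


print(sorted(variants("William Carr")))
print(list_match("William Carr", "Bill Carr"))     # True
print(list_match("William Carr", "Wilhelm Carr"))  # False, not in the list
```

The second result shows the trade-off: a variation that is missing from the list is never matched, and the list grows quickly as names gain components.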

Edit Distance Method

Calculates the smallest number of changes it takes to get from one name to the other. Different variants of the method count changes in different ways.
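
One widely used variant is Levenshtein distance, which counts single-character insertions, deletions, and substitutions. The Python sketch below shows it, together with one possible way (an assumption of this example, not a standard) to turn the distance into a 0.0 to 1.0 score.

```python
def levenshtein(a, b):
    """Smallest number of insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Illustrative normalisation: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("McDonalds", "McDonald"))  # 1
print(levenshtein("Rasheed", "Rashid"))      # 2
print(similarity("Rasheed", "Rashid"))
```

Edit distance handles typos and small spelling differences well, but nicknames such as Robert and Bob are far apart by this measure.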

Statistical Similarity Method

Trains a statistical model on thousands of name pairs so that it can calculate a similarity score between two names.

Word Embedding Method

Turns each word into a numerical vector based on its semantic meaning and calculates the similarity of two words in a multidimensional space. Commonly used for organization names.
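
The comparison step is commonly done with cosine similarity between the two vectors. In the Python sketch below, the three-dimensional vectors are invented toy values chosen only to make the arithmetic easy to follow; real embeddings come from a trained model and have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Hypothetical toy vectors, NOT taken from any real embedding model.
TOY_VECTORS = {
    "pharmaceuticals": [0.9, 0.1, 0.0],
    "drugs":           [0.8, 0.2, 0.1],
    "airlines":        [0.0, 0.1, 0.9],
}

close = cosine(TOY_VECTORS["pharmaceuticals"], TOY_VECTORS["drugs"])
far = cosine(TOY_VECTORS["pharmaceuticals"], TOY_VECTORS["airlines"])
print(close, far)  # the first pair points in nearly the same direction
```

This is why the method can relate Eagle Pharmaceuticals to Eagle Drugs even though the two words share almost no letters.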

Hybrid Method

Combines some or all of the name matching methods above.

Dive into the World of Name Matching Algorithms

Learn the pros and cons of each algorithm and understand what’s happening behind the scenes.

Best Practice: The Hybrid Method

The hybrid name matching method combines two or more of these name matching algorithms to backfill a weakness in one algorithm with the strength of another.

Rosette uses the hybrid method, combining the algorithms that best suit your needs.

Taking advantage of the common key method, Rosette quickly winnows the candidate pool down to a smaller, likely set of matches in the first pass.

In the second pass, Rosette uses a high-precision statistical method to score the remaining candidates and rank the highest-scoring matches at the top, making fine-grained distinctions between matches.

Additionally, setting a minimum match threshold further controls the quality and quantity of results returned and allows Rosette to provide the best results for you.
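
The general shape of a two-pass approach with a threshold can be sketched in Python as follows. This is not Rosette's implementation: the blocking key is a crude stand-in for a common key method, the standard library's `difflib.SequenceMatcher` stands in for a trained statistical scorer, and the function names and the 0.8 threshold are assumptions made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def blocking_key(name):
    """Crude stand-in for a common key: surname initial plus up to 3 consonants."""
    parts = name.lower().split()
    if not parts:
        return ""
    surname = parts[-1]
    skeleton = [c for c in surname[1:] if c.isalpha() and c not in "aeiouyhw"]
    return surname[0] + "".join(skeleton)[:3]


def build_index(names):
    """First-pass index: group names that share a blocking key."""
    index = defaultdict(list)
    for name in names:
        index[blocking_key(name)].append(name)
    return index


def match(query, index, threshold=0.8):
    """Pass 1: fetch candidates by key. Pass 2: score, filter, rank best first."""
    candidates = index.get(blocking_key(query), [])
    scored = [(SequenceMatcher(None, query.lower(), c.lower()).ratio(), c)
              for c in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


names = ["John Smith", "Jon Smyth", "Joan Smithers", "Mary Jones", "Jane Smart"]
index = build_index(names)
for score, name in match("John Smith", index):
    print(round(score, 2), name)
```

Only names sharing the query's key are scored at all, which keeps the second pass cheap. Raising the threshold returns fewer, more certain results, and lowering it returns more results at the cost of precision.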

Throughout the process, Rosette does not simply combine different algorithms. It also handles name phenomena such as missing name components by comparing every available combination and scoring the degree of match for each, giving the end user an appropriate degree of confidence in the match.

Benefits of Rosette Name Matching

  • Matches names of people, locations, and organizations
  • Ranks results by relevance based on the confidence score
  • Matches names regardless of how the names are written in 20+ languages
  • Leverages cross-script and cross-lingual matching
  • Takes advantage of semantic similarity algorithms
  • Provides greater accuracy and recall
  • Faster and more reliable than legacy solutions
  • Available to deploy on-premises and in the cloud

Rosette understands the linguistic complexities of names across 20+ languages. Contact us today to learn more about the sophistication of Rosette’s name matching algorithm and what difference it could make in your business.

[*] Source: Wikipedia