04 Nov 2019

Name Matching for Addresses – Fuzzy Address Matching

Rosette now offers smart matching for person names, addresses, and dates

Astaire Avenue, Garland Drive, Lamarr Avenue, Skelton Circle, and Hepburn Circle are real street names in Culver City, CA, and they are just as prone to spelling errors as person names. In fact, many components of an address are essentially names: streets, cities, states/provinces, or buildings. Rosette has been widely adopted for its intelligent fuzzy matching of person, location, and organization names, but until now it did not specifically handle these “embedded names” in addresses.

In the LABS (beta) release of the new /address-similarity endpoint, Rosette now applies the same algorithmic smarts to postal addresses that it applies to person names.

How address similarity works

Working on fielded addresses, Rosette applies appropriate edit distance calculations to address fields like postal code or street number, and its name matching algorithms to text fields like “street name,” “house” (that is, building name), “city,” “province/state,” and “country.” Within each text field, Rosette's matching accounts for:

  • Phonetics and spelling differences: 100 Montvale Ave vs. 100 Montvail Av
  • Missing address field components: 100 Montvale Ave vs. 100 Montvale
  • Differences in upper and lowercase: 100 Montvale Ave vs. 100 MONTVALE AVE
  • Reordered address components within a field: 100 Montvale Ave. vs. 100 Avenue Montvale
  • Address field abbreviations (explicitly coded only for U.S. addresses at present): Montvale St. vs. Montvale Street
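
To make the field-by-field idea concrete, here is a minimal Python sketch. It is not Rosette's algorithm: it swaps in the standard library's difflib.SequenceMatcher for the real matching logic, treats every field the same way, and the field names (houseNumber, road, city) and the tiny abbreviation table are assumptions made up for illustration. It only shows how case, abbreviations, reordering, and missing fields can each be handled per field.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation table for illustration only; a real system
# would use a much larger, locale-specific list.
ABBREVIATIONS = {"ave": "avenue", "av": "avenue", "st": "street",
                 "dr": "drive", "cir": "circle"}


def normalize_tokens(text):
    """Lowercase, strip trailing punctuation, expand abbreviations, sort tokens."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens if t]
    # Sorting makes the comparison insensitive to token order within a field.
    return sorted(tokens)


def field_similarity(a, b):
    """Return a 0..1 similarity for one field, or None if either side is missing."""
    if not a or not b:
        return None
    left = " ".join(normalize_tokens(a))
    right = " ".join(normalize_tokens(b))
    return SequenceMatcher(None, left, right).ratio()


def address_similarity(addr1, addr2):
    """Average the per-field similarities, skipping fields missing on either side."""
    scores = []
    for field in set(addr1) | set(addr2):
        score = field_similarity(addr1.get(field, ""), addr2.get(field, ""))
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0
```

In this sketch, "Montvale St." and "Montvale Street" score 1.0 on the street field, as do "Montvale Ave." and "Avenue Montvale", while "Montvale Ave" vs. "Montvail Av" scores high but below 1.0. Skipping missing fields entirely is a simplification; a production matcher would more likely apply a configurable penalty.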

How fuzzy date matching works

Rosette's fuzzy date matching complements fuzzy address and name matching. It can compare partial dates and misordered date components (DDMMYY vs. MMDDYY). In particular, the matching engine considers five aspects of dates:

  • Time: The number of days between Date 1 and Date 2
  • Year: The difference of the year fields of Date 1 and Date 2
  • Month: The difference of the month fields of Date 1 and Date 2
  • Day: The difference of the day fields of Date 1 and Date 2 (even if they are close in time, 1 and 30 are considered far apart)
  • String distance: Date 1 and Date 2 are converted to a standard format; then the string distance score is calculated based on the edit distance between the two strings.
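
The five aspects above can be sketched in a few lines of Python. This is an illustrative assumption about how the raw differences might be computed, not Rosette's implementation: the YYYYMMDD standard format, the plain Levenshtein distance, and the function names are all choices made for this example, and the sketch stops short of turning the differences into a match score.

```python
from datetime import date


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def date_aspects(d1, d2):
    """Return the five raw differences between two dates."""
    # Assumed standard format for the string comparison: YYYYMMDD.
    s1, s2 = d1.strftime("%Y%m%d"), d2.strftime("%Y%m%d")
    return {
        "time": abs((d1 - d2).days),       # days between the two dates
        "year": abs(d1.year - d2.year),    # difference of the year fields
        "month": abs(d1.month - d2.month), # difference of the month fields
        "day": abs(d1.day - d2.day),       # difference of the day fields
        "string_distance": levenshtein(s1, s2),
    }
```

For 30 October 2019 and 1 November 2019, this reports a time difference of 2 days but a day-field difference of 29, which is the "close in time, far apart in day" case described above.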

Date matching is currently available through the Rosette Enterprise SDK or its Elasticsearch plugin.

Who needs fuzzy records matching?

Names, addresses, and dates are critical data points to check when matching records in many domains.

Know Your Business for financial compliance

Financial institutions are required by Know Your Customer regulations to avoid transacting business with known bad actors. Often these customers are other businesses. Suppose a new business customer requests a line of credit from a bank. Before approving it, the bank asks for information, such as the customer’s places of business and the names of its executive board members. The bank compares the provided information against business directory listings to verify the customer is who it says it is. For example, if a business is applying from the Cayman Islands, but the directory listing shows no offices there, that might be a red flag. Similarly, if the name of the executive applying for credit isn’t listed in the directory, that will be considered in the risk calculation of whether to take on this new client.

Assigning unique IDs

In any system where the use of Social Security numbers as IDs is restricted by privacy rules, such as in education or health care, a unique ID might be assigned to each person. These records include person names, nicknames, dates of birth, and current and previous addresses. Using Rosette’s Elasticsearch plugin, the various record fields can be weighted differently, depending on which data are known to be the most dependable, and thus an overall match score can take into account the various scores from fuzzy matching name, address, and date fields. Being able to positively identify matching or non-matching records eliminates the creation of duplicate records, which are costly to ferret out and correct.
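
As a rough illustration of field weighting, the sketch below combines per-field match scores into one overall score with a weighted average. The formula, the function name, and the example field names and weights are assumptions for this example; they are not the Elasticsearch plugin's actual scoring or query syntax.

```python
def overall_score(field_scores, weights):
    """Weighted average of per-field match scores (each expected in 0..1).

    Only fields present in field_scores contribute, so a record pair with a
    missing field is scored on the fields that could be compared.
    """
    total_weight = sum(weights[field] for field in field_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(score * weights[field]
                       for field, score in field_scores.items())
    return weighted_sum / total_weight


# Hypothetical weights: name and date of birth are trusted more than address.
example_weights = {"name": 2, "dob": 2, "address": 1}
example_scores = {"name": 0.9, "dob": 1.0, "address": 0.5}
example_overall = overall_score(example_scores, example_weights)
```

With these made-up numbers, a weak address match pulls the overall score down less than a weak name or date-of-birth match would, which is the point of weighting the more dependable fields more heavily.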

LABS release caveats

The beta release of the /address-similarity endpoint currently supports only addresses written in Latin characters and in English-speaking locales. It recognizes U.S. address abbreviations and some Spanish ones, although it may not handle all cases for addresses outside the U.S.

Rosette does not parse addresses into fields, nor does it canonicalize addresses to a format prescribed by postal authorities. We recommend LibPostal for parsing. If this level of coding is not your jam, we have a solutions team that can implement it for you.

Try the new /address-similarity endpoint at https://developer.rosette.com/api-doc#!/name-similarity/runAddressSimilarity and let us know what you think.