Name Matching Gets a Shot in the ARM (chip)
Or, How We Made Name Screening Efficient and Eco-friendly
In this webinar, Basis Technology’s Declan Trezise (Director of Global Solutions Engineering) and Peter De Bie (Senior Sales Engineer) discussed the confluence of modern AI technology with the exploding adoption of the highly efficient ARM chip architecture. That confluence has led customers of Rosette® name matching technology to ask: “Will Rosette run on ARM?”
Currently, Rosette Name Indexer, our name matching technology, churns through an estimated 480 million name matches on mission-critical systems every day. These include border security, financial compliance name screening against watchlists (with some lists as large as 15 million names), and matching paper documentation to applications for health insurance eligibility through the U.S. Affordable Care Act.
Rosette is just one of many modern AI tools that are very computationally intensive, so the question is, “How much does it cost in terms of servers and power, and how many trees do I have to burn to keep my country safe?”
Then along came the ARM chip architecture. ARM uses smaller, simpler instructions, which makes it possible to match the computing power of existing chips while performing computations faster and consuming much less energy. ARM is not only running your smartphones and laptops; it has also moved into the data center, including Amazon Web Services (AWS) Graviton.
The short answer to whether our name matching technology would run on ARM is this: the Java portion of our code worked as-is, and we saw very promising performance increases in the 27 to 30% range. Then our core engineering team signed up to port the C/C++ native code portion of Rosette Name Indexer to ARM and finished in just six weeks! There, too, we saw performance gains of 28 to 30%, as measured in milliseconds per 100 queries, with each query executing between 50 and 100 name comparisons.
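For readers who want to run a similar comparison on their own hardware, here is a minimal sketch of how a “milliseconds per 100 queries” figure could be collected and compared between two machines. It is not our benchmark harness: the function names are hypothetical, `run_query` stands in for whatever call executes one query, and treating the gain as a percentage reduction in time is an assumption made for illustration.

```python
import time


def ms_per_100_queries(run_query, queries, repeats=3):
    """Best-of-`repeats` wall-clock milliseconds per 100 queries.

    `run_query` is a hypothetical callable that executes one query.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for query in queries:
            run_query(query)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # Normalize to a batch of 100 queries, whatever the workload size.
        best = min(best, elapsed_ms * 100.0 / len(queries))
    return best


def percent_reduction(baseline_ms, new_ms):
    """Percentage drop in time relative to the baseline (assumed definition of 'gain')."""
    return (baseline_ms - new_ms) / baseline_ms * 100.0
```

Running `ms_per_100_queries` with the same query set on each architecture and passing the two results to `percent_reduction` gives one comparable number per test run.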
The combination of reduced hardware needs (costs) plus performance gains in speed added up to an estimated 43.6% cost savings per year, which is huge for many of our customers running large-scale systems. We have some customers at the moment testing out this new version, which will hopefully enter production in the near future.
Select Q&A from our webinar audience
Q. Do you merge records for deduplication?
A. We focus on the matching of person or organization names, as well as matching dates and addresses. It’s all these attributes combined that help determine if records are duplicates. There are then many ways you can use this information, such as for list deduplication. We do this on a small scale, but our professional services team would help with large-scale deduplication of lists. This question comes up often in master data management (MDM) for merging data from disparate databases, such as to gain a single, 360-degree view of customers, and we generally deliver that work as a services engagement.
Q. How do you quantify your matching process?
A. People asking this may mean quality, accuracy, or speed, and the right metric depends on the customer. What we provide is a similarity score between names, whether you are comparing two names directly or searching for a name and getting a list of potential matches that Rosette ranks from the highest to lowest match score. It’s up to the customer to determine the threshold score above which names count as a “match.” Do you err on the side of recall (fewer missed matches), as in the case of border security? Or, for identity verification, where a person is applying for a loan and you are trying to decide if an ID matches the applicant, do you value higher precision (fewer false matches)? Rosette has a flexible set of configurations that can be adjusted for each domain. We quantify it based on how our customers need to use it.
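To make the threshold trade-off concrete, here is a minimal sketch of ranking candidates by score and cutting them off at a customer-chosen threshold. The scorer is a stand-in built on Python’s standard `difflib` and is not how Rosette scores names; the watchlist, the function names, and the two threshold values are all hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in scorer (NOT Rosette): character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ranked_matches(query, watchlist, threshold):
    """Return (name, score) pairs at or above the threshold, highest score first."""
    scored = [(name, similarity(query, name)) for name in watchlist]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical watchlist and thresholds, for illustration only.
watchlist = ["Jon Smith", "John Smyth", "Joan Smithe", "Maria Garcia"]

# A lower threshold leans toward recall: more candidates to review.
recall_leaning = ranked_matches("John Smith", watchlist, threshold=0.60)

# A higher threshold leans toward precision: fewer, closer candidates.
precision_leaning = ranked_matches("John Smith", watchlist, threshold=0.90)
```

Whatever the scorer, the candidates returned at the higher threshold are always a subset of those returned at the lower one, which is why the threshold works as the dial between recall and precision.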
Q. Follow-up question: If a client asks, “How good is your matching process?”, how do you prove your process is better than someone else’s?
A. It depends on your use case. Is it fighting crime/terrorism and border security? That risk appetite might be different than a bank doing checks on their customers. The risk taken could be higher depending on the type of bank. Maybe some banks can afford a bit more risk than government agencies. The quality is in how many false negatives (missed matches) you are living with. Are you using a technology that is looking broadly enough for match candidates? What kind of effort and hardware are you putting in to make sure you don’t miss anyone you shouldn’t? But it’s always expressed in how much you can afford a false negative. It’s also about workload for time spent investigating false positives (wrong matches). How important is precision to you for day-to-day operations to avoid wasting time on false positives?
If you come to us with a test data set, we can run that evaluation for you quickly and give you an F-score (an indicator of accuracy) on it. But we have all done this often enough to know there’s more to it than the idealized question of “How good is your matching process?” We invite anyone to come try us. We can also look at tuning specifically for the name phenomena in your data, to make the performance even better. You do need to measure to know how good a matching engine is.
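For anyone who wants to reproduce that kind of number on their own test set, here is a minimal sketch of the standard precision, recall, and F-score calculation. It assumes the balanced F1 variant and a test set expressed as pairs of record IDs; the function name and the sample pairs are hypothetical, and this is not our evaluation tooling.

```python
def precision_recall_f1(predicted, truth):
    """Compare predicted match pairs against ground-truth match pairs."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)    # correct matches
    false_pos = len(predicted - truth)   # wrong matches (false positives)
    false_neg = len(truth - predicted)   # missed matches (false negatives)

    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (query_id, watchlist_id) pairs, for illustration only.
truth = {(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")}
predicted = {(1, "A"), (2, "B"), (3, "C"), (6, "F")}

precision, recall, f1 = precision_recall_f1(predicted, truth)
# 3 correct, 1 wrong, 2 missed: precision 0.75, recall 0.60, F1 about 0.667
```

Reporting precision and recall alongside the single F-score keeps the false-positive workload and the false-negative risk visible separately, which matters when the two carry very different costs.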
Q. Do you offer name matching as a service or is it only on-premise?
A. For a lot of customers, their data is highly sensitive and must stay secure, so they prefer it on-premise. That ends up being the majority of deployments. That’s not to say we can’t provide name matching as a service, and we would be happy to talk to anyone who is interested. Pairwise name matching in the cloud is offered as a custom plan, but it’s not part of the public Rosette Cloud; we could host it for you on AWS. Matching against a list, however, requires a search engine to hold the watchlist, such as Elasticsearch (for example, the platinum Elastic Cloud), with our plugin installed there.