An Elasticsearch Plugin for Simple Fuzzy Name Matching
Normalization is crucial to high-quality search results. Nobody wants irrelevant variations between queries and documents to cause missed hits (e.g., “celebrity” vs. “celebrities”). Normalizing dictionary words works, but what if your application focuses on names? Whether you’re tackling log analysis, e-commerce, watch list screening, or other applications, names are often the key. Can you find “Abdul Jabbar, Karim” if you search for “Kareem AbdalJabar” or “كريم عبد الجبار”?
Applications using Elasticsearch provide some fuzziness by mixing its built-in edit-distance matching and phonetic analysis with more generic analyzers and filters. We’ve tried to go beyond that to provide both better matching and a simpler integration. We use a custom Mapper and Score Function so that linguistic nuances can be handled behind the scenes. At this Meetup, we talk about how we built this sort of plugin for Rosette, how it can be customized, and how it connects to the broader trend of entity-centric search.
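To make the baseline concrete, here is a minimal, standard-library sketch of the two generic techniques mentioned above: edit distance and a phonetic key. It is an illustration only, not Elasticsearch's or Rosette's implementation. The function names (`edit_distance`, `soundex`, `token_matches`) and the `max_edits` threshold are assumptions made for this example, and Soundex stands in for whichever phonetic encoder an application might actually configure.

```python
# Illustrative sketch only: hypothetical helper names, not an Elasticsearch API.

# Soundex consonant classes (American Soundex).
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def soundex(name):
    """Four-character Soundex key; handles ASCII letters only."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        prev = code  # a vowel resets prev, so a repeated code after it counts
    return (key + "000")[:4]


def token_matches(a, b, max_edits=2):
    """Assumed rule: two name tokens match if they share a phonetic key
    or are within max_edits single-character edits of each other."""
    if soundex(a) == soundex(b):
        return True
    return edit_distance(a.lower(), b.lower()) <= max_edits


print(soundex("Kareem"), soundex("Karim"))
print(edit_distance("kareem", "karim"))
print(token_matches("Jabbar", "Jabar"))
```

A sketch like this catches spelling variants such as “Kareem” and “Karim”, but it has no answer for reordered name parts, missing spaces, or a query in Arabic script, which is the gap the talk addresses.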
Chris Mack is the Director of Customer Engineering for text analytics at Basis Technology. Chris’s team designs solutions and delivers services to adapt text analytic components for a broad range of customer problems. Chris has spent the last 20 years in software development, data analytics, business strategy, and business operations. Chris received his BS in Management from Bentley University where he also studied Computer Information Systems.
This Meetup also includes a second talk, “Elasticsearch 2.0 Release with Q&A.” In this talk, Ryan Ernst walks through new features, breaking changes, and migration strategy for this major release.
Ryan Ernst is an Apache Lucene committer and PMC member. He is an Elasticsearch developer and enjoys working on anything with bits. Prior to Elasticsearch, he worked on Amazon’s Product Search and AWS CloudSearch.
Listen to both talks from our November 16, 2015 Meetup with Elastic here.