Category: Text Analytics

Elasticsearch and Fuzzy Name Matching Meetup, World Tour

Normalization is crucial to high-quality search results: who wants irrelevant variations between queries and documents leading to missed hits (e.g., “celebrity” vs. “celebrities”)? Normalizing dictionary words works, but what if your application focuses on names? Whether you’re tackling log analysis, e-commerce, watch list screening or other applications, names are often the key. Can you […]

Language Learning Gets a Boost From Lingua.ly

New browser extension accelerates language learning on the Web. Lingua.ly, the latest innovative business to take advantage of the Basis Technology Startup Program, was making a splash in the Chrome Web Store last week, where the editorial team loved it so much they featured it on their central banner. The extension incorporates language learning into the context […]

Fuzzy Name Search and Name Matching Presentations in San Francisco

Names connect data points and are frequently the most important piece of information in a document. But unlike common nouns and verbs, they defy standardization, making them an elusive search target. Yet in just two days you can go from “neophyte” to “well-informed” in the realm of fuzzy name searching and matching. Basis Technology’s VP […]

Accurate Language Detection for Queries & Tweets

Doubles the Accuracy of Existing Language Identification Software. Basis Technology’s Rosette language identification function has been improved to solve the problem of language detection for short texts. Existing language detectors require many words to confidently identify the language of a string of text, and are therefore unreliable when trying to detect the language of queries, tweets, photo […]

Can you rely on the Treasury Department’s Sanctions List Search?

When the United States wants to prohibit its citizens and corporations from doing business with a foreign national, that individual is added to the Specially Designated Nationals list maintained by the Office of Foreign Assets Control of the US Department of the Treasury. One person on that list is Chabaane Ben Mohamed al-Trabelsi*, a Tunisian […]

A Better Pure Java RegEx Engine

Regular expressions are ubiquitous in NLP, not to mention many miscellaneous text-processing tasks. People use regular expressions as a quick solution to matching and parsing. People build entire complex extraction systems with regular expressions. Some people get themselves into serious trouble by trying to apply them beyond their natural limits. Once upon a time, regular […]

Interview: The Future of Human Language Technology

Our VP of Engineering, David Murgatroyd, was recently invited to provide input to a US Government effort to chart the future of Human Language Technology (HLT). Input was collected in question-and-answer form, with questions bolded. He was asked to respond with respect to one task area: Triage, Translation Support or Knowledge Discovery. He […]

Keeping pace with the ever-changing name of ISIS through the lens of Wikipedia

If you’ve followed recent events in Syria and Iraq, then you’ve surely heard of an organization that at various times is referred to as ISIL (Islamic State of Iraq and the Levant), ISIS (Islamic State of Iraq and Syria), or just IS (Islamic State). While the New York Times recently decided to use “ISIS” in […]