Basis Technology Startup Program Empowers Big Data Entrepreneurs

Access to Industry-Leading Multilingual Text Analytics Speeds Innovation

CAMBRIDGE, Mass., Nov. 10, 2014—Basis Technology’s Startup Program, which makes Rosette’s professional-grade, multilingual text analytics accessible to high-impact, early-stage startups, is off to a great start.

“This year’s applicants to the Startup Program are pursuing a wide range of text-based challenges, from big data curation to examining the nuances of word choice in a single document,” said Carl Hoffman, CEO of Basis Technology. “We’ve seen the pace of innovation in this area increase exponentially in recent years as the massive onslaught of data, the majority of which is text, continues to outstrip human capacity to cope with it.”

Startups are recognizing that by turning to Rosette’s professional-grade text analytics, they can focus on their core innovation and avoid spending time cobbling together an open source solution or reinventing the wheel by developing their own text analytics. Rosette analyzes text in over 40 languages, providing language identification, part-of-speech tagging, base noun phrase detection, entity extraction, name translation, and fuzzy name matching.

Successful early alumni of the program include Luminoso (teaching computers to think with human-like common sense reasoning), Recorded Future (mining web data to help firms anticipate risks and capitalize on opportunities), and Zoomd (anticipating and offering the “next action” to web surfers to increase site stickiness and site revenues).

“Rosette enables us to expand beyond English to collect intelligence from a global network of sources in European, Asian, and Middle Eastern languages, resulting in deeper insights for our customers,” said Christopher Ahlberg, CEO & co-founder of Recorded Future.

Here is just a sampling of the innovations Basis Technology has worked with this year.

Document Authorship Identification

Part-of-speech tagging was never so exciting as when it is used to analyze suspicious documents and identify authorship in criminal, civil, and security matters. Should the threatening note be taken seriously? Did person X really write this suicide note? One startup is building a forensically rigorous software application that uses part-of-speech information for English and other languages to help answer such questions.

The Web As Living Language Textbook

Part-of-speech information helps another startup turn the web into a “living textbook” for students of more than 18 foreign languages. The startup’s application seeks out content pitched at the right difficulty level for each student. It learns the student’s interests and lets the student collect vocabulary from anywhere on the web.

Web Browsing Peek-Ahead

In the category of browser enhancements, the detection of base noun phrases (a noun plus its modifiers) and entity extraction (identifying people, places, and organizations in text) enable another startup to show snippets of content tailored to the user’s interests when the user hovers over a URL.

Data Curation

The tricky business of matching names—overcoming typos, spelling variations, nicknames—is essential for one startup in data curation, which automates the previously manual labor of connecting and enriching data from multiple sources through a combination of machine learning and human guidance. Name matching is one pivot the system uses to de-duplicate records.
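To make the idea concrete, here is a minimal sketch of name-based de-duplication. It is not Rosette’s API or the startup’s method: it assumes a tiny, hypothetical nickname table and uses the generic string similarity from Python’s standard `difflib` module, with an arbitrary threshold of 0.85 chosen only for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table, for illustration only; a real system
# would need far broader coverage of nicknames and spelling variants.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand known nicknames."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in name.lower())
    return " ".join(NICKNAMES.get(token, token) for token in cleaned.split())


def name_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def dedupe(names, threshold=0.85):
    """Greedily keep a name only if it is not a near-duplicate of one already kept."""
    kept = []
    for name in names:
        if all(name_similarity(name, other) < threshold for other in kept):
            kept.append(name)
    return kept


records = ["Robert Smith", "Bob Smith", "Robert Smyth", "Elizabeth Jones", "Liz Jones"]
unique = dedupe(records)
print(unique)
```

In this toy data set, the nickname table folds “Bob” into “Robert” and the similarity score absorbs the “Smyth” spelling variant, so each person is kept once. Character-level similarity of this kind does not cover names written in different scripts, which is where cross-lingual name matching comes in.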

Road Race Records Collation

Large foot races, like the Boston Marathon, stagger starts to put the fastest runners up front, so placing runners depends on knowing their past race results. Until now, collating the race records of each runner was a huge, tedious, manual task. Just the top 50 races in the U.S. include about 40,000 names! Add in international races, and some names might not be written in Latin (A-to-Z) script, thus requiring cross-lingual name matching (which Rosette handles!).

If a full stack of linguistics across 40+ languages accessible through one API can make a difference in your startup, register today at

About Basis Technology

Basis Technology develops innovative products and solutions incorporating multilingual text analytics and digital forensics. The Rosette linguistics platform provides morphological analysis, entity extraction, and name matching and translation, yielding useful information from unstructured data in such fields as information retrieval, government intelligence, eDiscovery, and financial compliance. The digital forensics team pioneers better, faster, and cheaper techniques to extract forensic evidence, keeping government and law enforcement ahead of the exponential growth of data storage volumes.