Natural Language Processing Software for Search Applications Adds 13 Languages
October 4, 2011
Rosette version 7.4 supports more European languages, Indonesian, and Malay
Cambridge—October 4, 2011—Basis Technology Corporation (www.basistech.com), the leading provider of natural language processing software for search-based applications, is now shipping the Rosette® linguistics platform version 7.4, which adds 13 languages. Search engines and text processing applications incorporating Rosette can instantly analyze 40 languages. The newly added languages are Albanian, Bulgarian, Catalan, Croatian, Estonian, Indonesian, Latvian, Malay, Norwegian, Serbian, Slovak, Slovenian, and Ukrainian.
For the supported languages, Rosette returns the dictionary form of each word, enabling search engines to match all occurrences of a keyword regardless of its inflected form. Thus, searching for the verb “spoke” in English would also find occurrences of “speak,” “speaking,” and “speaks.”
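The matching behavior described above can be illustrated with a minimal sketch. This is not Rosette's API or implementation: the `LEMMAS` table, the `lemma`, `build_index`, and `search` functions, and the sample documents are all hypothetical, and a hand-written lookup table stands in for a real morphological analyzer. The idea shown is only that indexing and querying on the dictionary form lets different word forms match.

```python
from collections import defaultdict

# Hypothetical toy lemma table; a real analyzer relies on full
# morphological dictionaries rather than a hand-written mapping.
LEMMAS = {
    "spoke": "speak",
    "speaks": "speak",
    "speaking": "speak",
    "spoken": "speak",
    "children": "child",
}


def lemma(word: str) -> str:
    """Return the dictionary form of a word, or the lowercased word if unknown."""
    w = word.lower()
    return LEMMAS.get(w, w)


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lemma to the ids of documents containing any of its forms."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            # Strip simple surrounding punctuation before lemmatizing.
            index[lemma(token.strip('.,;:!?"'))].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Lemmatize the query term and return the matching document ids."""
    return set(index.get(lemma(query), set()))


# Hypothetical sample documents.
docs = {
    1: "She speaks three languages.",
    2: "They were speaking quietly.",
    3: "The children played outside.",
}
index = build_index(docs)
```

With this sketch, `search(index, "spoke")` returns documents 1 and 2 even though neither contains the literal string “spoke”, because both the query and the indexed tokens are reduced to the lemma “speak”.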
“Basis Technology is continually expanding the coverage and capabilities of our linguistic software, because it forms the foundation of relevant and accurate search in many languages,” said Steve Kearns, Product Manager at Basis Technology. “Rosette provides a wide range of text analysis functionality in one package that is the choice of search industry leaders and new startups alike.”
About the Rosette Linguistics Platform
Rosette’s natural language processing components integrate into software applications to add multilingual capability to search and retrieval, business intelligence, e-discovery, digital forensics, and financial compliance applications.
Rosette Language Identifier determines the written language and character encoding of each indexed document, and is capable of recognizing 55 languages and 45 encodings. Rosette Base Linguistics tokenizes and lemmatizes text in 40 languages at index or query time. Determining the “lemma” (i.e., the dictionary form) of each indexed word sharpens search engine relevancy. This technique enables queries containing, for example, the word “children” to match documents containing the word “child.” Rosette Entity Extractor automatically extracts “entities” (e.g., names of people, places, and organizations) to enable document clustering and faceted search. Rosette Name Translator quickly and accurately translates Middle Eastern and Asian names to English. Rosette Name Indexer resolves name variations despite spelling and language differences.
About Basis Technology
Basis Technology develops innovative products and solutions incorporating multilingual text analytics and digital forensics. Our Rosette® linguistics platform provides morphological analysis, entity extraction, name matching, name translation, and Arabic chat translation, yielding useful information from unstructured data in such fields as information retrieval, government intelligence, e-discovery, and financial compliance. Our digital forensics team pioneers better, faster, and cheaper techniques to extract forensic evidence, keeping government and law enforcement ahead of the exponential growth in data storage volumes.
Our products and services are used by over 250 major organizations, including Amazon.com, Clearwell, EMC, Endeca/Oracle, Exalead/Dassault, Fujitsu, Google, Hewlett-Packard, Microsoft, NetBase, Oracle, and governments around the world. Learn more at www.basistech.com or call +1-617-386-2090.