

Rosette for Search-Based Applications

A search engine plug-in for searching in Japanese, Chinese, Arabic, German, and more than a dozen other languages

Rosette's linguistic components support search in these languages:
  • Arabic
  • Chinese/Simplified
  • Chinese/Traditional
  • Czech
  • Dutch
  • English
  • French
  • German
  • Greek
  • Hungarian
  • Italian
  • Japanese
  • Korean
  • Persian (Farsi/Dari)
  • Polish
  • Portuguese
  • Russian
  • Spanish
  • Urdu


The Rosette Language Identifier identifies 55 languages in 39 encodings.



Search-based applications are all about getting a job done, and accurate, comprehensive search is the first step. The Rosette Linguistics Platform is commercially available linguistic technology chosen by enterprise and web search engines, including Google, Yahoo!, and Bing, to enable search in many languages.

How Rosette Improves Multilingual Search

Rosette's components are designed for speed and accuracy, and they support both small and large search indexes.

  • Language Identification — automatically classifies documents by language and encoding as the first step in indexing or querying
  • Segmentation/Tokenization — breaks up input text or queries into words, using built-in linguistic data about each language
  • Lemmatization — generates the dictionary form of a word to make search results more comprehensive; for example, a search finds occurrences of both "race bike" and "racing bike"
  • Noun Decompounding — splits compound nouns, which are common in languages such as German and Dutch, into their component words for more comprehensive search results
  • Part-of-Speech Identification — tags each word with its part of speech, such as noun, verb, or preposition
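To make the decompounding step concrete, here is a minimal sketch of dictionary-based splitting. It is not Rosette's algorithm or API: the toy lexicon, the function name `decompound`, the three-character minimum, and the "fewest parts wins" rule are all assumptions chosen for illustration.

```python
from functools import lru_cache

# Hypothetical toy lexicon. A real decompounder relies on a large
# language-specific dictionary and also handles linking elements
# (such as the German "s" in "Arbeitsplatz"), which this sketch ignores.
LEXICON = frozenset({"fussball", "fuss", "ball", "platz", "wetter", "bericht"})


def decompound(word, lexicon=LEXICON, min_len=3):
    """Split `word` into lexicon entries, preferring the fewest parts.

    Returns the lowercased word as a one-element list when no complete
    split into known entries exists.
    """
    word = word.lower()

    @lru_cache(maxsize=None)
    def best_split(start):
        # Fewest-part tuple of lexicon entries covering word[start:], or None.
        if start == len(word):
            return ()
        best = None
        # Try the longest candidate piece first, down to min_len characters.
        for end in range(len(word), start + min_len - 1, -1):
            piece = word[start:end]
            if piece not in lexicon:
                continue
            rest = best_split(end)
            if rest is not None and (best is None or len(rest) + 1 < len(best)):
                best = (piece,) + rest
        return best

    parts = best_split(0)
    return list(parts) if parts else [word]


print(decompound("Fussballplatz"))  # ['fussball', 'platz']
print(decompound("Wetterbericht"))  # ['wetter', 'bericht']
```

Indexing the component words alongside the whole compound is what lets a query for "Platz" match a document that only contains "Fussballplatz".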

How Rosette Plugs into Search Engines

Rosette is available as a software development kit (SDK) that integrates into the search engine of your choice through a C, C++, or Java API. For Lucene and Solr users, a free connector plugs Rosette into the analysis chain as a TokenFilter.
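Lucene and Solr analyzers pass text through a tokenizer followed by a chain of token filters, each of which consumes the token stream and emits a transformed one. The Python sketch below only mimics that shape with generators; it is not the Lucene API or Rosette's connector, and the names `word_tokenizer`, `lowercase_filter`, `make_lemma_filter`, and the one-entry lemma table are hypothetical.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

# A filter consumes a token stream and yields a transformed token stream.
TokenFilter = Callable[[Iterable[str]], Iterator[str]]


def word_tokenizer(text: str) -> Iterator[str]:
    # Naive regex tokenizer; a linguistic tokenizer would replace this,
    # for example to segment Chinese or Japanese text written without spaces.
    for match in re.finditer(r"\w+", text):
        yield match.group()


def lowercase_filter(tokens: Iterable[str]) -> Iterator[str]:
    for token in tokens:
        yield token.lower()


def make_lemma_filter(lemmas: Dict[str, str]) -> TokenFilter:
    # Hypothetical stand-in for a linguistic filter: replace each token
    # with its lemma when the table knows it, otherwise pass it through.
    def lemma_filter(tokens: Iterable[str]) -> Iterator[str]:
        for token in tokens:
            yield lemmas.get(token, token)
    return lemma_filter


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    # Wrap the tokenizer output in each filter in turn, as an analyzer chain does.
    stream: Iterable[str] = word_tokenizer(text)
    for token_filter in filters:
        stream = token_filter(stream)
    return list(stream)


chain = [lowercase_filter, make_lemma_filter({"racing": "race"})]
print(analyze("Racing bike", chain))  # ['race', 'bike']
print(analyze("race bike!", chain))   # ['race', 'bike']
```

Because documents and queries run through the same chain, both phrasings reduce to the same index terms and therefore match each other.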

How Rosette Works

Rosette uses a variety of algorithms, applying the approach best suited to the requirements of each language. Depending on the language, it combines lexical data, heuristic rules, and statistical models to provide the best accuracy and speed for all applications.
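The idea of combining lexical data with heuristic rules can be illustrated with a toy English lemmatizer that consults an exception lexicon first and falls back on suffix rules. This is a sketch under stated assumptions, not Rosette's implementation: the tables `IRREGULAR` and `SUFFIX_RULES` are hypothetical and far from complete, and a production system would also use statistical models, for example to choose between readings of ambiguous words.

```python
# Hypothetical toy data: an exception lexicon for irregular forms,
# checked before the general suffix rules.
IRREGULAR = {"mice": "mouse", "geese": "goose", "ran": "run", "was": "be"}

# Heuristic suffix rules, tried in order; the first match wins.
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "y"),    # studies -> study
    ("ss", "ss"),    # glass -> glass (keeps the bare "s" rule from firing)
    ("s", ""),       # bikes -> bike
]


def lemmatize(word):
    w = word.lower()
    # 1. Lexical data: an explicit entry always wins.
    if w in IRREGULAR:
        return IRREGULAR[w]
    # 2. Heuristic rules: rewrite the first matching suffix, but only if
    #    at least two characters of stem would remain.
    for suffix, replacement in SUFFIX_RULES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 2:
            return w[: len(w) - len(suffix)] + replacement
    # 3. No rule applies: assume the word is already a lemma.
    return w


print([lemmatize(w) for w in ["Mice", "studies", "classes", "glass", "bikes"]])
# ['mouse', 'study', 'class', 'glass', 'bike']
```

The ordering matters: rules alone would leave "mice" unchanged, so the lexicon covers the irregular cases while the rules handle the regular majority without listing every form.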

We designed Rosette to add language support to search engines, so it delivers the processing speed that large search engines demand along with highly accurate language detection, even for very short queries.
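Very short queries are hard to classify because they contain few distinguishing features. A common textbook technique, sketched below, scores a query against per-language character n-gram counts with naive Bayes and add-one smoothing. It illustrates the statistical side of the problem only and is not Rosette's method; the class name `NgramLanguageIdentifier`, the trigram size, and the two tiny training samples are assumptions.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    # Pad with spaces so word boundaries become part of the n-grams.
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Naive Bayes over character n-grams with add-one smoothing."""

    def __init__(self, samples, n=3):
        self.n = n
        self.counts = {lang: Counter(char_ngrams(text, n)) for lang, text in samples.items()}
        self.totals = {lang: sum(c.values()) for lang, c in self.counts.items()}
        # Shared vocabulary of every n-gram seen in any training sample.
        self.vocab_size = len(set().union(*self.counts.values()))

    def log_score(self, query, lang):
        counts = self.counts[lang]
        # The extra +1 reserves one vocabulary slot for n-grams never seen in training.
        denom = self.totals[lang] + self.vocab_size + 1
        return sum(math.log((counts[g] + 1) / denom) for g in char_ngrams(query, self.n))

    def identify(self, query):
        return max(self.counts, key=lambda lang: self.log_score(query, lang))


# Toy training text; a real identifier is trained on large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the bike race is on the road",
    "de": "der schnelle braune fuchs springt über den faulen hund und das rennrad ist schnell",
}

identifier = NgramLanguageIdentifier(SAMPLES)
print(identifier.identify("the road"))  # en
print(identifier.identify("der hund"))  # de
```

With samples this small the scores are fragile: a query that shares no n-grams with either sample simply goes to the language with the smaller denominator, which is why production identifiers rely on much larger training data.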