

Rosette for Lucene & Solr
Read how you can build a global-ready search server with Apache Lucene or Solr and the Rosette Linguistics Platform.

Building a Multilingual Search Engine with Apache Lucene



What is Lucene?

Lucene is an open source search toolkit library whose development is sponsored by the Apache Software Foundation.

What is Solr?

Solr is an open source, web-based search service that runs on top of Lucene, also sponsored by the Apache Software Foundation. It adds a schema, administrative tools, cache management, replication and faceted browsing.


Reliable, multilingual search that's easy to deploy

The same multilingual text processing technology used by the industry-leading search engines of Google, Microsoft, and Yahoo is now available for the open source search solutions Apache Lucene and Solr.

Deploy in Days

Out of the box, you can connect Basis Technology’s Rosette Linguistics Platform to Apache Lucene and have robust, accurate multilingual search up and running on your website, intranet, or internal network.

Dependability you can bank on

Lucene, a high-performance open source search toolkit, is a popular search solution with over 3,000 installations in organizations including IBM, CNET, and Wikipedia. Rosette has a ten-year track record of providing linguistic intelligence that meets the demanding accuracy and performance requirements of major search and text mining providers.

Search to the standards of enterprise search vendors

  • Language identification and full-text search in 54 languages
  • Linguistically improved search in 19 languages, including major European languages, Arabic, Japanese, Chinese, and Korean
  • Entity extraction and faceted navigation in 12 languages
  • A scalable, high-performance architecture
Users enjoy the same quality of experience with Lucene that they have come to expect from their favorite web and enterprise search engines.
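Faceted navigation generally works by counting how many documents in the current result set mention each extracted entity, then offering those entities as filters. The sketch below assumes a hypothetical document layout in which entity-extraction output is already stored as lists of strings keyed by entity type; the field names and the `facet_counts` and `filter_by_facet` helpers are illustrative only, not part of Rosette, Lucene, or Solr.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical document shape: entity type -> list of extracted entity strings.
Document = Dict[str, List[str]]

def facet_counts(results: Iterable[Document], entity_type: str) -> Counter:
    """Count how many result documents mention each entity of the given type."""
    counts: Counter = Counter()
    for doc in results:
        # set() so an entity mentioned twice in one document is counted once.
        for entity in set(doc.get(entity_type, [])):
            counts[entity] += 1
    return counts

def filter_by_facet(results: Iterable[Document], entity_type: str, value: str) -> List[Document]:
    """Narrow the result set to documents that mention the chosen facet value."""
    return [doc for doc in results if value in doc.get(entity_type, [])]

results = [
    {"PERSON": ["Ada Lovelace"], "ORGANIZATION": ["IBM", "IBM"]},
    {"PERSON": ["Alan Turing"], "ORGANIZATION": ["IBM"]},
    {"PERSON": ["Ada Lovelace"]},
]
print(facet_counts(results, "ORGANIZATION").most_common())  # [('IBM', 2)]
```

Clicking a facet value then corresponds to a call like `filter_by_facet(results, "PERSON", "Ada Lovelace")`, which keeps the two matching documents.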

Request an evaluation copy of Rosette today with the “Rosette for Lucene” module.

All You Need To Do

Download and install the Rosette SDK or runtime package. Lucene calls Rosette functions and passes along information such as the location of the documents to be indexed. The “Rosette for Lucene” module connects Rosette to Lucene out of the box, so no additional work is needed for Lucene to search text in any language Rosette supports.
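The division of labor described above, with the search library owning the index and a linguistic component turning text into terms, can be sketched in a few lines. This is a minimal illustration under stated assumptions: `InvertedIndex` and `simple_analyzer` are hypothetical stand-ins written for this example, not Lucene classes or Rosette functions, and the whitespace analyzer only marks the spot where a language-aware analyzer would plug in.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

# An analyzer turns raw text into index terms. A linguistic component would run
# here; this hypothetical stand-in does no linguistics at all.
Analyzer = Callable[[str], List[str]]

def simple_analyzer(text: str) -> List[str]:
    """Lowercase and split on whitespace."""
    return text.lower().split()

class InvertedIndex:
    """Hypothetical minimal index: term -> set of document ids."""

    def __init__(self, analyzer: Analyzer) -> None:
        self.analyzer = analyzer
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # The index never tokenizes text itself; it delegates to the analyzer.
        for term in self.analyzer(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents containing every query term (boolean AND)."""
        # Queries pass through the same analyzer, so query terms match index terms.
        terms = self.analyzer(query)
        if not terms:
            return set()
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return hits

index = InvertedIndex(simple_analyzer)
index.add("doc1", "Apache Lucene search toolkit")
index.add("doc2", "Solr runs on top of Lucene")
print(sorted(index.search("lucene")))       # ['doc1', 'doc2']
print(sorted(index.search("Lucene Solr")))  # ['doc2']
```

Swapping `simple_analyzer` for a language-aware analyzer changes which terms get indexed without touching the index code, which is the kind of separation that lets a linguistics module plug into a search library.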


Rosette Linguistic Capabilities:

  • Language identification: Identifies the language a document is written in.
  • Language-specific processing: The base linguistics function of the Rosette Linguistics Platform (RLP) is the starting point for building a search index and refining queries. Advanced linguistic features improve the precision and recall of search results.
  • Segmentation and tokenization: Separates streams of text into individual word tokens, which is especially needed for languages such as Chinese and Japanese that are written without spaces between words.
  • Lemmatization: Reduces an inflected word to its dictionary form (lemma) to improve recall.
  • Noun decompounding: Splits compound words, such as those common in German and Dutch, into their components to improve recall.
  • Part-of-speech tagging: Labels each word with its grammatical category (noun, verb, and so on) to improve precision and recall.
  • Entity extraction: Extracts named entities to enable faceted search on key names and entities in search results.
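The toy functions below illustrate what segmentation, lemmatization, and decompounding do to the terms that end up in an index. They are a minimal sketch only: the dictionaries are tiny hypothetical samples, forward maximum matching and greedy dictionary splitting are simple textbook techniques chosen for illustration, and none of this reflects how Rosette implements these functions internally.

```python
from typing import Dict, List, Set

# Tiny hypothetical dictionaries, for illustration only.
CHINESE_WORDS: Set[str] = {"我", "喜欢", "北京"}
GERMAN_WORDS: Set[str] = {"fussball", "schuh", "welt", "meisterschaft"}
LEMMAS: Dict[str, str] = {"schuhe": "schuh"}  # inflected form -> dictionary form

def segment(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            # Fall back to a single character if no dictionary word matches.
            if text[i:j] in dictionary or j == i + 1:
                tokens.append(text[i:j])
                i = j
                break
    return tokens

def decompound(word: str, lexicon: Set[str], min_len: int = 3) -> List[str]:
    """Split a compound into known components, longest head first; [] if impossible."""
    if word in lexicon:
        return [word]
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in lexicon:
            rest = decompound(tail, lexicon, min_len)
            if rest:
                return [head] + rest
    return []

def analyze(text: str) -> List[str]:
    """Lemmatize each whitespace token, then add compound parts as extra index terms."""
    terms: List[str] = []
    for token in text.lower().split():
        lemma = LEMMAS.get(token, token)
        terms.append(lemma)
        parts = decompound(lemma, GERMAN_WORDS)
        if len(parts) > 1:
            terms.extend(parts)
    return terms

print(segment("我喜欢北京", CHINESE_WORDS))  # ['我', '喜欢', '北京']
print(analyze("Fussballschuh"))             # ['fussballschuh', 'fussball', 'schuh']
print(analyze("Schuhe"))                    # ['schuh']
```

Because the indexed terms for *Fussballschuh* include *schuh*, a query for the plural *Schuhe* now matches that document, which is the recall gain that lemmatization and decompounding aim for.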

Apache Lucene Performance & Scalability

  • Thread safe
  • Cross-platform solution
  • Support for multiple cores
  • Small RAM requirements
  • Incremental indexing as fast as batch indexing
  • Index only 20-30% of the size of the indexed text
  • Powerful search algorithms

For more details, see http://lucene.apache.org/.