Why Tokenization Matters
Search is increasingly becoming the first and most important way to access information. Because of the ubiquity of Google, the depth and breadth of Amazon and the growing amount of ...
Delivering More Accurate Search Results with Lemmatization
Many of our commercial and government customers are building extremely powerful and efficient search engines for their own internal data or their customers' data. Whether they are using open source Solr/Lucene, Elasticsearch ...
Lucene Revolution: Update from the Conference
The Basis Technology team is here in San Diego at the Lucene Revolution conference! We kicked off the show in style at the opening party, where everyone had a great ...
From Text to Truth: Real-World Facets for Multilingual Search
Benson Margulies, Basis Technology's CTO, will be presenting From Text to Truth: Real-World Facets for Multilingual Search at Lucene Revolution on May 1, 2013, in San Diego, CA. Excerpt: From Text to ...
Matching Marathon Bombers’ Name Misspellings and Name Variants
It has been a tough few weeks for Boston. Many of us here at Basis Technology know of someone who was directly affected by the marathon bombings. Our Cambridge office ...
Makisu Day: Rolling Up Innovative Ideas at Basis Technology
Recently, Basis Technology decided to take a break from the routine of Agile/Scrum to have our first-ever “hackathon” day. The only rules were: Pick a project that is related to ...
Highlight 5 Release Brings Translation of Names from Pashto to English
Highlight 5’s Transliteration Assistant translates a name in Pashto to English, according to the IC standard. For the government intelligence community, the recent release of Highlight version 5 is a huge productivity boost ...
Understanding Dari and Pashto Names: A Challenge to Intelligence Gathering in Afghanistan
Shakespeare asked “What’s in a name?” It turns out there’s a lot in the name of a typical Afghan, including common nouns and personal titles (not used as titles!). One of our linguistic ...
Rosette Version 7.6 Released!
Extract New Entity Types and Languages; Enhanced Name Translation and Matching for Spanish and Persian Names. We’re excited to announce the release of Rosette 7.6, which has many improvements, and several ...
Mission Possible: Connecting Structured and Unstructured Data to Create New Insights
Advanced text analytics can link structured data with unstructured data in ways that were impossible years ago. These capabilities are unlocking insights and enabling new workflows in business domains where ...
Haven’t I Met You Before? Cross-Document Coreference Resolution
The Dude: Nobody calls me Lebowski. You got the wrong guy. I’m the Dude, man. Blond Treehorn Thug: Your name’s Lebowski, Lebowski. Your wife is Bunny. The Dude: My... my wi-, my ...
Indexing Strategies for Multilingual Search with Solr and Rosette
As a solutions engineer at Basis Technology, I often discuss the integration of Rosette and Apache Solr with our existing and potential clients, who look to Rosette to improve multilingual Solr search in ...
Mining Gold from Big Data with Text Analytics
Sunday’s New York Times featured a news analysis article about the age of big data and how that means more analysis and technologies are being applied to domains which formerly seemed removed ...
Arabic and Afghan Name Translation Software Improves Intelligence Analysis
Software suite facilitates inter-agency collaboration; meets federally mandated intelligence standards. Cambridge, June 8, 2011: A crucial tool for U.S. intelligence agencies charged with translating foreign languages vital to national security was unveiled today ...