
Tag: text embedding

Cross-Lingual Search Based on Concepts and Meaning

We’ve recently released this whitepaper, which explores a new way to solve cross-lingual semantic search. Rather than using machine translation to translate queries or search records, this approach delivers better accuracy based on semantics, not translation. Semantic search (a.k.a. concept search) goes beyond finding keywords to retrieving the ideas suggested by those keywords. In part 1 […]

Duplicate document detection and cross-lingual search

How to automate mundane tasks and find relevant text using text embedding. Numbers are great because they are easy to compare, tabulate, and examine. Text? Not so much. But text embeddings let one manipulate and compare the meaning behind words and text like numbers. Basically, text embeddings convert words, phrases, or even whole documents into […]

Rosette Cloud 1.11: New Entity Types, Hungarian Names, and Cross Language Semantics

We’re thrilled to announce the latest version of Rosette (1.11). It’s a big one, with lots of exciting new features, enhancements, and improvements. We hope you’ll check it out! TL;DR: check the release notes. Entities: Enhanced Extraction and Linking with New Types. Rosette Entity Extraction & Linking now recognizes 700 new classes of entities […]

Word Embeddings for Fuzzy Matching of Organization Names

Rosette’s name matching is enhanced by word embeddings to match based on semantics as well as phonetics. Tracking mentions of particular organizations across news articles, social media, and internal communications is integral to the workflow of dozens of use cases across industries. However, it can be especially challenging to match names of companies and organizations because […]

Minds Converge: A Machine Learning Meeting in Toulon

Basis Technology R&D presents at the International Conference on Learning Representations in France. The International Conference on Learning Representations (ICLR) is an annual gathering of leading machine learning experts working in both industry and academia. This year’s conference was held April 24-26 in Toulon, France. ICLR focuses on a broad range of subjects, with […]

Using Deep Learning to Power Multilingual Text Embeddings for Global Analysis, Part II

Wait! Have you read Part I yet? Check it out, then come on back. Putting Text Embeddings to Work. Using the updated text embeddings endpoint in Rosette API 1.5, you’ll notice significant accuracy improvements on longer strings of text, both sentences and documents. We’ve also begun to incorporate text embeddings into some of our higher […]

Using Deep Learning to Power Multilingual Text Embeddings for Global Analysis, Part I

A Crash Course in Basic Text Embeddings. A chronic problem with using machines to analyze human language is that the same meaning can be expressed using many different words. Take, for example, the sentence “Bill Gates was educated at Harvard.” There are many ways to express this relationship: Bill Gates studied at Harvard, Bill Gates […]

Rosette API 1.5 Released

Today we’re pleased to announce the launch of Rosette API version 1.5! Updates include new targeted relationship extraction (replacing the previous “open” relationship extraction), changes to entity linking and extraction, improved text embeddings, and expanded support for Chinese, Japanese, Korean, and Vietnamese, including sentiment analysis for Japanese text (beta). What’s new? Targeted Relationships. The […]

Never be duped by fake news again with TrustServista

Rosette brings text analytics power to news data startup Zetta Cloud. The US presidential election has made it clear that fake news is spreading across the internet like a virus. While shared knowledge is one of the many perks of our increasingly connected online society, it also has a darker side: the opportunity for […]

Notes from the Lab: Fueling New Research into Machine Learning with Wikidata

Basis Technology R&D team pioneers new technique and open sources WikiSem500, a dataset for multilingual word embedding evaluation. The most time-consuming and expensive aspect of machine learning research is data preparation (aggregation and cleaning), and every data scientist has been frustrated by it. However, the importance of good testing data makes it hard to cut corners. […]

Deep Learning Powers Cross-Lingual Semantic Similarity Calculation

Text Embeddings Now Available in the Rosette API. The Rosette API team is excited to announce the addition of a new function to Rosette’s suite of capabilities: text embedding. This endpoint returns a single vector of floating-point numbers for your input, a.k.a. an embedding of your text in a semantic vector space. Text embeddings […]
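
For readers wondering what one can do with a vector of floating-point numbers, here is a minimal sketch of one common way to compare two embeddings: cosine similarity. It does not call the Rosette API. The vectors below are made-up stand-ins (real embeddings usually have many more dimensions), and the function and variable names are hypothetical, chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 4-dimensional vectors standing in for text embeddings.
# doc_a and doc_b are deliberately close; doc_c points elsewhere.
doc_a = [0.12, -0.48, 0.33, 0.05]
doc_b = [0.10, -0.51, 0.30, 0.07]
doc_c = [-0.40, 0.22, -0.15, 0.60]

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)

print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

A score near 1 means the two vectors point in nearly the same direction, which is the usual signal that two texts are close in meaning. Where to set a threshold for "similar enough" depends on the embedding model and the task, so treat any cutoff as something to tune on your own data.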