Connecting to Asia with Rosette API

Latest 1.5 API release renews commitment to global data analytics

We kicked off 2017 with some major updates to Rosette API (version 1.5). There are lots of new features to play with, so we’re breaking down the key updates in a series of blog posts to share with our awesome users.

One common thread among the updates included in 1.5 is a focus on Asian language coverage. In an increasingly global economy, the need for tools that process multilingual data in its native language is growing rapidly. This is especially true when companies expand beyond Latin script languages into Asian and Eurasian scripts.

We entered the text analytics business over 20 years ago to help Western tech giants like Google and Amazon reach Asian markets, and the Rosette API 1.5 release reinforces our commitment to connecting linguistically diverse markets.

Name analytics for Asian languages

As the industry leader in fuzzy name matching, we’re intimately familiar with the intricacies of names. Translation errors, nicknames, misspellings and other discrepancies make accurate name matching extremely difficult, whether for a simple business task like database deduplication, or vital issues of national security like monitoring government watch lists. Until now, our name matching software has been focused on the English-speaking world. While we supported 15 languages, users could only match to or from English.

Rosette API 1.5 opens an exciting new chapter in our name matching story with the introduction of cross-lingual matching between three Asian languages. Rosette now allows you to score name similarity between Japanese and Chinese, Japanese and Korean, and Korean and Chinese. As an added bonus, we also made significant improvements to Japanese matching accuracy with some help from our state-of-the-art text embeddings.
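For readers who want to try cross-lingual scoring, here is a minimal sketch of what a name similarity request might look like over plain HTTP. It makes several assumptions that are not stated in this post: the endpoint URL, the X-RosetteAPI-Key header, the name1/name2 payload shape, and the three-letter language codes. The helper names are hypothetical, so check the Rosette API documentation before relying on any of it.

```python
import json
import urllib.request

# Assumed endpoint; not taken from this post. Verify against the API docs.
NAME_SIMILARITY_URL = "https://api.rosette.com/rest/v1/name-similarity"


def build_name_similarity_request(name1, lang1, name2, lang2, api_key):
    """Build (but do not send) a POST request comparing two names.

    The payload shape and header name are assumptions for illustration.
    """
    payload = {
        "name1": {"text": name1, "language": lang1},
        "name2": {"text": name2, "language": lang2},
    }
    return urllib.request.Request(
        NAME_SIMILARITY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-RosetteAPI-Key": api_key,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def score_names(request):
    """Send the request and return the parsed JSON response as a dict."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

To run it for real, build a request with your own API key and pass it to score_names. Building the request separately from sending it makes it easy to inspect the payload before any network call happens.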

Translating names can also be a hairy issue when working with diverse scripts and cultures. For example, the Japanese name 山中 means “in the mountain.” However, as a name, it should be translated by pronunciation, “Sanchu,” rather than meaning. We’ve extended name translation to cover the same three languages, so you can now translate names between Chinese, Japanese and Korean as well.

Vietnamese entity extraction

We’re also excited to share that we’ve added Vietnamese to our list of supported languages for entity extraction, bringing our total to 20! In addition to Vietnamese, the Rosette API performs entity extraction in five other Asian languages: Chinese (simplified and traditional), Japanese, Korean, Malay, and Indonesian, plus 14 other languages from around the world.

That’s not all! We also added social media entity linking to our Chinese and Japanese entity extraction endpoints. This allows you not only to extract entities from short social media content, but also to automatically connect those entities to a knowledge base. We pre-built the API to link to Wikidata, but our on-premises solutions can be customized to link to your internal knowledge base as well.

Japanese sentiment analysis

While we don’t like to play favorites among our many endpoints, sentiment analysis is certainly one of the most buzzed-about capabilities in the natural language market right now. In fact, it’s getting so much hype that we wrote an entire separate blog post about it. Check it out!