29 Aug 2018

Rosette Cloud 1.11: New Entity Types, Hungarian Names, and Cross Language Semantics


We’re thrilled to announce the latest version of Rosette (1.11). It’s a big one, with lots of exciting new features, enhancements, and improvements. We hope you’ll check it out!


TL;DR: check the release notes.


Entities: Enhanced Extraction and Linking with New Types

Rosette Entity Extraction & Linking now recognizes 700 new classes of entities in a new labs feature that supports English, Spanish, Japanese, and Chinese. Drawn from the DBpedia ontology, these entity types include AIRCRAFT, DRUG and EVENT, among many others.

For example, in previous versions the Rosette /Entities endpoint had no means of tagging entities like “Ibuprofen”. Now, with the new `DBpedia type` feature enabled, “Ibuprofen” is tagged as “SUBSTANCE”, linked to Wikidata ID Q186969, and assigned the DBpedia tree “ChemicalSubstance/Drug”.
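Because the DBpedia tree is a slash-delimited path from a broad class down to a narrower one, it can be used to filter results by any ancestor type. The minimal Python sketch below illustrates that idea. The entity records and their field names (`mention`, `dbpediaTree`, and so on) are hypothetical dictionaries written for this example, not the actual Rosette response format. Only the Ibuprofen values come from the text above.

```python
from typing import Iterable


def dbpedia_path(tree: str) -> list[str]:
    """Split a slash-delimited DBpedia type tree into its components."""
    return [part for part in tree.split("/") if part]


def falls_under(tree: str, ancestor: str) -> bool:
    """True if `ancestor` appears anywhere in the type path."""
    return ancestor in dbpedia_path(tree)


def filter_by_type(entities: Iterable[dict], ancestor: str) -> list[str]:
    """Return mentions whose (hypothetical) 'dbpediaTree' field includes `ancestor`."""
    return [
        e["mention"]
        for e in entities
        if falls_under(e.get("dbpediaTree", ""), ancestor)
    ]


# Hypothetical records; field names are illustrative, not Rosette's schema.
entities = [
    {"mention": "Ibuprofen", "type": "SUBSTANCE", "wikidataId": "Q186969",
     "dbpediaTree": "ChemicalSubstance/Drug"},
    {"mention": "Boston", "type": "LOCATION"},  # no DBpedia tree assigned
]

print(filter_by_type(entities, "ChemicalSubstance"))  # ['Ibuprofen']
```

Matching on any path component means a query for the broad class “ChemicalSubstance” and one for the narrower “Drug” both return Ibuprofen.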

Names: Hungarian

Name Similarity and Name Deduplication now support Hungarian names. Unlike names in most Western countries, Hungarian names use “Eastern name order,” in which the family name precedes the given name. This can confuse business systems that expect separate “First Name” and “Last Name” fields: the well-known investor “George Soros” is written “Soros György” in Hungarian. Out-of-order name components are just one of the 13+ name-matching phenomena that Rosette handles.
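The deliberately simplified Python sketch below shows why a rigid, position-by-position comparison fails on the Soros example while an order-insensitive one succeeds. It is not Rosette’s matching algorithm. `GIVEN_NAME_EQUIVALENTS` is a hypothetical one-entry table standing in for real name-variant data.

```python
# Hypothetical variant table: maps a Hungarian given name to an English equivalent.
GIVEN_NAME_EQUIVALENTS = {"györgy": "george"}


def canonical_tokens(name: str) -> list[str]:
    """Lowercase each token and map known variants to a canonical spelling."""
    return [GIVEN_NAME_EQUIVALENTS.get(t, t) for t in name.lower().split()]


def positional_match(a: str, b: str) -> bool:
    """Compare token by token, as a rigid First Name / Last Name schema would."""
    return canonical_tokens(a) == canonical_tokens(b)


def order_insensitive_match(a: str, b: str) -> bool:
    """Compare the same canonical tokens regardless of their order."""
    return sorted(canonical_tokens(a)) == sorted(canonical_tokens(b))


print(positional_match("George Soros", "Soros György"))         # False
print(order_insensitive_match("George Soros", "Soros György"))  # True
```

Even after “György” is mapped to “george”, the positional check fails because the family name comes first in Hungarian. Sorting the tokens (rather than using a set) keeps repeated tokens significant.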

Text Embedding: Cross-language semantics in four new languages

Rosette’s Text Embedding endpoint now supports Russian, North Korean, South Korean, and Arabic. These multilingual vectors provide the foundation for cross-lingual semantic search. For more information, check out our crash-course blog posts on text embedding, part one and part two.
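As a rough illustration of how such vectors can support cross-lingual search, the Python sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors are made-up placeholders, not Rosette output (real embeddings have many more dimensions). The premise being illustrated is that texts with similar meaning in different languages end up close together in the shared vector space.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query: list[float],
         docs: list[tuple[str, list[float]]]) -> list[tuple[str, list[float]]]:
    """Sort (label, vector) pairs from most to least similar to the query."""
    return sorted(docs, key=lambda item: cosine(query, item[1]), reverse=True)


# Made-up placeholder vectors for an English query and documents in other languages.
query_en = [0.9, 0.1, 0.0]  # English query about the stock market
documents = [
    ("Arabic: football", [0.05, 0.2, 0.95]),
    ("Korean: stock prices", [0.7, 0.3, 0.1]),
    ("Russian: stock market", [0.88, 0.12, 0.05]),
]

print([label for label, _ in rank(query_en, documents)])
# ['Russian: stock market', 'Korean: stock prices', 'Arabic: football']
```

Cosine similarity compares direction rather than magnitude, which is a common choice for comparing embedding vectors.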