Like names in many other languages and cultures, Hebrew names have been influenced by the languages of the countries where Jewish people lived, and also by the fact that within the last half century, Hebrew became a living language again, with native speakers creating their own unique naming traditions. This blog will focus on Jewish names in Hebrew and […]
Category: Rosette Cloud
How to Write Annotation Guidelines for Entity Extraction
Solid annotation guidelines are an essential requirement for producing good training data. These guidelines distinguish correct from incorrect results, define the task and ensure that the annotation process is reliable and repeatable for independent human annotators.
Rosette 1.17.0 Release: Hebrew Name Translation, French Semantic Similarity, Robust Address Matching
Recent Rosette® Cloud and Enterprise releases (1.17.0, 1.16.1) bring expanded language coverage to name translation and semantic similarity, and ease of use to the address matching capability within Rosette Name Indexer. We have also made improvements to Arabic-Arabic and Arabic-English name matching, as well as better morphological analysis in various languages. Hebrew name translation Name […]
Building a More Useful Hebrew Transliteration Scheme
When the Rosette® Name Translator team set out to build a Hebrew-to-Latin character translator, one of the first considerations was: Which Hebrew transliteration standard should we use? As the joke goes, “Standards are great because there are so many to choose from.” The existing Hebrew transliteration standards, ISO 259-2:1994 and UNGEGN (United Nations Group of […]
Rosette 1.14 Release: Entity linking to Thomson Reuters PermID, Multi-model language identification
The August release of Rosette 1.14 brings new features to entity extraction and linking, as well as language identification. Roadmap for linking entities to multiple knowledge bases In addition to linking to entities in Wikidata and DBpedia, Rosette entity extraction will ultimately link to multiple knowledge bases, including Thomson Reuters open PermID. PermID covers a […]
How Data Annotation Works: Inside NLP and Search, Part IV
Interested in search technology, or AI generally? Over the next four weeks, we’re going to take an in-depth (and interesting!) look at the technology that makes modern search tick. This week, we’re breaking down step by step how data annotation works. How Entity Annotation Works It should come as no surprise that an entity extractor requires a […]
Why Data & Data Annotation Make or Break AI: Inside NLP and Search, Part III
Interested in search technology, or AI generally? Over the next four weeks, we’re going to take an in-depth (and interesting!) look at the technology that makes modern search tick. Today we’re digging into data and how it’s prepared. Data: The Building Blocks of AI Machine-learning algorithms don’t just spring from nothing. Before they can extract or link […]
Entity Linking and Too Many (Tim) Cooks: Inside NLP and Search, Part II
Interested in search technology, or AI generally? Over the next four weeks, we’re going to take an in-depth (and interesting!) look at the technology that makes modern search tick. This week, we’re talking all about entity linking. Linking This involves two steps. The first is entity linking, which correctly ties each extracted entity to a knowledge base […]
Natural Language Processing Search Engines
Interested in search technology, or AI generally? Over the next four weeks, we’re going to take an in-depth (and interesting!) look at the technology that makes modern search tick. Let’s dive in with a look at the role AI plays in modern search engines. Introduction to Natural Language Processing and Search Engines Modern search results are remarkable. […]
Duplicate document detection and cross-lingual search
How to automate mundane tasks and find relevant text using text embedding Numbers are great, because they are easy to compare, tabulate and examine. Text? Not so much. But text embeddings let one manipulate and compare the meaning behind words and text like numbers. Basically, text embeddings convert words, phrases, or even whole documents into […]
Names Search for the Modern Health Agency
Storing, accessing and sharing electronic medical data with intelligent patient matching In the age of smartphones, cloud storage, and the internet of things, we have come to expect the information we want to be at our fingertips in seconds. One notable exception to this rule is medical records. Individuals viewing test results, doctors accessing a […]
Word Embeddings for Fuzzy Matching of Organization Names
Rosette’s name matching is enhanced by word embeddings to match based on semantics as well as phonetics Tracking mentions of particular organizations across news articles, social media, and internal communications is integral to the workflow of dozens of use-cases across industries. However, it can be especially challenging to match names of companies and organizations because […]
Add Sentiment Analysis, Translated Names, Entities and More to Elasticsearch
New text analytics plugin painlessly delivers rich, faceted search An API key and a line of code are all it takes to speed your research, enhance voice of the customer systems, automate content recommendations and more. Rosette API for Elasticsearch We launched Rosette API last year to put text analytics in more hands. Through the […]
Rosette API Adds Support for “Arabizi” Script
Tackling the challenge of Arabic chat written in Latin script The Arabic chat language, known as “Arabizi” or “Arabish”, is a casual version of written Arabic that appeared when Arabic speakers began using Western keyboards on mobile phones and computers to spell out their native language with the Roman alphabet. With the growth of digital […]
How Emoji Reflects Our Evolving Society
Or, what does it mean to lemmatize and normalize emoji for text analytics? Emoticons 🙂 and emoji 😀😆 add a bit of the nonverbal communication that humans inherently crave in our electronic communications. The addition of a winking face 😉 softens a potentially harsh statement or expresses shared camaraderie far more succinctly and immediately than […]
Rosette API 1.7 Release
Great news! Yesterday we released Rosette API v. 1.7. We added support for Arabic sentiment analysis (beta), confidence scores for all extracted entities, pronominal resolution in targeted relationship extraction, and a new /transliteration endpoint for transforming romanized Arabic chat text (“Arabizi”) to standard Arabic script. We also introduced specialized linguistic analysis for emojis, emoticons, hashtags, […]
Vive la République et Vive la France!
Analyzing English and French tweets during the election weekend using Rapidminer and Rosette After the surprising results of the U.S. presidential election and the UK “Brexit” vote, many expected another populist upset in France’s recent election. As we now know, Emmanuel Macron of En Marche! defeated populist candidate Marine Le Pen of the National Front. […]
Are Positive or Negative Tweets More “Retweetable” in Brazilian Politics?
Spotlight on our Data Scientist Challenge Winner from the Rosette API Academic Program This winter, Basis Technology held a Data Scientist Challenge to encourage students in the Rosette API Academic Program to use both Rosette API and Rapidminer Studio in the data analytics project of their choice. The aim was to showcase how easy it can be […]