Tag: Rosette Cloud

Building a More Useful Hebrew Transliteration Scheme

When the Rosette® Name Translator team set out to build a Hebrew-to-Latin character translator, one of the first considerations was: Which Hebrew transliteration standard should we use? As the joke goes, “Standards are great because there are so many to choose from.” The existing Hebrew transliteration standards, ISO 259-2:1994 and UNGEGN (United Nations Group of […]

Duplicate document detection and cross-lingual search

How to automate mundane tasks and find relevant text using text embeddings. Numbers are great because they are easy to compare, tabulate, and examine. Text? Not so much. But text embeddings let one manipulate and compare the meaning behind words and text like numbers. Basically, text embeddings convert words, phrases, or even whole documents into […]
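As a rough illustration of comparing text "like numbers", the sketch below flags near-duplicate documents by the cosine similarity of their embedding vectors. It is a minimal sketch under stated assumptions, not the method the post describes: the three-dimensional vectors are made-up toy values rather than output from any real embedding service, and the helper names (`cosine_similarity`, `find_near_duplicates`) and the 0.95 threshold are hypothetical choices.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.95):
    """List pairs of document ids whose vectors meet the similarity threshold.

    `embeddings` maps a document id to its vector. The threshold is an
    arbitrary illustrative value and would need tuning on real data.
    """
    ids = sorted(embeddings)
    pairs = []
    for i, first in enumerate(ids):
        for second in ids[i + 1:]:
            score = cosine_similarity(embeddings[first], embeddings[second])
            if score >= threshold:
                pairs.append((first, second))
    return pairs


# Toy vectors (assumed values, for illustration only).
toy_embeddings = {
    "doc_a": [0.90, 0.10, 0.00],
    "doc_b": [0.88, 0.12, 0.01],
    "doc_c": [0.00, 0.20, 0.95],
}

print(find_near_duplicates(toy_embeddings))
```

With real embeddings the vectors would have hundreds of dimensions, and comparing every pair becomes expensive for large collections, so an approximate nearest-neighbor index is a common substitute for the nested loop.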

Rosette Cloud 1.11: New Entity Types, Hungarian Names, and Cross Language Semantics

We’re thrilled to announce the latest version of Rosette (1.11). It’s a big one, with lots of exciting new features, enhancements, and improvements. We hope you’ll check it out! TL;DR: check the release notes. Entities: Enhanced Extraction and Linking with New Types. Rosette Entity Extraction & Linking now recognizes 700 new classes of entities […]

Eight Languages Added to Rosette 1.10.0

Match and translate Greek names • Extract sentiment from Persian text. Rosette text analytics enables users to extract value from unstructured text. All of our capabilities are engineered with a multilingual architecture that enables expansion to any language. By processing text in the native language, Rosette delivers higher accuracy than solutions that rely on machine […]

Rosette Cloud 1.9: More Languages, Higher Accuracy, and Deep Neural Nets

Rosette Cloud 1.9 is out, delivering a new language for name matching, translation, and deduplication: Thai. We’ve also added a new deep neural network model for sentiment analysis, entity extraction offsets, salience scores for topic extraction, and more. Learn more below, or jump to the release notes. Name Matching: The /name-similarity, /name-translation, and /name-deduplication endpoints […]

A Smarter Approach to Linguistic Comparison and Word Clouds

New community recipe enables vocabulary comparison and word cloud generation. Every individual has a unique way of speaking and writing based upon their experiences, personal style, and culture. For the data scientist, analyzing, comparing, and visualizing the vocabularies of different texts can reveal valuable insights for applications such as data cleansing and authorship identification or […]

Just the Important Entities, Please

Salience scores and linking confidence scores for extracted entities come to Rosette Cloud. Data scraped from the web is often very noisy and cumbersome to work with. Sorting through it to find the most valuable information is a vital step in converting raw data into actionable insights. The release of Rosette Cloud 1.8 aims to […]

A Document’s Vital Stats: Keyphrases and Concepts

New Rosette Cloud topics endpoint enables summarization, content organization, and trend analysis. We are creating new content online at an unprecedented rate. Globally, we compose 3.6 trillion words every day on email and social media, the equivalent of 36 million books.* Managing and deriving value from that volume of text data can only hope to […]

Rosette Cloud 1.8 Adds Topics, Salience, French Sentiment, and More

We’re excited to announce the release of Rosette Cloud 1.8, including a new /topics endpoint. The topic extraction endpoint returns key phrases extracted from the input text, as well as general concepts that may not be explicitly mentioned. /topics can be used to tag and sort a large corpus of documents, so you can automatically filter […]

Relax, Your Sensitive Data Is Secure

A new Rosette Cloud script enables you to hide personally identifiable information (PII) in your documents and data. Organizations often need to share documents and information that may include personally identifiable information, whether out of good conscience or by legal mandate. Going through documents manually to identify and remove all potentially compromising data is time […]

Add Sentiment Analysis, Translated Names, Entities and More to Elasticsearch

New text analytics plugin painlessly delivers rich, faceted search. An API key and a line of code are all it takes to speed your research, enhance voice-of-the-customer systems, automate content recommendations, and more. Rosette API for Elasticsearch: We launched Rosette API last year to put text analytics in more hands. Through the […]

Rosette API Adds Support for “Arabizi” Script

Tackling the challenge of Arabic chat written in Latin script. The Arabic chat language, known as “Arabizi” or “Arabish”, is a casual version of written Arabic that appeared when Arabic speakers began using Western keyboards on mobile phones and computers to spell out their native language with the Roman alphabet. With the growth of digital […]

How Emoji Reflects Our Evolving Society

Or, what does it mean to lemmatize and normalize emoji for text analytics? Emoticons 🙂 and emoji 😀😆 add a bit of the nonverbal communication that humans inherently crave in our electronic communications. The addition of a winking face 😉 softens a potentially harsh statement or expresses shared camaraderie far more succinctly and immediately than […]

Introducing: Rosettepedia

A text analytics recipe for entity extraction enhancement. The Rosette Cloud team is always hard at work devising ways for our users to get more value from their unstructured text data. Last month we published a recipe on our community GitHub that combined multiple Rosette endpoints to produce document summaries. This month, we’re thrilled to […]

Vive la République et Vive la France!

Analyzing English and French tweets during the election weekend using RapidMiner and Rosette. After the surprising results of the U.S. presidential election and the UK “Brexit” vote, many expected another populist upset in France’s recent election. As we now know, Emmanuel Macron of En Marche! defeated populist candidate Marine Le Pen of the National Front. […]

Are Positive or Negative Tweets More “Retweetable” in Brazilian Politics?

Spotlight on our Data Scientist Challenge winner from the Rosette API Academic Program. This winter, Basis Technology held a Data Scientist Challenge to encourage students in the Rosette API Academic Program to use both Rosette API and RapidMiner Studio in the data analytics project of their choice. The aim was to showcase how easy it can be […]

Using Deep Learning to Power Multilingual Text Embeddings for Global Analysis, Part II

Wait! Have you read Part I yet? Check it out, then come on back. Putting Text Embeddings to Work: Using the updated text embeddings endpoint in Rosette API 1.5, you’ll notice significant accuracy improvements on longer strings of text, both sentences and documents. We’ve also begun to incorporate text embeddings into some of our higher […]

Using Deep Learning to Power Multilingual Text Embeddings for Global Analysis, Part I

A Crash Course in Basic Text Embeddings: A chronic problem with using machines to analyze human language is that the same meaning can be expressed using many different words. Take, for example, the sentence “Bill Gates was educated at Harvard.” There are many ways to express this relationship: Bill Gates studied at Harvard, Bill Gates […]