AI engine for fast, accurate named entity recognition in 21+ languages

Overview
Full-featured, easily adaptable named entity recognition (NER)
Rosette® Entity Extractor (REX) delivers entities and a rich slate of entity information to enhance your application. Built on a flexible hybrid of processors using different techniques to maximize accuracy for each entity type, REX also:
- Continuously learns through Rosette Adaptation Studio, a user-friendly, human-in-the-loop interface, so it becomes more accurate on your data
- Distinguishes between similar entities by examining document context to link entities to real-life people, places, and organizations
- Highlights the entities that are most relevant to the content of a document with a salience score
- Groups multiple mentions of the same entity in one document (Barack Obama, Mr. Obama, he/him/his) through coreference resolution
- Swiftly processes massive data with production-level speed and accuracy, backed by more than 15 years of experience in NER.
Rosette uses a synthesis of machine learning techniques, including perceptrons, support vector machines, word embeddings, and deep neural networks, to balance performance and accuracy.
Real world applications
REX can be plugged into any system that needs “just named entities” (for example, metadata enrichment). It also serves as the foundation for event extraction, sentiment analysis, topic extraction, and other higher-level NLP technology.
- Due diligence: Accurately screens negative news for specific people or companies, rather than just matching words (entity search)
- Intelligence: Identifies types of events and gathers information on people and organizations of interest to enhance knowledge bases (event extraction, entity linking)
- Metadata tagging: Offers precisely targeted searches for content publishers and recommendation engines (entity search)
- Public relations and marketing: Detects spikes of negative social media posts about a company or brand name (entity-level sentiment analysis).
Product highlights
- Supports multiple languages
- Entity linking to knowledge bases
- Coreference resolution
- Hybrid of techniques, including deep learning models
- Confidence scores for each result
- Cloud or enterprise deployments
- Fast and scalable
- Industrial-strength support
- Active development with a minimum of six updates per year
How It Works
Hybrid of techniques
For each entity type, we choose the approach that will produce the most accurate results. In case of conflicting results from different processors, a sophisticated adjudication mechanism selects the answer that is most likely correct. The processors include:
- Statistical modeling and deep neural networks for context-sensitive extraction, including never-seen-before entities; used to find people, locations, and organizations
- Pattern matching to find “rare” entities that fit a defined pattern, such as credit card numbers and dates
- Entity lists for entities that can be exhaustively listed and are relatively unambiguous, such as religions and nationalities.
All three processors can be customized to increase accuracy and find different entities.
When solving a task with NLP, it is important to choose the most efficient algorithm available. Sometimes a list works far better than the most advanced deep learning-based algorithm. The difficult part is coordinating these various algorithms and disambiguating overlapping tags from the various sub-engines. For the best possible entity disambiguation, you need a knowledge base. Rosette Identity Resolver is built specifically for that use case.
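The coordination problem described above can be illustrated with a toy pipeline. This is a minimal sketch and not REX's implementation: the three processors are simplified stand-ins, the fixed scores are invented for illustration, and the greedy "highest score wins" rule is an assumed adjudication strategy. All names (`Span`, `pattern_processor`, `list_processor`, `statistical_processor`, `adjudicate`) are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str
    source: str  # which processor produced the span
    score: float # illustrative confidence, not a real model output


# Pattern processor: entities that fit a defined shape (here, 16-digit card numbers).
CARD_RE = re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b")

def pattern_processor(text):
    return [Span(m.start(), m.end(), "CREDIT_CARD", "pattern", 0.95)
            for m in CARD_RE.finditer(text)]


# Entity-list processor: a small, hypothetical list of nationalities.
NATIONALITIES = {"French", "Canadian", "Japanese"}

def list_processor(text):
    return [Span(m.start(), m.end(), "NATIONALITY", "list", 0.90)
            for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
            if m.group() in NATIONALITIES]


# Stand-in for a statistical model: tags any run of two or more capitalized
# words as PERSON with a fixed, deliberately low score.
def statistical_processor(text):
    return [Span(m.start(), m.end(), "PERSON", "statistical", 0.60)
            for m in re.finditer(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text)]


def adjudicate(spans):
    """Greedy adjudication: keep the best-scoring span, drop anything overlapping it."""
    kept = []
    for s in sorted(spans, key=lambda s: (-s.score, s.start)):
        if all(s.end <= k.start or s.start >= k.end for k in kept):
            kept.append(s)
    return sorted(kept, key=lambda s: s.start)


def extract(text):
    spans = pattern_processor(text) + list_processor(text) + statistical_processor(text)
    return adjudicate(spans)


SAMPLE_TEXT = "Marie Curie met a French Canadian banker and paid with 4111 1111 1111 1111."
for span in extract(SAMPLE_TEXT):
    print(span.label, repr(SAMPLE_TEXT[span.start:span.end]), span.source)
```

In this sketch the stand-in statistical tagger proposes "French Canadian" as a person, but the list processor's higher-scoring NATIONALITY spans overlap it, so the adjudication step keeps the list results and discards the conflicting guess.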
Entity linking
By default, REX is pretrained to link entity mentions to Wikipedia and DBpedia. By comparing the context of a mention with the knowledge base entry, REX figures out whether an article about “Neil Armstrong” is referring to the astronaut or the hockey referee.
Entity linking also improves REX performance on tweets and very short strings by simultaneously linking and extracting entities.
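The idea of using context to choose between knowledge base entries can be sketched with a simple word-overlap score. This is only an illustration of the principle, not how REX links entities: the tiny knowledge base, its IDs (`KB-ASTRONAUT`, `KB-REFEREE`), its descriptions, and the `link` function are all hypothetical.

```python
import re

# Hypothetical mini knowledge base; IDs and descriptions are invented for illustration.
KB = {
    "Neil Armstrong": [
        {"id": "KB-ASTRONAUT",
         "description": "American astronaut, first person to walk on the Moon "
                        "during the Apollo 11 mission"},
        {"id": "KB-REFEREE",
         "description": "Canadian ice hockey linesman and referee in the "
                        "National Hockey League"},
    ]
}

STOPWORDS = {"a", "an", "and", "as", "in", "of", "on", "the", "to"}


def tokens(text):
    """Lowercased word set with common function words removed."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def link(mention, context):
    """Return the ID of the candidate whose description shares the most words
    with the context, or None if there is no candidate or no overlap."""
    candidates = KB.get(mention, [])
    if not candidates:
        return None
    ctx = tokens(context)
    scored = [(len(ctx & tokens(c["description"])), c["id"]) for c in candidates]
    best_score, best_id = max(scored)
    return best_id if best_score > 0 else None


print(link("Neil Armstrong", "Neil Armstrong stepped onto the Moon after the Apollo 11 landing."))
print(link("Neil Armstrong", "Neil Armstrong worked the game as a hockey linesman."))
```

A production linker would rely on far richer signals than shared words, but the shape of the decision is the same: one mention, several candidates, and the surrounding context as the tiebreaker.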
Continuous learning
A good out-of-the-box NER system can always improve with additional training on the specific data it is analyzing. That’s why REX comes with Rosette Adaptation Studio (RAS). With a user-friendly, point-and-click interface, subject matter experts can review sentences that REX was unsure about and correct errors. REX learns from these corrections to perform better and better on your data.
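One common way to decide which sentences deserve a reviewer's attention is uncertainty sampling: queue the predictions whose confidence is closest to a coin flip. The sketch below assumes that approach for illustration; the source does not describe how Rosette Adaptation Studio selects sentences, and the function name, thresholds, and sample predictions are hypothetical.

```python
def review_queue(predictions, low=0.35, high=0.65, limit=10):
    """Return the predictions the model is least sure about, most uncertain first.

    `low`, `high`, and `limit` are illustrative defaults, not product settings.
    """
    uncertain = [p for p in predictions if low <= p["confidence"] <= high]
    # A confidence of 0.5 is treated as maximally uncertain.
    return sorted(uncertain, key=lambda p: abs(p["confidence"] - 0.5))[:limit]


# Invented predictions for illustration only.
predictions = [
    {"mention": "Jordan", "type": "PERSON", "confidence": 0.52},
    {"mention": "Apple", "type": "ORGANIZATION", "confidence": 0.97},
    {"mention": "Amazon", "type": "LOCATION", "confidence": 0.41},
    {"mention": "Paris", "type": "LOCATION", "confidence": 0.08},
]

for item in review_queue(predictions):
    print(item["mention"], item["type"], item["confidence"])
```

Confident predictions, whether high or low, are skipped here so that reviewer time goes to the cases where a correction is most likely to change what the model learns.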
Quality training data
Rosette’s models are trained on a carefully curated corpus of millions of news articles, social media content, and blog posts. Our in-house team works with native speakers to annotate the data thoroughly and cross-checks the tags for consistency.
Tech Specs
Availability and platform support
Deployment availability:
Plugins:
Bindings:
Supported languages*
Arabic | German | Korean | Spanish
Chinese, Simplified | Hebrew | Malay | Swedish
Chinese, Traditional | Hungarian | Pashto | Tagalog
Dutch | Indonesian | Persian | Urdu
English | Italian | Portuguese | Vietnamese
French | Japanese | Russian
Entity types**
Person | Nationality | ID Number | Time
Location | Religion | Phone | Lat/Long
Organization | Money | Anatomy
Product | Credit Card | Activity | Language
Title | URL | Food | Substance
Disease | Event | Species
Measure | MISC | Distance
Transport | Number | Date
*Rosette also supports case-insensitive input for English (i.e., all uppercase or all lowercase text).
**In addition to the entity types above, Rosette recognizes over 450 sub-entity types and will link to a Wikidata QID and DBpedia parse tree when they are available. As an example: “Ibuprofen” will be tagged as “SUBSTANCE”, linked to the Wikidata ID Q186969, and assigned the DBpedia tree “ChemicalSubstance/Drug”.
Sample output:
{
  "entities": [
    {
      "type": "ORGANIZATION",
      "mention": "Securities and Exchange Commission",
      "normalized": "Securities and Exchange Commission",
      "count": 3,
      "mentionOffsets": [
        { "startOffset": 4, "endOffset": 38 },
        { "startOffset": 166, "endOffset": 169 },
        { "startOffset": 536, "endOffset": 539 }
      ],
      "entityId": "Q953944",
      "confidence": 0.67070782,
      "linkingConfidence": 0.27190905,
      "dbpediaType": "Agent/Organisation/GovernmentAgency"
    },
    {
      "type": "PERSON",
      "mention": "Bridget Fitzpatrick",
      "normalized": "Bridget Fitzpatrick",
      "count": 2,
      "mentionOffsets": [
        { "startOffset": 99, "endOffset": 118 },
        { "startOffset": 287, "endOffset": 298 }
      ],
      "entityId": "T1",
      "confidence": 0.92063326
    },
    {
      "type": "PERSON",
      "mention": "David Gottesman",
      "normalized": "David Gottesman",
      "count": 2,
      "mentionOffsets": [
        { "startOffset": 174, "endOffset": 189 },
        { "startOffset": 307, "endOffset": 316 }
      ],
      "entityId": "Q5234268",
      "confidence": 0.92488831,
      "linkingConfidence": 0.47211223,
      "dbpediaType": "Agent/Person"
    },
    {
      "type": "TITLE",
      "mention": "Chief Litigation Counsel",
      "normalized": "Chief Litigation Counsel",
      "count": 1,
      "mentionOffsets": [
        { "startOffset": 134, "endOffset": 158 }
      ],
      "entityId": "T2",
      "confidence": 0.3306601
    },
    {
      "type": "TITLE",
      "mention": "Deputy Chief Litigation Counsel",
      "normalized": "Deputy Chief Litigation Counsel",
      "count": 1,
      "mentionOffsets": [
        { "startOffset": 229, "endOffset": 260 }
      ],
      "entityId": "T5",
      "confidence": 0.81287289
    },
    {
      "type": "TEMPORAL:DATE",
      "mention": "December 2016",
      "normalized": "December 2016",
      "count": 1,
      "mentionOffsets": [
        { "startOffset": 268, "endOffset": 281 }
      ],
      "entityId": "T6"
    },
    {
      "type": "TITLE",
      "mention": "Ms.",
      "normalized": "Ms.",
      "count": 1,
      "mentionOffsets": [
        { "startOffset": 283, "endOffset": 286 }
      ],
      "entityId": "T7",
      "confidence": 0.76600134
    },
    {
      "type": "TITLE",
      "mention": "Mr.",
      "normalized": "Mr.",
      "count": 1,
      "mentionOffsets": [
        { "startOffset": 303, "endOffset": 306 }
      ],
      "entityId": "T9",
      "confidence": 0.72353458
    },
    {
      "type": "TITLE",
      "mention": "Co-Acting Chief Litigation Counsel",
      "normalized": "Co-Acting Chief Litigation Counsel",
      "count": 1,
      "mentionOffsets": [
        { "startOffset": 332, "endOffset": 366 }
      ],
      "entityId": "T11",
      "confidence": 0.03582656
    },
    {
      "type": "LOCATION",
      "mention": "Washington D.C.",
      "normalized": "Washington D.C.",
      "count": 1,
      "mentionOffsets": [
        { "startOffset": 460, "endOffset": 475 }
      ],
      "entityId": "Q61",
      "linkingConfidence": 0.66086622,
      "dbpediaType": "Place/PopulatedPlace/Settlement"
    }
  ]
}
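A response shaped like the sample output above can be post-processed with a few lines of code. The sketch below uses a trimmed copy of five entities from that sample and shows two typical steps: separating linked from unlinked entities and filtering by confidence. It rests on assumptions read off the sample rather than a documented contract: that a "Q"-prefixed `entityId` is a Wikidata QID, that a "T"-prefixed one marks an unlinked entity, and that `confidence` may be absent. The function names and the 0.5 threshold are hypothetical.

```python
import json

# Trimmed copy of five entities from the sample output (fields reduced for brevity).
SAMPLE = json.loads("""
{"entities": [
  {"type": "ORGANIZATION", "mention": "Securities and Exchange Commission",
   "entityId": "Q953944", "confidence": 0.67070782, "linkingConfidence": 0.27190905},
  {"type": "PERSON", "mention": "Bridget Fitzpatrick",
   "entityId": "T1", "confidence": 0.92063326},
  {"type": "TITLE", "mention": "Co-Acting Chief Litigation Counsel",
   "entityId": "T11", "confidence": 0.03582656},
  {"type": "TEMPORAL:DATE", "mention": "December 2016", "entityId": "T6"},
  {"type": "LOCATION", "mention": "Washington D.C.",
   "entityId": "Q61", "linkingConfidence": 0.66086622}
]}
""")


def is_linked(entity):
    # Assumption: "Q..." IDs are Wikidata QIDs; other IDs are document-local.
    return entity["entityId"].startswith("Q")


def confident(entities, threshold=0.5, keep_unscored=True):
    """Keep entities at or above the threshold.

    Entities without a "confidence" field (as in the sample) are kept or
    dropped according to `keep_unscored`.
    """
    kept = []
    for e in entities:
        score = e.get("confidence")
        if score is None:
            if keep_unscored:
                kept.append(e)
        elif score >= threshold:
            kept.append(e)
    return kept


entities = SAMPLE["entities"]
print("linked:", [e["mention"] for e in entities if is_linked(e)])
print("confident:", [e["mention"] for e in confident(entities)])
```

Using `dict.get` for `confidence` matters here because the date and location entries in the sample carry no such field; indexing directly would raise a `KeyError`.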
Try the Demo
1) Select a sample or paste in your own text
2) Click “Analyze”
3) Select the “Entities” tab on the right side of the screen
4) Click on an entity for additional detail such as sentiment and knowledge base links
Deployment
Rosette Cloud
Sign up today for a free 30-day trial
The SaaS version of Rosette is quick to implement, low maintenance, and ideal for users who wish to pay based on monthly call volume. Numerous bindings are supported through a RESTful API.
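For readers who prefer to call the RESTful API directly rather than through a binding, a request might be assembled as follows. This is a sketch under stated assumptions: the endpoint URL, the `X-RosetteAPI-Key` header name, and the `{"content": ...}` payload shape are not given on this page and should be checked against the current API documentation. The function name is hypothetical, and the example only builds the request; it does not send it.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current API documentation.
ENTITIES_URL = "https://api.rosette.com/rest/v1/entities"


def build_entities_request(text, api_key, url=ENTITIES_URL):
    """Build (but do not send) a POST request asking for entities in `text`."""
    body = json.dumps({"content": text}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-RosetteAPI-Key": api_key,  # assumed header name
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


request = build_entities_request("Neil Armstrong walked on the Moon.", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)

# To send it with a real key:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

Keeping request construction separate from sending makes the call easy to inspect and test without network access or a live key.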
Rosette Server Edition
This on-premise private cloud deployment puts all the functionality of Rosette Cloud behind your secure firewall, and enables advanced user settings, access to custom profiles (user-specific configuration setups), and deployment of custom models.
Rosette Java Edition
For on-premise systems that need the low-latency, high-speed integration of an SDK, Rosette Java is the way to go. It has been deployed in the most demanding, high-transaction environments, including web search engines, financial compliance, and border security.
Rosette Plugins
Just plug in Rosette for instant high-accuracy multilingual search and fuzzy name search for Elasticsearch or Apache Solr.
Quality documentation and support
Our support team responds to customers in less than a business day, and is committed to a satisfactory resolution. Users have access to in-depth documentation describing all the features, with code examples and a searchable knowledge base.
Visit our GitHub for bindings and documentation.
Questions?
Email: info@basistech.com
Phone: +1-617-386-2000
