Rosette API 1.8 adds topics, salience, French sentiment, and more

24 Oct 2017

We’re excited to announce the release of Rosette API 1.8, including a new /topics endpoint.

The topic extraction endpoint returns key phrases extracted from the input text, as well as general concepts that may not be explicitly mentioned. /topics can be used to tag and sort a large corpus of documents, so you can automatically filter for key areas of interest or track trends over time.

Topic extraction goes further in summarizing text than entity extraction or categorization because /topics is not constrained by a finite list of recognized entity types or categories. As /topics is still in “Labs” status, any feedback is very welcome!

The release also adds support for French sentiment analysis (beta), and exposes salience scores for all extracted entities and confidence scores for linked entities. Additionally, our /transliteration, /relationships, and /syntax/dependencies endpoints are now fully supported and have graduated from “Labs” status. TL;DR: check the release notes.

Deeper insight into your text data

A highlight of the 1.8 release is the new /topics endpoint. For a given input, the endpoint returns two lists: “key phrases” and “concepts.” Key phrases are significant words or phrases extracted directly from the text that Rosette deems most important. Concepts are themes detected in the text that do not have to be explicitly mentioned in the input.
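To make the difference between the two lists concrete, here is a minimal Python sketch that takes a /topics response and reports which concepts actually appear verbatim in the input. It assumes the response is JSON with "keyphrases" and "concepts" arrays whose items carry a "phrase" field; those field names and the sample response are illustrative assumptions, not documented output.

```python
from __future__ import annotations

from typing import Any


def split_topics(response: dict[str, Any], text: str) -> dict[str, list[str]]:
    """Separate key phrases from concepts, and concepts by whether they occur in the text.

    ASSUMPTION: the response has "keyphrases" and "concepts" lists whose items
    each contain a "phrase" string.
    """
    lowered = text.lower()
    keyphrases = [item["phrase"] for item in response.get("keyphrases", [])]
    concepts = [item["phrase"] for item in response.get("concepts", [])]
    return {
        "keyphrases": keyphrases,
        # Concepts whose wording happens to appear in the input (case-insensitive).
        "concepts_mentioned": [c for c in concepts if c.lower() in lowered],
        # Concepts inferred without being stated verbatim in the input.
        "concepts_implied": [c for c in concepts if c.lower() not in lowered],
    }


if __name__ == "__main__":
    text = "John Lennon and Paul McCartney wrote most of the band's early songs."
    # Invented sample response, for illustration only.
    sample = {
        "keyphrases": [{"phrase": "John Lennon"}, {"phrase": "Paul McCartney"}],
        "concepts": [{"phrase": "The Beatles"}, {"phrase": "songs"}],
    }
    print(split_topics(sample, text))
```

The substring check is deliberately naive; it only illustrates that a concept such as "The Beatles" can be returned even though the text never names the band.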

Interested in entities beyond the 18 entity types that the /entities endpoint recognizes? /topics picks out important keywords in your documents without being limited to predefined categories. It can also be used to tag and sort a large corpus of documents, allowing you to automatically filter for key areas of interest and search more effectively.
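Once each document has been run through /topics, tagging and filtering a corpus can be as simple as an inverted index from topic to document. The sketch below assumes you have already collected the topic phrases for each document into a plain dictionary; the document IDs and topics are made up, and matching is a simple case-insensitive exact match rather than anything Rosette provides.

```python
from __future__ import annotations

from collections import defaultdict


def build_tag_index(doc_topics: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lower-cased topic to the set of document IDs tagged with it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, topics in doc_topics.items():
        for topic in topics:
            index[topic.lower()].add(doc_id)
    return dict(index)


def filter_documents(index: dict[str, set[str]], interests: list[str]) -> list[str]:
    """Return sorted IDs of documents tagged with at least one topic of interest."""
    matches: set[str] = set()
    for interest in interests:
        matches |= index.get(interest.lower(), set())
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical per-document topics, e.g. key phrases and concepts from /topics.
    doc_topics = {
        "doc-1": ["The Beatles", "Abbey Road"],
        "doc-2": ["Alias", "spy drama"],
        "doc-3": ["the beatles", "Liverpool"],
    }
    index = build_tag_index(doc_topics)
    print(filter_documents(index, ["The Beatles", "Alias"]))
```

Running the same filter over batches from different dates is one simple way to track how often a topic appears over time.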

More information means more control

In the same vein, this release exposes salience and linking confidence scores to give you more insight into your text data for improved aggregation, sorting and search.

Entity salience scores highlight the entities that are relevant to the main focus of a document. For example, submitting a Reuters article about The Beatles to the /entities endpoint will likely return Paul McCartney, John Lennon, Ringo Starr, George Harrison, and Reuters, plus many other locations, organizations, and people. Salience gives Reuters a low score and gives high scores to John, Paul, George, and Ringo, the entities that are the focus of the document.

Salience scores allow you to rank entities by importance, providing more accurate insight into your documents and enabling better tagging, sorting, and search.
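As a rough illustration of ranking by salience, the sketch below sorts entities from an /entities response by score. It assumes each entity is a dictionary with "mention" and "salience" keys (assumed field names) and treats a missing score, for example when salience was not requested, as 0.0. The sample scores are invented.

```python
from __future__ import annotations

from typing import Any, Optional


def rank_by_salience(
    entities: list[dict[str, Any]], top_k: Optional[int] = None
) -> list[tuple[str, float]]:
    """Return (mention, salience) pairs, most salient first.

    ASSUMPTION: each entity dict has a "mention" key and, when salience was
    requested, a numeric "salience" key. Missing scores are treated as 0.0.
    """
    ranked = sorted(
        ((e["mention"], float(e.get("salience", 0.0))) for e in entities),
        key=lambda pair: pair[1],
        reverse=True,  # highest salience first; ties keep their input order
    )
    return ranked if top_k is None else ranked[:top_k]


if __name__ == "__main__":
    # Invented scores for illustration; not real Rosette output.
    entities = [
        {"mention": "Reuters", "salience": 0.0},
        {"mention": "John Lennon", "salience": 1.0},
        {"mention": "Liverpool", "salience": 0.2},
        {"mention": "Ringo Starr", "salience": 1.0},
    ]
    for mention, score in rank_by_salience(entities, top_k=3):
        print(f"{mention}: {score}")
```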

Linking confidence scores tell you how likely it is that the link between an entity mention in the document and its knowledge base QID is correct. Use these scores to set thresholds and remove false positives.

For example, an article about a possible revival of the popular 2000s spy drama “Alias” could mistakenly link Sydney Bristow, the show’s protagonist, to Sydney, New South Wales, the most populous city in Australia, albeit with a low confidence score. Setting a minimum confidence threshold ensures that poor links like this are discarded, preserving the integrity of your data.
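A minimal sketch of confidence thresholding, assuming each linked entity carries "entityId" (the QID) and "confidence" keys; both key names, the placeholder IDs, and the 0.5 threshold are illustrative assumptions. Rather than deleting the entity, it keeps the mention and removes only the untrusted link.

```python
from __future__ import annotations

from typing import Any


def drop_weak_links(entities: list[dict[str, Any]], min_confidence: float) -> list[dict[str, Any]]:
    """Return copies of the entities with low-confidence links removed.

    ASSUMPTION: a linked entity stores its QID under "entityId" and its link
    confidence under "confidence". Entities without a score are unlinked too.
    """
    cleaned = []
    for entity in entities:
        item = dict(entity)  # shallow copy so the caller's data is untouched
        score = item.get("confidence")
        if score is None or score < min_confidence:
            item["entityId"] = None  # keep the mention, discard the link
        cleaned.append(item)
    return cleaned


if __name__ == "__main__":
    # Placeholder IDs and invented scores, for illustration only.
    entities = [
        {"mention": "Sydney Bristow", "entityId": "Q_PLACEHOLDER_1", "confidence": 0.08},
        {"mention": "Jennifer Garner", "entityId": "Q_PLACEHOLDER_2", "confidence": 0.93},
    ]
    for e in drop_weak_links(entities, min_confidence=0.5):
        print(e["mention"], "->", e["entityId"])
```

Unlinking entities that have no score is the conservative choice; if confidence scores are not turned on, you may prefer to leave those links alone instead.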

NOTE: By default, salience and confidence scores are not returned. Turn them on by adding an option to the request:

calculateSalience=true

or

calculateConfidence=true

respectively.
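For callers using the REST API directly rather than a client binding, the sketch below builds (but does not send) an /entities request with both options enabled, using only Python's standard library. The base URL, the X-RosetteAPI-Key header, and nesting the options under an "options" key in the JSON body are assumptions to check against the API documentation.

```python
from __future__ import annotations

import json
import urllib.request

# ASSUMPTIONS to verify against the API docs: the base URL, the
# "X-RosetteAPI-Key" header name, and nesting options under "options".
BASE_URL = "https://api.rosette.com/rest/v1"


def build_entities_request(
    text: str,
    api_key: str,
    *,
    salience: bool = False,
    confidence: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to /entities with optional scores."""
    options: dict[str, bool] = {}
    if salience:
        options["calculateSalience"] = True
    if confidence:
        options["calculateConfidence"] = True

    body: dict[str, object] = {"content": text}
    if options:  # in default mode, omit the options key entirely
        body["options"] = options

    return urllib.request.Request(
        f"{BASE_URL}/entities",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-RosetteAPI-Key": api_key,
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_entities_request("Text to analyze.", "YOUR_API_KEY", salience=True, confidence=True)
    print(req.get_method(), req.full_url)
    print(json.loads(req.data))
    # To actually send it with a valid key: urllib.request.urlopen(req)
```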

Learn more

For more information, check the release notes. You can read FAQs at support.rosette.com, and sign up for an API key at developer.rosette.com.

Already a Rosette API user and want to take advantage of our new /topics endpoint? Make sure you update your client binding: look for version 1.8.x in the package manager of your choice. The release does not include any breaking changes, so updating your binding is not necessary to continue your current workflow, only to access the new features.

Happy coding!