Improve the Quality of Your Data

Find and reconcile duplicates in your data with Rosette’s industry-leading fuzzy matching algorithm

Request Demo

Unlike most data cleansing tools, Rosette uses advanced semantic and machine learning techniques to quickly and accurately find duplicates in your data. With Rosette, you can deduplicate names, organizations, and locations across different languages and improve the quality of your data even at scale.


Dedupe tens of millions of data entries without sacrificing performance or accuracy.


Integrate Rosette into your database system and customize it to meet your business’s unique needs.

Cloud API

Get started quickly with the Cloud API, using the client binding that fits your environment. Sign up now to try it.


Vast data quantities? Unique integration needs? Data security restrictions? Talk to our sales team to learn more about our on-premise solutions.

Intelligent Matching Technology Behind the Scenes

Rosette assigns group IDs based on the match threshold you specify for your list. With fluency across 17 languages and a deep understanding of the linguistic complexities of names, Rosette handles 13 different name phenomena, from nicknames to semantically similar names across languages, and helps you identify and reconcile duplicate names, organizations, and locations in your database.

Name                    Cluster ID
Cyndi McBoysen          1
Dmitri Shostakovich     3
Jim Hockenberry         2
Takeshi Suzuki          4
James Hawkenbury        2
Cindy MacBoysen         1
Дми́трий Шостако́вич     3

Organization            Cluster ID
Carl’s Jr.              1
Burberry                2
Carl Jr.’s              1
Eagle Drugs             3
Burberry Group PLC      2
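
Once each record carries a cluster ID, reconciling duplicates is mostly a grouping step. The following is a minimal Python sketch of that step, using the name rows from the table above. The function names `group_by_cluster` and `duplicates_only` are hypothetical helpers written for illustration; they are not part of Rosette or its client bindings.

```python
from collections import defaultdict

# (name, cluster ID) rows copied from the name table above.
rows = [
    ("Cyndi McBoysen", 1),
    ("Dmitri Shostakovich", 3),
    ("Jim Hockenberry", 2),
    ("Takeshi Suzuki", 4),
    ("James Hawkenbury", 2),
    ("Cindy MacBoysen", 1),
    ("Дми́трий Шостако́вич", 3),
]


def group_by_cluster(rows):
    """Collect names under their cluster ID, keeping input order."""
    clusters = defaultdict(list)
    for name, cluster_id in rows:
        clusters[cluster_id].append(name)
    return dict(clusters)


def duplicates_only(clusters):
    """Keep only clusters with more than one member (likely duplicates)."""
    return {cid: names for cid, names in clusters.items() if len(names) > 1}


if __name__ == "__main__":
    for cid, names in sorted(duplicates_only(group_by_cluster(rows)).items()):
        print(cid, names)
```

Clusters with a single member, such as Takeshi Suzuki in cluster 4, have no detected duplicate and can be left as they are.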

Frequently Asked Questions

What's the difference between Rosette's Cloud API and On-Premise solutions?

Rosette is Basis Technology’s flagship product, a suite of linguistic tools that supports a variety of languages and can be accessed through multiple interfaces. When we refer to the “Rosette API” or our “Cloud API,” we’re talking about the SaaS version of Rosette, a cloud-based RESTful web service that supports most of Rosette’s overall functionality. We host our API in the AWS cloud. Results are returned as JSON, and we offer seven different client bindings plus a RapidMiner extension.
Rosette can also be hosted on-premise by you and your organization. Our on-premise deployment gives you full control of your data and its security. In many cases, it also offers better latency than the Cloud API. Our on-premise solutions also offer individualized customization options, including Rosette’s unique stateful features (name indexing, custom entity extraction model training, and custom knowledge bases for entity linking) that are not available in the Cloud API.


Rosette On-Premise comes in two major “flavors” for different organizational needs: Java SDKs and on-premise API hosting. Contact us for more information and a custom evaluation.

What is a match threshold?

The match threshold sets the minimum similarity score required for two names to be considered duplicates. Rosette assigns group IDs based on this threshold. We recommend starting with a threshold of 0.8 and experimenting with higher or lower values depending on your use case and results.
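
To make the effect of the threshold concrete, here is a minimal Python sketch of threshold-based grouping. It is not Rosette's matcher: the `similarity` function uses the standard library's `difflib.SequenceMatcher` as a simple character-based stand-in, and the transitive grouping (via union-find) and the name `assign_group_ids` are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score in [0, 1]; NOT Rosette's matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def assign_group_ids(names: list[str], threshold: float = 0.8) -> list[int]:
    """Give names the same group ID when their score meets the threshold."""
    parent = list(range(len(names)))  # union-find forest, one node per name

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose score reaches the threshold (assumed transitive).
    for i, j in combinations(range(len(names)), 2):
        if similarity(names[i], names[j]) >= threshold:
            parent[find(j)] = find(i)

    # Number the groups 1, 2, 3, ... in order of first appearance.
    ids: dict[int, int] = {}
    return [ids.setdefault(find(i), len(ids) + 1) for i in range(len(names))]


if __name__ == "__main__":
    names = ["Cyndi McBoysen", "Takeshi Suzuki", "Cindy MacBoysen"]
    print(assign_group_ids(names, threshold=0.8))
    print(assign_group_ids(names, threshold=1.0))
```

In this sketch, raising the threshold splits groups apart and lowering it merges more names together. A character-based stand-in like this one would not link names across scripts, such as a Latin and a Cyrillic spelling of the same name, which is where a linguistically aware matcher matters.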

How does your API count calls?

Each line containing a name counts as one call. Please let us know if you notice any discrepancies.

What type of entities can Rosette dedupe?

Rosette can match and deduplicate persons, organizations, and locations.

Request a Demo

Our global team of sales engineers is ready to walk you through a demo and answer your questions!