Arabic Chat Translator – Arabizi Translator

Convert Arabic text written with Roman letters and numbers to Modern Standard Arabic

Overview

What is “Arabizi”?

“Arabizi” is a casual way of writing Arabic with Roman letters and numerals instead of Arabic script. Numerals stand in for Arabic letters that have no close Roman equivalent, such as “3” for ع and “7” for ح. It was invented by Arabic speakers using Western keyboards to type Arabic in online chats. With the explosion of digital communication, Arabizi became one of the most widely used forms of writing online. Given that there are some 420 million Arabic speakers in the world, any global text analytics system must handle Arabic, and thus Arabizi, too.

Converting Arabizi to Modern Standard Arabic

NLP tools for Arabic assume that text is written in Arabic script. To make Arabizi accessible to text analysis, Rosette® Chat Translator (RCT) converts all Arabizi variations to Modern Standard Arabic, minimizing information loss and ensuring consistency across translations. Its linguistic algorithm looks at the frequency of the structural components of each word and combines that with a statistical model trained on the input of millions of internet users from all over the Arabic-speaking world. RCT can also convert standard Arabic into Arabizi.

Once Arabizi has been translated into Arabic, it can be run through linguistic analysis, such as morphological analysis, entity extraction, and sentiment analysis.

Product highlights

  • Arabizi ↔ Arabic conversion
  • Cloud or on-premise deployments
  • Fast and scalable
  • Industrial-strength support

How it Works

Accurately translate

RCT translates Arabizi (also called Romanized Arabic “chat”) into standard Arabic script with very high accuracy. This product leverages two fundamental techniques:

  • An algorithmic approach breaks down words into morphological components and phonemes to produce Arabic candidates. The candidates are ranked according to many metrics, such as the popularity of the phoneme mappings or how frequently the Arabic output is used in Arabic text.
  • A statistical approach uses a large database of Roman alphabet spellings generated from the input of millions of Arabic speakers online.
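The candidate-and-rank idea described above can be illustrated with a toy sketch. This is not RCT's implementation: the mapping table, its popularity weights, and the word frequencies below are all invented for illustration, and the sketch only handles single-character symbols (real Arabizi also uses digraphs such as "sh" or "kh").

```python
from itertools import product
from math import prod

# Hypothetical Arabizi-to-Arabic mappings with assumed popularity weights.
# These values are illustrative only, not data from RCT.
MAPPINGS = {
    "s": [("س", 0.7), ("ص", 0.3)],
    "a": [("ا", 0.5), ("", 0.3), ("ة", 0.2)],  # short vowels are often unwritten
    "3": [("ع", 1.0)],
    "7": [("ح", 1.0)],
}

# Hypothetical relative frequencies of Arabic words in running text.
FREQ = {"ساعة": 0.9, "سعة": 0.2}
UNSEEN = 1e-6  # small floor for candidates missing from the frequency table


def candidates(word):
    """Yield (arabic_text, mapping_score) for every combination of mappings."""
    # Unknown symbols pass through unchanged with a neutral weight.
    options = [MAPPINGS.get(ch, [(ch, 1.0)]) for ch in word]
    for combo in product(*options):
        text = "".join(letter for letter, _ in combo)
        yield text, prod(weight for _, weight in combo)


def rank(word, top=3):
    """Rank candidates by mapping popularity times word frequency."""
    scored = {}
    for text, mapping_score in candidates(word):
        total = mapping_score * FREQ.get(text, UNSEEN)
        scored[text] = max(scored.get(text, 0.0), total)
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

With these toy weights, `rank("sa3a")` places ساعة first, because it combines plausible letter mappings with the highest assumed word frequency.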

This dual algorithmic and statistical approach is increasingly recognized as the most effective method for text analysis and machine translation. Our technology is one of the few commercially available products offering Arabizi-to-Arabic translation.

RCT is designed for performance and concurrency from the ground up. It is capable of converting thousands of words per second, enabling your applications to quickly process large databases of text. Available in the cloud and on-premise, the functionality can be integrated into any software environment.
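On the client side, a large batch of texts can be submitted concurrently. The following is a minimal sketch using Python's standard thread pool. The `convert` argument is a hypothetical stand-in for whatever function calls the conversion service, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_all(texts, convert, workers=8):
    """Apply `convert` to every text concurrently, preserving input order.

    `convert` is a placeholder for a function that sends one text to the
    conversion service and returns the result.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(convert, texts))
```

Threads suit this case because each call would spend most of its time waiting on the network rather than computing.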

Crowd-sourced translations

Unlike machine translation systems that rely on conventional dictionaries, RCT is powered by a database of 300 million Arabic words collected from thousands of websites. This approach enables translation from regional Arabic dialects used by Arabic-speaking online communities.

Tech Specs

Availability and platform support

Deployment Availability:
Bindings:

Supported Languages

Arabizi ↦ Modern Standard Arabic
Modern Standard Arabic ↦ Arabizi

Sample input:
{
  "content": "ana r2ye7 el gam3a el sa3a 3 el 3asr"
}

Sample output:
{
  "transliteration": "أنا رايح الجامعة الساعة ٣ العصر"
}
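The sample input and output above suggest a simple JSON request and response. The sketch below shows how a client might build such a request and read the result using only the Python standard library. The endpoint URL and the API-key header name are assumptions and should be checked against the Rosette API documentation. Only the `content` and `transliteration` fields come from the samples above.

```python
import json
import urllib.request

# Assumed endpoint; verify against the Rosette API documentation.
ENDPOINT = "https://api.rosette.com/rest/v1/transliteration"


def build_request(arabizi, api_key):
    """Build a POST request carrying the text in a "content" field."""
    body = json.dumps({"content": arabizi}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "X-RosetteAPI-Key": api_key,  # assumed header name
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def parse_response(raw):
    """Extract the "transliteration" field from a raw JSON response body."""
    return json.loads(raw.decode("utf-8"))["transliteration"]


def transliterate(arabizi, api_key):
    """Send the request and return the converted text (needs network access)."""
    with urllib.request.urlopen(build_request(arabizi, api_key)) as resp:
        return parse_response(resp.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to check without contacting the service.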

Deployment

Rosette Cloud

Sign up today for a free 30-day trial

The SaaS version of Rosette is quick to implement, low-maintenance, and ideal for users who wish to pay based on monthly call volume. It is accessed through a RESTful API, for which numerous bindings are supported.

Rosette Server Edition

This on-premise private cloud deployment puts all the functionality of Rosette Cloud behind your secure firewall, and enables advanced user settings, access to custom profiles (user-specific configuration setups), and deployment of custom models.

Rosette Java Edition

For on-premise systems that need the low-latency, high-speed integration of an SDK, Rosette Java is the way to go. It has been deployed in the most demanding, high-transaction environments, including web search engines, financial compliance, and border security.

Rosette Plugins

Just plug in Rosette for instant high-accuracy multilingual search and fuzzy name search for Elasticsearch or Apache Solr.

Quality documentation and support

Our support team responds to customers in less than a business day, and is committed to a satisfactory resolution. Users have access to in-depth documentation describing all the features, with code examples and a searchable knowledge base.

Visit our GitHub for bindings and documentation.

Questions?

Email: info@basistech.com

Phone: +1-617-386-2000
