23 Sep 2019

What’s in a Persian Name?


How Persian names are composed from history, social class, religious affiliation, and geographic origins

This month we have a special blog post by computational linguist and NLP engineer Zina Saadi.

For this article, we’ll focus on Persian names of Iran, where people speak a dialect of Persian often called Farsi. Many aspects of the language give Farsi its depth and complexity, including its phonology and morphology, possible spelling variations, and borrowings from other languages. Especially when tackling challenges such as building software for fuzzy name matching and translation, a full understanding of the ins and outs of Persian names is absolutely essential.

Given Names and Affixes

Interestingly, something as basic (from a Western viewpoint) as a surname didn’t exist in Iran until the reign of Reza Shah (1925 to 1941). Before 1925, a full Iranian name was a given name plus a collection of affixes: morphemes that appear at the beginning (a prefix), end (a suffix), or middle of a word. For example:

Pre-Shah era, full Persian name:

Prefix(es) + Given Name(s) + Suffix(es)
Haji Mirza Hassan Tabrizi

Name affixes give insight into the person, speaking to their social class, education, religious affiliation, and origin or birth city.

Name-prefix examples:

This name has two prefixes that denote respect:
Hajji حاجى denotes a person who has completed the pilgrimage to Mecca, and
Mir میر, meaning “master,” is a contraction of Amir (امیر), meaning “prince” in Arabic.

Name-suffix examples:


The suffix Tehrani تهرانی refers to someone who is from Tehran.


The suffix Alavi علوى is a Nisbah (a form produced by an Arabic morphological transformation rule) indicating descent from the first religious Imam: Imam Ali ibn Abu Talib.

Dual-function affixes:

A single affix may often communicate different meanings, depending on where it appears attached to a name. For example, the affix Mirza ميرزا, as a name-prefix, denotes respect. As a suffix, it denotes royal descent.

Example of “Mirza” denoting respect: Mirza Ali (name of a 17th-century Persian physician)
Example of “Mirza” denoting royal descent: Iskander Ali Mirza (Persian: اسکندر على ميرزا)
(Urdu: اسکندر على مرزا)
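The positional reading of affixes described above can be sketched in a few lines of Python. This is only an illustration, not a complete parser: the affix lists are tiny hypothetical samples taken from the examples in this post, and the function name `split_affixes` is an assumption. Leading tokens found in the prefix list are treated as prefixes and trailing tokens found in the suffix list as suffixes, so Mirza is interpreted differently depending on where it appears.

```python
# Hypothetical sample lists built only from the examples in this post;
# a real system would need much larger, curated lists.
PREFIXES = {
    "haji": "completed the pilgrimage to Mecca",
    "hajji": "completed the pilgrimage to Mecca",
    "mirza": "respect",
}
SUFFIXES = {
    "mirza": "royal descent",
    "tehrani": "from Tehran",
    "tabrizi": "from Tabriz",
    "alavi": "descent from Imam Ali",
}


def split_affixes(full_name):
    """Split a transliterated name into prefixes, given name(s), and suffixes."""
    tokens = full_name.split()
    start, end = 0, len(tokens)
    prefixes, suffixes = [], []

    # Consume known prefixes from the left, always leaving one token
    # for the given name.
    while start < end - 1 and tokens[start].lower() in PREFIXES:
        prefixes.append((tokens[start], PREFIXES[tokens[start].lower()]))
        start += 1

    # Consume known suffixes from the right, again leaving one token.
    while end - 1 > start and tokens[end - 1].lower() in SUFFIXES:
        suffixes.insert(0, (tokens[end - 1], SUFFIXES[tokens[end - 1].lower()]))
        end -= 1

    return {"prefixes": prefixes, "given": tokens[start:end], "suffixes": suffixes}


for name in ("Haji Mirza Hassan Tabrizi", "Mirza Ali", "Iskander Ali Mirza"):
    print(name, "->", split_affixes(name))
```

The same token, Mirza, ends up under `prefixes` with the meaning "respect" in Mirza Ali and under `suffixes` with the meaning "royal descent" in Iskander Ali Mirza.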

Post-Shah Era: Farsi Surnames

During the reign of the Shah of Iran, there was a notable shift in Farsi name format, so Farsi names are divided into pre-Shah-era and post-Shah-era forms. For instance, suppose you are reading Persian philosophy books and you encounter the name ملاصدرا, which could be transliterated as Mulla Sadra or Mollasadra. This is the name of the 17th-century Persian philosopher Ṣadr ad-Dīn Muḥammad Shīrāzī. It consists of the name-prefix Mulla (also pronounced Molla) plus Sadra, which is a contraction of the name Ṣadr ad-Dīn. Because the form Ṣadr ad-Dīn Muḥammad Shīrāzī includes a surname, it is clearly a post-1925 rendering of the name.

Prefix | Given name | Surname | Era
Mulla (a title specific to a religious group) | Sadra | | Pre-Shah-era name
| Mollasadra | | Pre-Shah-era name (alternative transliteration)
| Ṣadr ad-Dīn (the full form of Sadra) | Muḥammad Shīrāzī | Post-Shah-era name

Since the Shah took power in 1925, surnames have been required, although name affixes remain in use. Below are some examples of the composition of these full names:

Given name | Surname | Comment
Kivan كيوان | Muhammadi محمدى | Single-word given name, single-word surname
Amir Hussein امير حسين | Bahramzadah بهرمزدى | Compound given name, surname with affix
Alireza عليرضا | Darya-Bandari دريا بندری | Compound given name, compound surname

Challenges to Persian Name Translation and Name Matching

Four aspects of the Persian language heavily influence how Persian names are formed: phonology, morphology, orthographic (spelling) variation, and cross-lingual borrowing.

Phonological

Similar to English, Farsi consonants and vowels can have more than one pronunciation and the same sound can be represented by different characters.

Table A: Each character maps to one sound.

Table B: One sound (see the IPA column) maps to more than one character.

Table C: One sound maps to more than one vowel/character.

IPA sound | Example(s), transliterated | Explanation
/ɒː/ | Arezou/Shadi | Aleph and aleph with madda can both be pronounced /ɒː/
/uː/ | Oraee/Mousa | Aleph vav and vav can both be pronounced as the long vowel /uː/
/ɒi/ | Ajay | Final aleph yaa sukun is pronounced as /ɒi/
/ow/ | Showvan | Dhamma vav sukun is pronounced as /ow/
/æi/ | Haidar | Fatha yaa sukun is pronounced as /æi/
/ei/ | Oveissi | Kasra yaa sukun is pronounced as /ei/
/e/ | Ilham/Zhale | Aleph kasra and kasra heh are both pronounced as /e/
/o/ | Oveissi | Aleph dhamma is pronounced as /o/
/æ/ | Afshar/Yahya | Aleph fatha and yaa fatha are both pronounced as /æ/
/iː/ | Issa/Pari | Medial yaa and final yaa are both pronounced as the long vowel /iː/

Morphological Specifications

Farsi given names and surnames may have a variety of prefixes and suffixes attached:

  • Prefixes: Por (meaning full) as in
  • Suffixes: Zadeh (meaning descendant) as in
  • Prefixes and suffixes with the same meaning, such as Nezhad as in

Orthographic Variations

Spelling variations fall into a few categories:

  1. Foreign names written in Persian often vary in spelling (just as Persian names spelled in English do). Example: Variations of “Leonardo” in Persian (from Iran-News) can be
  2. Borrowed names from Arabic with a hamza character. The hamza (ء) can be dropped if it ends the name, such as وفاء vs. وفا or if it appears medially, such as in رضائی vs. رضایی or مسئول vs. مسول.
  3. Inconsistent use of the space character (Unicode codepoint U+0020) and the zero-width non-joiner character (U+200C). Although this variation is one that really only computers care about, it can make or break a search engine looking for Persian names. Depending on the typist, affixes can be joined to names using either character.

English transliteration | Description of the variation
Moslemy Zade | Name with affix, using the zero-width non-joiner, ZWNJ (U+200C), as the separator between the name and its affix
Taqizadeh | Name with affix, using the space character (U+0020) between them
Wafa | Borrowed name from Arabic with hamza
Wafa | Borrowed name from Arabic without hamza
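One common way to neutralize the separator and hamza variations described above is to compare names on a normalized key rather than on the raw string. The sketch below is a minimal illustration under stated assumptions: the function name `match_key` is hypothetical, it is meant to be applied to a single name component (not a full name), and it only handles a final hamza and the ئ/ی alternation. It does not cover cases such as مسئول vs. مسول, where the hamza seat is dropped entirely.

```python
ZWNJ = "\u200c"        # zero-width non-joiner
SPACE = "\u0020"       # ordinary space
HAMZA = "\u0621"       # ء (standalone hamza)
YEH_HAMZA = "\u0626"   # ئ (yeh with hamza above)
FARSI_YEH = "\u06cc"   # ی


def match_key(name_part: str) -> str:
    """Return a comparison key for one name component (illustrative only)."""
    # Drop both separators so joined, spaced, and ZWNJ forms compare equal.
    key = name_part.replace(ZWNJ, "").replace(SPACE, "")
    # A name-final hamza is optional in Persian spelling.
    if key.endswith(HAMZA):
        key = key[:-1]
    # Treat medial yeh-with-hamza and plain yeh as the same letter.
    return key.replace(YEH_HAMZA, FARSI_YEH)


taqi, zadeh = "\u062a\u0642\u06cc", "\u0632\u0627\u062f\u0647"
print(match_key(taqi + SPACE + zadeh) == match_key(taqi + ZWNJ + zadeh))  # True
```

The key is only for comparison; the original spelling should still be stored and displayed as typed.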

Cross-lingual Borrowings

There are two types of borrowings in Persian names: borrowings from Arabic and borrowings from other languages.

Borrowings from Arabic
Names spelled with letters specific to Arabic (not used in purely Persian words), such as thaa (ث), dhaal (ذ), ain (ع), ghain (غ), Saad (ص), Dhaad (ض), Taa (ط), and Zha (ظ), are likely to come from Arabic. In addition, the Arabic taa marbuta (ة) becomes a final heh (ه), final alef (ا), or final teh (ت) in Persian, so an Arabic name written in Persian may end up with any of three spellings. (See table below.) Unless you are familiar with the name’s pronunciation in Persian, it is challenging to know which English transliteration to choose.

Arabic name | Persian equivalent | Comment
محبوبة Mahbouba | محبوبه Mahboubeh | Arabic taa marbuta replaced by Persian ه
سميرة Samira | سميرا Samira | Arabic taa marbuta replaced by Persian ا
هداية Hedaya | هدايت Hedayat | Arabic taa marbuta replaced by Persian ت
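Both observations about Arabic borrowings lend themselves to a short sketch. The code below is a simplified illustration, with hypothetical function names, of (1) flagging a name that contains one of the Arabic-specific letters listed above and (2) generating the three candidate Persian spellings for an Arabic name ending in taa marbuta. The presence of such a letter is only a hint of Arabic origin, and the code cannot tell which of the three candidates is the one actually used in Persian.

```python
# The eight Arabic-specific letters listed above:
# thaa, dhaal, ain, ghain, Saad, Dhaad, Taa, Zha.
ARABIC_SPECIFIC = set("\u062b\u0630\u0639\u063a\u0635\u0636\u0637\u0638")

TAA_MARBUTA = "\u0629"                           # ة
PERSIAN_ENDINGS = ("\u0647", "\u0627", "\u062a")  # ه, ا, ت


def has_arabic_specific_letter(name: str) -> bool:
    """True if the name contains a letter not used in purely Persian words."""
    return any(ch in ARABIC_SPECIFIC for ch in name)


def persian_spelling_candidates(arabic_name: str) -> list:
    """List the possible Persian spellings of an Arabic name.

    Only the final taa marbuta is handled; any other name is returned as is.
    """
    if not arabic_name.endswith(TAA_MARBUTA):
        return [arabic_name]
    stem = arabic_name[:-1]
    return [stem + ending for ending in PERSIAN_ENDINGS]


mahbouba = "\u0645\u062d\u0628\u0648\u0628\u0629"  # محبوبة
for candidate in persian_spelling_candidates(mahbouba):
    print(candidate)
```

A matcher could index all three candidates, so that a Persian-script query finds the Arabic-script record whichever ending the typist chose.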

Borrowings from other languages
For names borrowed from languages other than Arabic, Persians apply their own pronunciation to the characters, which can be a challenge to matching names across languages.

Name in Persian and pronunciation | Origin of borrowed name | Meaning of name
اسلان Aslan | Azerbaijani-Turkish | lion
ارشمید Arashmid | Greek | Archimedes
شووان Showan/ShwAn | Kurdish | shepherd

Applications doing fuzzy name matching and translation for Persian need a deep understanding beginning with “What is a full name?” Linguistically speaking, what aspects of the language affect how a name is written? What social and cultural backgrounds should be considered when working with Persian names?

As this blog post has shown, Persian names are particularly tricky to fuzzy match, search for, or translate for several reasons:

  • Names can have the same meaning as a common noun and be literally translated by accident
  • The mapping between Arabic-script characters and Latin characters is one-to-many or many-to-many and may vary with the context in which the Persian character appears; one sound may map to multiple characters, or vice versa
  • There may be a varying number of expected name components
  • Electronically, name components may be joined in different ways: concatenated directly, separated by whitespace, or separated by a zero-width non-joiner, ZWNJ (U+200C)
  • Borrowings from other languages pose their own issues
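To make the transliteration problem concrete, here is a deliberately naive sketch of fuzzy comparison using Python’s standard `difflib` module. It is not how Rosette or any production matcher works. The function name `similarity` and the choice to strip spaces and punctuation are assumptions made for illustration. It only shows that two Latin renderings of the same Persian name can be scored as close without being identical.

```python
from difflib import SequenceMatcher


def similarity(name_a: str, name_b: str) -> float:
    """Return a rough similarity score between 0.0 and 1.0 (illustrative only)."""

    def squash(text: str) -> str:
        # Lowercase and keep only letters and digits, so spaces and hyphens
        # between name components do not affect the score.
        return "".join(ch for ch in text.lower() if ch.isalnum())

    return SequenceMatcher(None, squash(name_a), squash(name_b)).ratio()


print(similarity("Mulla Sadra", "Mollasadra"))
print(similarity("Darya-Bandari", "Darya Bandari"))
print(similarity("Mulla Sadra", "Kivan Muhammadi"))
```

A character-level score like this knows nothing about affixes, vowel mappings, or borrowed spellings, which is exactly why the linguistic knowledge discussed in this post matters.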

The name translator and name-matching functions within Rosette will take care of all these complexities. Try them out today by getting a free Rosette API key.