GUIDELINES FOR TRANSCRIBING LEVANTINE ARABIC:
IMPORTANT NOTICE!! These are the Guidelines for the Arabic Orthographic System-based Transcription
(=AOST: transcription in the "Yellow" area).
If you are engaged in MSA-based transcription (=MSAT: transcription in the "Green" area)
please use the MSAT Guidelines.
GENERAL OBSERVATIONS
The AOST transcription starts out as an exact copy of
the MSA-based Arabic transcription (MSAT). It is provided both in
transliteration (AOST-T) and in Arabic script (AOST-A).
The mapping between AOST-T and AOST-A is one-to-one and reversible.
Example:
DETAILED INSTRUCTIONS
(*) Transcribing the Definite Article "Al-"
Do not modify the definite article "Al" – leave it unvocalized, but add the shadda
to the word after "Al" when it begins with a geminated "shamsiyya" consonant
(e.g. Als~alAm) if that is the way it was pronounced.
Note that some dialects extend gemination to consonants
that are not geminated in MSA (e.g. Alj~amal). Note the vocalization of
the definite article when it is preceded by the preposition "li":
lils~alAm, lilmaktab. Examples:
(*) Transcribing Ta Marbuta
The ta marbuta is always transcribed as "p" regardless of how it is pronounced.
However, if it is pronounced /t/ (such as when the word is in an idafa construction)
then you must add the Annotation Remark "(-ap Pronounced)". For example:
(*) Transcribing With non-Standard (Persian) Letters
AMADAT allows for the use of three non-standard (Persian) characters for AOST transcription:
(*) Transcribing "q" Pronounced as Glottal Stop
When the "q" is pronounced as a glottal stop, it should be transcribed as hamza.
Follow the standard rules of orthography, which are based on the surrounding short vowels.
(If there is known variation, such as شئون/شؤون and رءوس/رؤوس, follow the predominant Levantine spelling of the hamza).
Examples:
(*) Transcribing "q" Pronounced as /g/
When the "q" is pronounced as /g/ (as in some non-urban Levantine dialects),
it should be transcribed with the Persian letter گ.
Example:
(*) Transcribing Hamza
The hamza at the beginning of words will be written above or below alif
depending on the accompanying short vowel: above if the vowel is "a" or "u"; below if it's "i". For example:
(*) Transcribing the Short Vowel "a" before Alif
The short vowel "a" is redundant before the long vowel "A"
so there is no need to write it.
(Technical note: if the sequence "aA" is desired in the final transcription,
it can be generated automatically via a substitution script.
The reverse process – replacing all instances of "aA" with "A" – is even easier to implement).
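The two substitutions mentioned in the technical note could be sketched as follows. This is only an illustration, not the project's actual script: the function names and the list of characters allowed before a long "A" are assumptions, and the expansion is meant for fully vocalized AOST-T text (a bare alif at the start of a word or after a short vowel, as in "wiAl$~ams", is left alone).

```python
import re

# Assumption: a long vowel "A" is one that directly follows a consonant
# letter or the shadda "~". A word-initial alif, or one that follows a
# short vowel (the alif of the definite article), is not a long vowel.
_BEFORE_LONG_A = "'|>&<}btvjHxd*rzs$SDTZEgfqklmnhwyGJP~"
_LONG_A = re.compile("(?<=[" + re.escape(_BEFORE_LONG_A) + "])A")


def expand_aA(text: str) -> str:
    """Insert the redundant short vowel: 'A' becomes 'aA' where it is a long vowel."""
    return _LONG_A.sub("aA", text)


def collapse_aA(text: str) -> str:
    """The reverse process: replace every 'aA' with 'A'."""
    return text.replace("aA", "A")


if __name__ == "__main__":
    for word in ["SawAriyx", "AlbinAt", ">am~A", "wiAl$~ams"]:
        print(word, "->", expand_aA(word))
```

Because "a" is not in the list of preceding characters, running expand_aA on text that already contains "aA" leaves it unchanged, and collapse_aA undoes the expansion.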
(*) Transcribing "Dagger Alif" as Long Vowel "A"
There are several words that in MSA are written with a "dagger alif" diacritic.
On PCs this diacritic appears only in the word "Allah" (الله) because there is a special
glyph for the sequence of 3 letters "llh" (لله). If the diacritic were available, it would
also appear in words like هذا, هذي, and طه. Note that the dagger alif is transcribed
in AOST as the long vowel "A":
(*) Transcribing Unstressed Long Vowels
In words with more than one long vowel, unstressed long vowels are often shortened.
For example: مباريات is often pronounced /mubaraya:t/.
This long-vowel shortening is related to word stress and will therefore
not be recorded in AOST transcription.
DISPLAY ISSUES
(*) Vocalization of Lam-Alif Ligature
On all platforms and in most applications the lam-alif ligature (لا) displays strangely
when a diacritic is inserted between the lam and the alif (e.g., "li>an~a" لِأَنَّ).
This is a display issue only, and the data should not be considered corrupted!
ISSUES TO BE CONSIDERED
(*) Transcribing the Long Vowels /e:/ and /o:/
The long vowels /e:/ and /o:/ should be transcribed as the diphthongs /ay/ and /aw/
if the word's MSA counterpart also has a diphthong (e.g., بيت, صوت).
However, if the MSA counterpart has a long vowel sound (e.g., موتور, موديل),
then transcribe the /e:/ and /o:/ sounds as the long vowels /iy/ and /uw/.
Examples:
(*) Annotation of Epenthetic Vowels
How epenthetic vowels should be annotated is still an open question.
For example, which of the following AOST transcriptions do we prefer, and why?
Guidelines for the Transcription of Arabic Dialects (EARS)
Tim Buckwalter, Mohamed Maamouri
Arabic Treebank Project
LDC, University of Pennsylvania
January 9, 2004
AOST TRANSCRIPTION
MSAT: بالضبط (ضحك) اما (إنقطاع) (إنقطاع) اما صحيح اللي بحكيه والا لأ
AOST-T (before changes): bAlDbT (DHk) AmA (<nqTAE) (<nqTAE) AmA SHyH Ally bHkyh wAlA l>
AOST-A (before changes): بالضبط (ضحك) اما (إنقطاع) (إنقطاع) اما صحيح اللي بحكيه والا لأ
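The one-to-one, reversible mapping between AOST-T and AOST-A can be illustrated with a small lookup table. The sketch below assumes the standard Buckwalter transliteration table plus the three Persian letters used in these guidelines (G, J, P, as they appear in the examples). The function names are hypothetical, and Annotation Remarks such as "(-h Deletion)" would have to be set aside before conversion, because their Latin letters are also transliteration symbols.

```python
# Buckwalter symbols laid out in Unicode order (assumed standard table).
_BASE = "'|>&<}AbptvjHxd*rzs$SDTZEg"   # U+0621 .. U+063A
_REST = "_fqklmnhwYyFNKaui~o"          # U+0640 .. U+0652

T2A = {c: chr(0x0621 + i) for i, c in enumerate(_BASE)}
T2A.update({c: chr(0x0640 + i) for i, c in enumerate(_REST)})
T2A.update({
    "`": "\u0670",  # dagger alif
    "{": "\u0671",  # alif wasla
    "G": "\u06AF",  # Persian gaf
    "J": "\u0686",  # Persian che
    "P": "\u067E",  # Persian pe
})

# The reverse table exists only because the mapping is one-to-one.
A2T = {a: t for t, a in T2A.items()}
assert len(A2T) == len(T2A)


def to_arabic(translit: str) -> str:
    """AOST-T to AOST-A; spaces and unknown characters pass through."""
    return "".join(T2A.get(c, c) for c in translit)


def to_translit(arabic: str) -> str:
    """AOST-A to AOST-T; spaces and unknown characters pass through."""
    return "".join(A2T.get(c, c) for c in arabic)


if __name__ == "__main__":
    line = "bAlDbT AmA SHyH Ally bHkyh wAlA l>"
    assert to_translit(to_arabic(line)) == line
    print(to_arabic(line))
```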
The annotator's task is to modify the transliteration so that it more closely reflects actual pronunciation.
These modifications will consist of:
adding missing short vowels and diacritics,
removing certain letters that are not pronounced,
and modifying certain letters when they are pronounced differently than how they are written.
In addition to making changes to the transliteration text, the annotator will occasionally attach one or more
Annotation Remarks (e.g., "-h Deletion") to the relevant words in the text.
For example:
MSAT: بالضبط (ضحك) اما (إنقطاع) (إنقطاع) اما صحيح اللي بحكيه والا لأ
AOST-T (after changes): biAlz~abT >am~A SaHiyH <ill~iy baHkiy (-h Deletion) wil~A la>
AOST-A (after changes): بِالزَّبط أَمّا صَحِيح إِللِّي بَحكِي (-h Deletion) وِلّا لَأ
MSAT: القمر والشمس
AOST-T: Al>amar wiAl$~ams
AOST-A: الأَمَر وِالشَّمس
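The article rules (unvocalized "Al", a shadda on a geminated following consonant, and "lil" after the preposition "li") can be sketched as below. The helper names are hypothetical, the stem is assumed to begin with a single-character consonant, and whether the consonant was geminated remains the annotator's judgment based on the audio, so it is passed in as a flag rather than computed.

```python
def with_article(stem: str, geminated: bool = False) -> str:
    """Prefix the unvocalized article; add '~' after the first consonant if it was geminated."""
    if geminated:
        stem = stem[0] + "~" + stem[1:]
    return "Al" + stem


def with_li(stem: str, geminated: bool = False) -> str:
    """Preposition 'li' plus article: the alif is dropped and the article is written 'lil'."""
    return "lil" + with_article(stem, geminated)[2:]


if __name__ == "__main__":
    print(with_article("salAm", geminated=True))  # Als~alAm
    print(with_article("jamal", geminated=True))  # Alj~amal
    print(with_li("salAm", geminated=True))       # lils~alAm
    print(with_li("maktab"))                      # lilmaktab
```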
MSAT: مدرسة البنات مدرسة ثانوية
AOST-T: madrasp(-ap Pronounced) AlbinAt madrasap tAnawiy~ap
AOST-A: مَدرَسة(-ap Pronounced) البِنات مَدرَسَة تانَوِيَّة
Note that some words have ta marbuta that is often pronounced as /t/ outside of idafa constructions:
MSAT: الحياة
AOST-T: AlHayAp(-ap Pronounced)
AOST-A: الحَياة(-ap Pronounced)
/g/ گ گـ ـگـ ـگ
/č/ چ چـ ـچـ ـچ
/p/ پ پـ ـپـ ـپ
Examples:
MSAT: انت بتحكي انكليزي؟
AOST-T: <inta btiHkiy <inGliyziy?
AOST-A: إِنتَ بتِحكِي إِنگلِيزِي؟
MSAT: كيف حالك؟
AOST-T: Jiyf HAliJ?
AOST-A: چِيف حالِچ؟
MSAT: عندك كمبيوتر في البيت؟
AOST-T: Eindak kamPyuwtar fiy Albayt?
AOST-A: عِندَك كَمپيُوتَر فِي البَيت؟
MSAT: مين اللي بيقول هيك
AOST-T: miyn All~iy biy&uwl hayk?
AOST-A: مِين اللِّي بِيؤُول هَيك؟
MSAT: ما قريتش المقالة
AOST-T: mA >arayti$ Alma|lap
AOST-A: ما أَرَيتِش المَآلَة
MSAT: انا بقول لك
AOST-T: >anA baGuwl lak
AOST-A: أَنا بَگُول لَك
MSAT: وين امك
AOST-T: wayn <im~ak
AOST-A: وَين إِمَّك
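The rule for word-initial hamza (above alif for "a" or "u", below alif for "i") is simple enough to check mechanically. The following is a minimal sketch in AOST-T terms, where ">" is hamza above alif and "<" is hamza below alif; the function names are hypothetical, and words whose first vowel has not been written yet are simply accepted.

```python
def initial_hamza(first_vowel: str) -> str:
    """Return the AOST-T symbol for a word-initial hamza given the vowel it carries."""
    if first_vowel in ("a", "u"):
        return ">"   # hamza above alif
    if first_vowel == "i":
        return "<"   # hamza below alif
    raise ValueError("expected one of the short vowels a, u, i")


def seat_matches_vowel(word: str) -> bool:
    """True unless a word-initial hamza sits on the wrong side of the alif."""
    if len(word) < 2 or word[0] not in "<>":
        return True          # no initial hamza, nothing to check
    vowel = word[1]
    if vowel not in "aui":
        return True          # unvocalized, cannot be checked
    return word[0] == initial_hamza(vowel)


if __name__ == "__main__":
    for word in ["<im~ak", ">anA", ">inta"]:
        print(word, seat_matches_vowel(word))
```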
MSAT: الله ـ هذا ـ هذي ـ طه
AOST-T: All~Ah - hA*A - hA*iy - TAha
AOST-A: اللّاه ـ هاذا ـ هاذِي ـ طاهَ
MSAT: صواريخ ـ مباريات ـ كانون
AOST-T: SawAriyx - mubArayAt - kAnuwn
AOST-A: صَوارِيخ ـ مُبارَيات ـ كانُون
MSAT: صوت الموتور ـ الموديل ـ البيت
AOST-T: Sawt Almuwtuwr - Almuwdiyl - Albayt
AOST-A: صَوت المُوتُور ـ المُودِيل ـ البَيت
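The decision for /e:/ and /o:/ depends on only two facts: which vowel was heard, and whether the MSA counterpart has a diphthong. A small lookup makes this explicit. It is a sketch only; the function name is hypothetical, and deciding whether the MSA counterpart has a diphthong is left to the annotator.

```python
_MID_VOWEL = {
    ("e:", True): "ay",   # MSA counterpart has a diphthong, e.g. Albayt
    ("o:", True): "aw",   # e.g. Sawt
    ("e:", False): "iy",  # MSA counterpart has a long vowel, e.g. Almuwdiyl
    ("o:", False): "uw",  # e.g. Almuwtuwr
}


def transcribe_mid_vowel(phoneme: str, msa_has_diphthong: bool) -> str:
    """Return the AOST-T spelling for a long /e:/ or /o:/."""
    return _MID_VOWEL[(phoneme, msa_has_diphthong)]


if __name__ == "__main__":
    print("b" + transcribe_mid_vowel("e:", True) + "t")   # bayt
    print("m" + transcribe_mid_vowel("o:", False) + "t" +
          transcribe_mid_vowel("o:", False) + "r")        # muwtuwr
```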
(1.a)
MSAT: انا قلت لك
AOST-T: >nA >ult l~ak
AOST-A: أنا أُلت لَّك
(1.b)
MSAT: انا قلت لك
AOST-T: >nA >ulti l~ak
AOST-A: أنا أُلتِ لَّك
(1.c)
MSAT: انا قلت لك
AOST-T: >nA >ult il~ak
AOST-A: أنا أُلت ِلَّك
(1.d)
MSAT: انا قلت لك
AOST-T: >nA >ult <il~ak
AOST-A: أنا أُلت إِلَّك
(2.a)
MSAT: من البيت
AOST-T: min Albayt
AOST-A: مِن البَيت
(2.b)
MSAT: من البيت
AOST-T: mini Albayt
AOST-A: مِنِ البَيت
(2.c)
MSAT: من البيت
AOST-T: mina Albayt
AOST-A: مِنَ البَيت
(2.d)
MSAT: من البيت
AOST-T: min iAlbayt
AOST-A: مِن ِالبَيت