ARABIC MSA-BASED TRANSCRIPTION ISSUES

TO BE REVISED ACCORDING TO DECISIONS TAKEN AT THE JAN. 16, 2004 MEETING

Word-initial Hamza

Word-initial hamza is written above/below alif ( أ إ )
but more frequently as a bare alif ( ا )

* Word-initial hamza is a diacritic (its position above or below alif marks the vowel):
             MSA: أن / إن
             Dialect: أنت / إنت
* Usage is inconsistent; writers are often unaware of the rule:
             أنا / انا
* Misapplied to words that begin with an eliding hamza (hamzat al-wasl):
             إجلس / وإقرأ / وأكتب
             البحث بإسم المؤلف
             كيف تبدأ بإستخدامه
             الوضع الإقتصادي
             الإتحاد الأوروبي
             يوم الإثنين

Options:

* Write all word-initial hamzas, consistently and correctly (difficult)
* Omit all word-initial hamzas, consistently (less difficult)
* Automatic removal of word-initial hamzas? (relatively easy; Buckwalter transliteration, > = أ, < = إ)
             /^[wf]?([bk]?(Al)?|ll?)[><]./
* Automatic insertion of missing hamzas? (difficult?)
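The removal option can be sketched in Python using the anchored pattern above. This is a hypothetical helper, not an existing tool: it assumes tokens are already in Buckwalter transliteration, and the names INITIAL_HAMZA, strip_initial_hamza and strip_text are illustrative. The proclitics are captured and kept; only the hamza-bearing alif is replaced by bare alif (A).

```python
import re

# Hypothetical helper. Tokens are assumed to be in Buckwalter transliteration:
# '>' is أ (hamza above alif), '<' is إ (hamza below alif), 'A' is bare ا.
# Group 1 keeps the optional proclitics: w/f, then b/k with optional Al, or l/ll.
# The '^' anchor limits matches to the start of the token; the lookahead
# requires at least one letter after the hamza, like the final '.' above.
INITIAL_HAMZA = re.compile(r"^([wf]?(?:[bk]?(?:Al)?|ll?))[><](?=.)")


def strip_initial_hamza(token: str) -> str:
    """Replace a word-initial hamza-on-alif with bare alif, keeping proclitics."""
    return INITIAL_HAMZA.sub(r"\g<1>A", token)


def strip_text(text: str) -> str:
    """Apply strip_initial_hamza to every whitespace-separated token."""
    return " ".join(strip_initial_hamza(tok) for tok in text.split())


print(strip_text(">nA bqwl lk"))        # AnA bqwl lk
print(strip_text("Al<tHAd Al>wrwby"))   # AlAtHAd AlAwrwby
print(strip_text("s>l"))                # s>l (hamza is not word-initial)
```

Because the pattern cannot tell a proclitic from a root consonant, a word such as b>s (بأس), whose hamza is medial, is also rewritten (to bAs), so a few medial hamzas will be removed as well.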

Word-medial/final Hamza

MSA glottal stop → /y/, vowel length, or loss:
             MSA /jara:?id/ → Dialect /jara:yid/
             MSA /ra?s/ → Dialect /ra:s/
             MSA /bi?r/ → Dialect /bi:r/
             MSA /bada?/ → Dialect /bada/
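The four correspondences can be written as ordered rewrite rules over the transcription ('?' = glottal stop, ':' = vowel length). This is a hypothetical sketch: the rules and the name msa_to_dialect are inferred only from these four examples and are not a general account of the dialect's phonology.

```python
import re

# Hypothetical rules reproducing only the four correspondences listed above.
RULES = [
    # long vowel + glottal stop + vowel: the stop becomes /y/ (jara:?id -> jara:yid)
    (re.compile(r"(?<=:)\?(?=[aiu])"), "y"),
    # short vowel + glottal stop + consonant: the vowel lengthens (ra?s -> ra:s)
    (re.compile(r"([aiu])\?(?=[^aiu:?])"), r"\1:"),
    # word-final glottal stop after a short vowel is dropped (bada? -> bada)
    (re.compile(r"([aiu])\?$"), r"\1"),
]


def msa_to_dialect(form: str) -> str:
    """Apply the rules in order to an MSA transcription."""
    for pattern, repl in RULES:
        form = pattern.sub(repl, form)
    return form


for msa in ["jara:?id", "ra?s", "bi?r", "bada?"]:
    print(f"/{msa}/ -> /{msa_to_dialect(msa)}/")
```

Rule order matters only in principle here, since each example triggers exactly one rule.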

Option 1: Try to render all words in MSA spelling
       Easy: /jara:yid/ → MSA جرائد
       Easy: /bada/ → MSA بدأ
       Hard: /ja:y/ → MSA جائي
       Hard: /ja:ya/ → MSA جائية
       Hard: /ja:yi:n/ → MSA جائيين
       Strange: /ra:yHa/ → MSA رائحة
       Strange: /?are:t/ → MSA قرأت


MSA variation:
       /ra?sma:l/   /ra:sma:l/
       /baša:yir/   /baša:?ir/
       /Hara:yir/   /Hara:?ir/
       /saga:yir/   /saga:?ir/
       /sara:yir/   /sara:?ir/
       /šaba:yik/   /šaba:?ik/
       /šafa:yif/   /šafa:?if/
       /maSa:yid/   /maSa:?id/
       /maDa:yiq/   /maDa:?iq/
       /mana:yir/   /mana:wir/   /mana:?ir/
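When both members of these pairs need to be found together (for example when counting or searching transcripts), one option is a matching key that maps every variant of a word to the same string. The sketch below is hypothetical: variant_key and VARIANT_GROUPS are illustrative names, and the /w/ rule is included only because of /mana:wir/; it would also merge unrelated words that genuinely have /a:wi/.

```python
import re


def variant_key(form: str) -> str:
    """Hypothetical key under which the variants in each pair coincide."""
    # /y/ or /w/ after long a: and before i is keyed as /?/ (baša:yir ~ baša:?ir)
    form = re.sub(r"a:[yw](?=i)", "a:?", form)
    # short a + glottal stop before a consonant is keyed as long a: (ra?sma:l ~ ra:sma:l)
    form = re.sub(r"a\?(?=[^aiu:?])", "a:", form)
    return form


# Pairs copied from the list above.
VARIANT_GROUPS = [
    ("ra?sma:l", "ra:sma:l"),
    ("baša:yir", "baša:?ir"),
    ("Hara:yir", "Hara:?ir"),
    ("saga:yir", "saga:?ir"),
    ("sara:yir", "sara:?ir"),
    ("šaba:yik", "šaba:?ik"),
    ("maSa:yid", "maSa:?id"),
    ("maDa:yiq", "maDa:?iq"),
    ("mana:yir", "mana:wir", "mana:?ir"),
]

for group in VARIANT_GROUPS:
    keys = {variant_key(form) for form in group}
    print(" / ".join(group), "->", ", ".join(sorted(keys)))
```

Each group prints a single key, i.e. the variants collapse as intended.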

Option 2: Render in MSA spelling only when easy
       Easy: /jara:yid/ → MSA جرائد
       Easy: /bada/ → MSA بدأ
Treat all others as lexical variation


Imperfect Verb 1st Pers. Sg.

       Variation: /?ana bašu:f baru:H ba?u:l/
       انا باشوف / باروح / باقول
       انا بشوف / بروح / بقول
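For searching or counting, both spellings in the two lines above can be matched by one pattern in which the alif after the b- prefix is optional. A hypothetical sketch: FIRST_SG and classify_prefix are illustrative names, and the stem list covers only the three verbs shown.

```python
import re

# Hypothetical pattern: b- prefix of the 1st person singular imperfect,
# with or without a following alif, on the three stems shown above.
FIRST_SG = re.compile(r"\bب(ا?)(شوف|روح|قول)\b")


def classify_prefix(line: str) -> list:
    """Return (stem, spelling) pairs; spelling is 'bA-' (با) or 'b-' (ب)."""
    return [(stem, "bA-" if alif else "b-") for alif, stem in FIRST_SG.findall(line)]


for line in ["انا باشوف / باروح / باقول", "انا بشوف / بروح / بقول"]:
    print(classify_prefix(line))
```

Python's `re` treats Arabic letters as word characters in `str` patterns, so `\b` delimits the Arabic words correctly.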

Google frequencies (Buckwalter transliteration: A = ا, > = أ, $ = ش)

1.a   "AnA bqwl lk"  76  +   ">nA bqwl lk"  33 = 109
1.b   "AnA bAqwl lk" 19  +   ">nA bAqwl lk" 10 =  29
1.c   "AnA bqwllk"    3  +   ">nA bqwllk"      =   3
1.d   "AnA bAqwllk"   1  +   ">nA bAqwllk"   3 =   4
1.e   "AnA bqwlk"   136  +   ">nA bqwlk"    85 = 221
1.f   "AnA bAqwlk"   27  +   ">nA bAqwlk"    4 =  31
 
2.a   ">nA b$wf lk"      +   "AnA b$wf lk"   4
2.b   ">nA b$wflk"       +   "AnA b$wflk"    2
2.c   ">nA bA$wflk"      +   "AnA bA$wflk"
2.d   ">nA bA$wf lk"     +   "AnA bA$wf lk"
 
3.a   "AnA jyt lk"    2  +   ">nA jyt lk"    6 = 8
3.b   "AnA jytlk"     7  +   ">nA jytlk"     1 = 8
 
4.a   إفريقيا    33,400
4.b   أفريقيا    72,400
4.c   افريقيا    38,800

5.a   إسبانيا    13,600
5.b   أسبانيا    19,000
5.c   اسبانيا    21,300

6.a   الإسباني    18,100
6.b   الأسباني    11,000
6.c   الاسباني    11,300
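The row totals can be rechecked and each spelling's share computed with a few lines of Python. The counts are copied from rows 1.a-1.f and 4-6 above; group 2 is left out because its cells are incomplete, and the empty >nA cell in 1.c is taken as 0, as its row total implies. The names GROUP_1, GROUPS_4_6, group_1_summary and shares are illustrative.

```python
# Group 1 rows: (row, verb spelling, hits with AnA, hits with >nA).
GROUP_1 = [
    ("1.a", "bqwl lk", 76, 33),
    ("1.b", "bAqwl lk", 19, 10),
    ("1.c", "bqwllk", 3, 0),
    ("1.d", "bAqwllk", 1, 3),
    ("1.e", "bqwlk", 136, 85),
    ("1.f", "bAqwlk", 27, 4),
]

# Groups 4-6: the same word spelled with إ, أ or bare ا.
GROUPS_4_6 = {
    "4": {"إ": 33_400, "أ": 72_400, "ا": 38_800},  # إفريقيا
    "5": {"إ": 13_600, "أ": 19_000, "ا": 21_300},  # إسبانيا
    "6": {"إ": 18_100, "أ": 11_000, "ا": 11_300},  # الإسباني
}


def group_1_summary(rows):
    """Return (total hits, share spelled AnA, share spelled with the bA- prefix)."""
    total = sum(bare + hamza for _, _, bare, hamza in rows)
    bare_total = sum(bare for _, _, bare, _ in rows)
    long_prefix = sum(b + h for _, spelling, b, h in rows if spelling.startswith("bA"))
    return total, bare_total / total, long_prefix / total


def shares(counts):
    """Share of each spelling within one group."""
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


total, bare_share, ba_share = group_1_summary(GROUP_1)
print(f"Group 1: {total} hits, {bare_share:.1%} with AnA, {ba_share:.1%} with bA-")
for label, counts in GROUPS_4_6.items():
    print(f"Group {label}: " + ", ".join(f"{form} {s:.1%}" for form, s in shares(counts).items()))
```

For group 1 this gives 397 hits, about 66% spelled with AnA and 16% with the bA- prefix. In groups 4-6 the bare-alif spelling is the most frequent form only in group 5 (39.5%).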