ARABIC MSA-BASED TRANSCRIPTION ISSUES
TO BE REVISED ACCORDING TO DECISIONS TAKEN AT JAN. 16, 2004 MEETING
Word-initial Hamza
Word-initial hamza is written above/below alif ( أ إ )
but more frequently as a bare alif ( ا )
* Word-initial hamza is a diacritic (a vowel marker):
MSA: أن / إن
Dialect: أنت / إنت
* Usage is inconsistent; writer awareness is low:
أنا / انا
* Misapplied to words that begin with an eliding hamza (hamzat al-wasl):
إجلس ـ وإقرأ ـ وأكتب
البحث بإسم المؤلف
كيف تبدأ بإستخدامه
الوضع الإقتصادي
الإتحاد الأوروبي
يوم الإثنين
Options:
* Write all word-initial hamzas, consistently and correctly (difficult)
* Omit all word-initial hamzas, consistently (less difficult)
* Automatic removal of word-initial hamzas? (relatively easy)
/[wf]?([bk]?(Al)?|ll?)[><]./
* Automatic insertion of missing hamzas? (difficult?)
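The automatic-removal option can be sketched in Python. This is a minimal illustration, not a settled procedure: it assumes the input is one word at a time in Buckwalter transliteration (">" and "<" for hamza above and below alif, "A" for bare alif), anchors the pattern above at the start of the word, and replaces the hamza-bearing alif with a bare alif. The function name strip_initial_hamza is hypothetical.

```python
import re

# Optional proclitics (w/f, then b/k with optional Al, or l/ll),
# followed by a hamza-bearing alif and at least one more character.
INITIAL_HAMZA = re.compile(r"^([wf]?(?:[bk]?(?:Al)?|ll?))[><](?=.)")

def strip_initial_hamza(word: str) -> str:
    """Replace a word-initial '>' or '<' (after proclitics) with bare alif 'A'."""
    return INITIAL_HAMZA.sub(r"\g<1>A", word)

for w in [">nA", "w<qr>", "Al<qtSAdy", "bd>"]:
    print(w, "->", strip_initial_hamza(w))
```

Medial and final hamzas are left untouched, as in "w<qr>" and "bd>". The pattern cannot tell a proclitic from a root consonant, so a word such as "b>s" would also be rewritten; output of this kind would need review.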
Word-medial/final Hamza
MSA glottal stop → /y/ or vocalic length:
MSA /jara:?id/ → Dialect /jara:yid/
MSA /ra?s/ → Dialect /ra:s/
MSA /bi?r/ → Dialect /bi:r/
MSA /bada?/ → Dialect /bada/
Option 1: Try to render all words in MSA spelling
Easy: /jara:yid/ → MSA جرائد
Easy: /bada/ → MSA بدأ
Hard: /ja:y/ → MSA جائي
Hard: /ja:ya/ → MSA جائية
Hard: /ja:yi:n/ → MSA جائيين
Strange: /ra:yHa/ → MSA رائحة
Strange: /?are:t/ → MSA قرأت
MSA variation:
/ra?sma:l/ /ra:sma:l/
/baša:yir/ /baša:?ir/
/Hara:yir/ /Hara:?ir/
/saga:yir/ /saga:?ir/
/sara:yir/ /sara:?ir/
/šaba:yik/ /šaba:?ik/
/šafa:yif/ /šafa:?if/
/maSa:yid/ /maSa:?id/
/maDa:yiq/ /maDa:?iq/
/mana:yir/ /mana:wir/ /mana:?ir/
Option 2: Render in MSA spelling only when easy
Easy: /jara:yid/ → MSA جرائد
Easy: /bada/ → MSA بدأ
Treat all others as lexical variation
Imperfect Verb 1st Pers. Sg.
Variation: /?ana bašu:f baru:H ba?u:l/
انا باشوف / باروح / باقول
انا بشوف / بروح / بقول
Google frequencies
1.a "AnA bqwl lk" 76 + ">nA bqwl lk" 33 = 109
1.b "AnA bAqwl lk" 19 + ">nA bAqwl lk" 10 = 29
1.c "AnA bqwllk" 3 + ">nA bqwllk" 0 = 3
1.d "AnA bAqwllk" 1 + ">nA bAqwllk" 3 = 4
1.e "AnA bqwlk" 136 + ">nA bqwlk" 85 = 221
1.f "AnA bAqwlk" 27 + ">nA bAqwlk" 4 = 31
2.a ">nA b$wf lk" + "AnA b$wf lk" 4
2.b ">nA b$wflk" + "AnA b$wflk" 2
2.c ">nA bA$wflk" + "AnA bA$wflk"
2.d ">nA bA$wf lk" + "AnA bA$wf lk"
3.a "AnA jyt lk" 2 + ">nA jyt lk" 6 = 8
3.b "AnA jytlk" 7 + ">nA jytlk" 1 = 8
4.a إفريقيا 33,400
4.b أفريقيا 72,400
4.c افريقيا 38,800
5.a إسبانيا 13,600
5.b أسبانيا 19,000
5.c اسبانيا 21,300
6.a الإسباني 18,100
6.b الأسباني 11,000
6.c الاسباني 11,300
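
To make the counts above easier to compare, the following sketch totals them by spelling. It is only an illustration: the numbers are copied from items 1.a-1.f and 4-6, the grouping into "with alif" and "without alif" is an assumption about which comparison is of interest, and the helper name shares is hypothetical.

```python
# Totals from items 1.a-1.f (both spellings of the pronoun combined).
counts_1 = {
    "bqwl lk": 109, "bAqwl lk": 29,
    "bqwllk": 3,    "bAqwllk": 4,
    "bqwlk": 221,   "bAqwlk": 31,
}
without_alif = sum(n for form, n in counts_1.items() if form.startswith("bq"))
with_alif = sum(n for form, n in counts_1.items() if form.startswith("bA"))

def shares(counts):
    """Return each spelling's fraction of the total count."""
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}

# Items 4-6: hamza below alif, hamza above alif, bare alif.
initial_hamza = {
    "4 (Africa)":  {"below": 33400, "above": 72400, "bare": 38800},
    "5 (Spain)":   {"below": 13600, "above": 19000, "bare": 21300},
    "6 (Spanish)": {"below": 18100, "above": 11000, "bare": 11300},
}

print("b- without alif:", without_alif, " bA- with alif:", with_alif)
for item, counts in initial_hamza.items():
    print(item, {k: round(v, 3) for k, v in shares(counts).items()})
```

On these figures the spelling without alif totals 333 against 64 with alif, while no single hamza spelling dominates across items 4-6.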