

Arabic, read, written, understood, ...

On this page, I describe how I transcribe Arabic text in Latin letters (ASCII).

This work is an improvement on earlier systems. I modified them to fit the way I need, or want, them. It has a variety of uses, e.g: I am thinking of a little applet that displays Arabic text while the text is entered with a Latin keyboard.






Q: Why Arabic text, in Latin letters?


A:

For convenience. This helps both those who are learning Arabic, and those who own computers (and/or gadgets) without a convenient method for typing and displaying Arabic letters.

For a student, a possible path for development is:

  1. Learn the frame-of-reference, for Arabic. How it differs.

    The student reads and writes with the alphabet that he/she already knows. No new alphabets (or fonts) are necessary for reading and writing at the computer screen and keyboard.

    This lets the student compare and contrast the needs of an Arabic text with the requirements of English, French, etc. e.g:

    • No capital versus small case
    • There are varieties of the s, z, and h sounds.
    • There are fixed rules for pronouncing the text. It is almost, if not exactly, phonetic.
    • There are phonetic signs within the text, as superscripts, to guide the pronunciation.
    • In fact, both the vowels and the phonetic signs are optional in an Arabic text. The consonants are the substance that the text revolves around. Yet the meaning, and/or the pronunciation, of any sentence is more easily interpreted with these extras.

      Otherwise, it takes more knowledge, both about the possible words with those consonants, and about the context where that sentence is spoken. In fact, there are seven such qiraats (q-ruh-ut) for reading the Quran, as revealed to our prophet Muhammed (s.a.s), and reported in a hadith. Each of these seven qiraats is memorized by the hafiz (huh-fiz), the Quran-memorizers.


  2. Get acquainted with Arabic letters, and ligatures.

    After having mastered the previous step, try to get acquainted with the Arabic letter shapes. This is very easy with the faithful mapping of the letters that I provide. You can easily compare the Arabic text of the Quran with the text transcribed with this method in Latin. The structure/pattern is exactly the same. This should help in a transfer of knowledge.

  3. Type at any Latin/ASCII keyboard.

    This is a relief, both for font shortages and for keyboard anomalies. Any keyboard will do. If Latin text is possible, Arabic text is possible, too. e.g: Within the quotes of an HTML page, where the HTML itself is Latin text, the quote is able to contain Arabic text with this method.

    I plan to publish, inshaellah, a little typing-helper, later, at this page, for displaying the typed text as Arabic text, too. i.e: The typist types with these rules, while the text is displayed as ordinary Arabic. No difference.

If you would like to learn all the seven qiraats, then, you may start with the phonetics here, then switch to the corresponding qiraat (the ASIM qiraat) in Arabic texts. And then, you may go on to learn the other six.






Q: Did other such methods exist?


A: The prior art

Yes, or no. This is the most comprehensive system, that I know. Others had more limited range, even when not altogether neglectful.

There are a lot of problems with a full neglect/ignorance. The least of them is that it is a many-to-one mapping. e.g: Three different letters may get written as the same letter "h" in such a neglectful text. Four different letters are mapped to the single letter "z" in Arabic-to-Turkish conversions. That loses a lot of information. You end up confusing "the Creator" with "the barber" when the "h" letters are, indeed, different.

There are little patches that place an extra dot, or two, or a sign, above or below the regular Latin letters. This needs special fonts, then, if you would like to employ it, with your own texts.

In a Teach-yourself-Arabic text, I noticed, the Latin letters are complemented with a few Greek letters, e.g. theta, to increase the range of letters, presumably, to let Arabic varieties fit in, without mapping to arbitrary letters. Thereby, the similar letters in Latin and Greek, are employed to convey the shades of the "t" etc.

(I came by this text in late June 2004, after I had coded mine, although it is the oldest among these. I refer to it, for your information, as an alternative out there.)

A tejwid (pronunciation) book has the only rule I liked, and I improved that idea, with other wish-list items of mine, for a full mapping. The idea in that tejwid book was to employ capital versus small-case letters to stand for different letters in Arabic. I think this is fine, for most cases. It conveys both the similarity and the weight. e.g: The capital T is a heavier sound than the small-case t.

For the purposes of that tejwid textbook, a unidirectional-translation may suffice, but I would like to further it. The result is the set of rules on this page.






Q: What are the rules, for transcribing Arabic?


A:

At this site, I encode Arabic text with the Latin alphabet. Mostly, this substitutes Latin equivalents for the Arabic letters.

Therefore, if you already know some text in Arabic, especially with the specific qiraat, that is employed (for the vowels and phonetics), you should easily recognize what corresponds to what.


For example...

Here is the 112th surah in Quran. The rules, and the replacements-table, follow after it.

bismi/\ll=a'hi/\RR=aHma'ni/\RR=aHi'Ym

Qul* huwe/\ll=a'hu eHad[un=+j]
/e\ll=a'hu/\SSamed[u+j]
lem* yelid*+L
we lem* yuWled*+L
we lem* yekun* lehu kufuwen= eHad[un=]

Equivalently, if only for vocalizing purposes, the display may show this (i.e: if not with Arabic letters, that is):

bismilla'hiRRaHma'niRRaHi'Ym

Qul huwella'hu eHad
ella'huSSamed
lem yelid
we lem yuWled
we lem yekun lehu kufuwen eHad

But this latter style, loses, possibly-important, information - even for sound-only. e.g: It does not suggest, where you are allowed to stop, where you are not, when reading it. It is not a good idea to stop at points marked with a "+L" , for example. The shedde-couples should not be pronounced with any pause between them.
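The reduction from the first style to the second can be sketched in code. This is a minimal Python sketch, not a tool published at this site: the function name to_vocal_form is hypothetical, and the three stripping steps are only inferred from comparing the two versions of the surah above (bracketed info is dropped, plus-sign marks are dropped, and the characters / \ = * are dropped).

```python
import re

def to_vocal_form(line):
    """Reduce a fully encoded line to the sound-only style (hypothetical helper)."""
    # 1. Drop info-only material in square brackets, e.g. "[un=+j]".
    line = re.sub(r"\[[^\]]*\]", "", line)
    # 2. Drop plus-sign reading marks outside brackets, e.g. "+L".
    line = re.sub(r"\+[A-Za-z]", "", line)
    # 3. Drop the no-sound characters: slash, backslash, equals-sign, asterisk.
    line = re.sub(r"[/\\=*]", "", line)
    return line.rstrip()

surah_112 = [
    r"bismi/\ll=a'hi/\RR=aHma'ni/\RR=aHi'Ym",
    r"Qul* huwe/\ll=a'hu eHad[un=+j]",
    r"/e\ll=a'hu/\SSamed[u+j]",
    r"lem* yelid*+L",
    r"we lem* yuWled*+L",
    r"we lem* yekun* lehu kufuwen= eHad[un=]",
]

for encoded in surah_112:
    print(to_vocal_form(encoded))
```

The conversion only works in this direction. Once the marks are stripped, the stop/continue information cannot be recovered from the sound-only text.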


double/quadruple-time

The capital-letters W and Y, are silent. They only contribute to vowel-expansion, i.e: more-time with the sound of the vowel, before themselves.

This is interpretation/qiraat dependent. In another context, such a W or Y may receive vowels for themselves (after themselves), too. In such a case, they would, for example, get written as "wa," or "yi" i.e: An associated vowel converts these to small case, and lets them have their own sound.

A silent-elif is similar. It is noticed as a capital-case vowel, after a consonant. e.g: Qa is different than QA. The latter has an elif, after Qa, and doubles the time.

The vowel-expansion is achievable without such silent-consonants, too. A single-quote (an apostrophe) after a vowel, doubles its time. e.g: Qa' takes double the time of Qa

Such a suffix of a single-quote, is similar to the short vertical-bar, in Arabic, both in shape, and function.

Similarly, a tilde "~" after a vowel, lengthens its sound two to four times.


jezm/sukun

A jezm is showable with an asterisk "*", although it is mostly, if ever, not necessary. If there is a space after a consonant in a Latin text, you stop, any way.


double-hearing

A shedde-diacritic, in Arabic, doubles a consonant. I write such a doubled consonant with two consonants, followed by an equals-sign "=". e.g: "/inn=e"

An "el" followed by a word that starts with a "shemsee" letter is similar to a shedde in pronunciation. I keep the unpronounced "l" letter as "\", the backslash. e.g: "/e\$$=emsi" That is, to find out whether it is a shedde with a shemsee letter, notice the double consonants (a single consonant, pronounced double), and check whether they come after a backslash.
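The backslash check described above can be sketched with a regular expression. This is a minimal sketch under my own assumptions: the name find_sheddes is hypothetical, and the pattern simply looks for any character written twice and followed by an equals-sign, recording whether a backslash stands right before the pair.

```python
import re

# An optional backslash (the silent "l"), then the same character twice, then "=".
DOUBLED = re.compile(r"(\\?)(.)\2=")

def find_sheddes(text):
    """Return (consonant, after_backslash) for each doubled consonant marked with '='."""
    return [(m.group(2), m.group(1) == "\\") for m in DOUBLED.finditer(text)]

print(find_sheddes("/inn=e"))        # plain shedde, no backslash before it
print(find_sheddes(r"/e\$$=emsi"))   # doubled consonant after the silent "l"
```

A True in the second position means the doubling comes from an "el" before a shemsee letter; a False means it is an ordinary shedde.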


the Non-Noons

The tenwin (double-hareke) sounds are written with the "n" sound followed by an equals-sign, and preferably, a space after that. i.e: "n= " For example "rajulun= kebiYrun= " When you only pronounce it, you may neglect those equals-signs, but when you count letters, those signs tip you to discard those "n= " entries, because they do not really contain any consonant "n". It is only a tenwin sound.

Similarly, the extra consonant, introduced with a shedde, is noticed with an equals-sign after it. e.g: "jenn=Atu" is written with two "n" letters, although there is a single consonant "n" in it. For consonant-counting, it is fine to neglect whether the equals-sign exists because of a tenwin, or a shedde. In any case, it is easy to tell the difference, too. Before the tenwin's "n" there is a vowel, whereas the shedde has the same consonant doubled. Therefore, these two are never confused.
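The two uses of the equals-sign can be told apart, and discounted when counting, with very little code. This is a minimal sketch, assuming the rules exactly as stated above; the names count_consonant and classify_equals are hypothetical, and the vowel set is limited to the a, e, i, u letters (both cases) used on this page.

```python
VOWELS = set("aeiuAEIU")

def count_consonant(text, letter):
    """Count a consonant, skipping every occurrence that is followed by '='."""
    return sum(
        1
        for i, ch in enumerate(text)
        if ch == letter and text[i + 1:i + 2] != "="
    )

def classify_equals(text):
    """Label each '=' as 'shedde' (same consonant doubled) or 'tenwin' (vowel + n)."""
    kinds = []
    for i, ch in enumerate(text):
        if ch != "=" or i < 2:
            continue
        marked, before = text[i - 1], text[i - 2]
        if marked == before:
            kinds.append("shedde")
        elif marked == "n" and before in VOWELS:
            kinds.append("tenwin")
        else:
            kinds.append("unknown")
    return kinds

print(count_consonant("jenn=Atu", "n"))        # one real "n"
print(count_consonant("rajulun= kebiYrun= ", "n"))  # no real "n" at all
print(classify_equals("jenn=Atu"), classify_equals("rajulun= kebiYrun= "))
```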


for your info

The informative subscripts/superscripts are the reading-aid extras that you may find in a Quran text. Here, they are prefixed with a plus-sign. For example, +T stands for the "Ta" superscript letter, at the end of those Arabic sentences where a stop is required, or possible, and where neglecting the vowel before it is required when/if you stop there.

When stopping is not required, I write +T within square brackets, together with the vowel that would get neglected, if stopped there. e.g: [e+T], or [u+T].

When +T is a required pause, I write +T without any brackets around it, e.g: [e]+T, or [u]+T. The vowel, in this case, is only informative, because it is never pronounceable with a +T that requires waiting/stopping. I inform about the vowel because it is helpful when reading for learning/understanding the Arabic text. (No need, though, if only vocalizing the text, and/or if the reader is able to infer the mansub vs. merfu vs. mejrur without the vowel-tip, too.)

Another important example, +L stands for the lam-elif ligature, as a superscript. It is found at the phrase-ends, where the reading must continue. It is a not-pausable point. As such, +L is necessarily enforced. i.e: Even if the reader thought of stopping there, he/she continues, after seeing the +L.
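A reader program could pick out these marks and tell whether each one is optional or enforced. This is a minimal sketch with a hypothetical function name, stop_marks. It assumes that the bracket convention described for +T (inside brackets means optional, outside means enforced) applies to any plus-sign mark; the text above states it only for +T, and states that +L is always enforced.

```python
import re

# Either a whole bracketed info group, or a plus-sign mark standing outside brackets.
TOKEN = re.compile(r"\[([^\]]*)\]|\+([A-Za-z])")
INNER_MARK = re.compile(r"\+([A-Za-z])")

def stop_marks(line):
    """Return (mark_letter, 'optional' | 'enforced') for each plus-sign mark."""
    found = []
    for m in TOKEN.finditer(line):
        if m.group(1) is not None:
            # Inside brackets: the mark, if any, is not binding.
            inner = INNER_MARK.search(m.group(1))
            if inner:
                found.append((inner.group(1), "optional"))
        else:
            # Outside brackets: the mark must be obeyed.
            found.append((m.group(2), "enforced"))
    return found

print(stop_marks("[e+T]"))          # stop allowed, not required
print(stop_marks("[e]+T"))          # stop required
print(stop_marks("lem* yelid*+L"))  # must continue
```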


new couplings

I introduce three new coupling-concepts, for fitting Arabic letters, in Latin form.

The first is the lisping letters: the lisping se, and the lisping zel. These are tip-of-the-tongue sounds, and they are represented with a c, for its tongue/teeth shape. The capital C represents zel, the later of the two in the alphabet.

The O is not a vowel in Arabic. Therefore, I employ it for two new concepts. When it is capital, as in leOall=e, it represents the letter Uyn, in Arabic. The Uyn, in fact, is a consonant, with a larynx'izing of elif'ful (or, hemze) sounds. Therefore, it is almost kin with vowels. In fact, when other texts ignore its sound, they simply show the vowel, by itself. It is a heavy sound. Therefore, the capital O is the perfect choice, I think.

When o is small case, it is the letter h, as with its Arabic shape-alike, and if it is prefixed with a colon, it is t. This letter, tamarbuta, is representable with an umlaut-o, ö, if the font has it. This is its Arabic shape.

Arabic alphabet ASCII coding Commentary
elif, and hemze [E] A U I, e, a, u, i, v, / The sound differs with the vowel-signs that go with it. Let's discuss this later, inshaellah , in the varieties of its handling.

An a, e, i, u sound, if printed as a capital letter, that means it is not only a (vowel telling) "hareke," but it is with a consonant "elif" that it stays on. i.e: It is an elif, and there is a hareke above/below it.

The capital-elif lengthens the sound (of the consonant-vowel pair, as it is). i.e: The capital "A" takes more time than "e" even if it is the same hareke, with an elif difference.

For pronunciation, "A" and "a" are different, unless there is a "_'" (an apostrophe after an underscore). The "A" and "a_'" are equivalent, for pronunciation. Although "a" is not on an elif, it is the same result, with the help of a diacritic (an apostrophe on it, in Arabic text). i.e: Pronounce the same, count without an elif.

(The special need arises because after the thick-letters, the vowel "e" is always pronounced as "a", although its length does not increase. An apostrophe, prefixed with an underscore, does the job, and fits "u", too.)

The capital "E" is never pronounced, although counted as the letter elif. e.g: there is an elif in the Arabic spelling of the word Allah, but it is not pronounced. For humans, I would not show it, at all. It is most fitting for machine-processing, where we capture more-information with the "E" existence, but it may distract a human's attention. (You may get used to it, though.)

The "hemze" is a sign, not a consonant. The small case "v" represents hemze, here. It is both a prefix, and acts also as a letter, although it is not a consonant, really. e.g: "ve" is a hemze, with a hareke above it, and this is pronounced as "e" whereas "vA" is a consonant elif, with a hemze-sign above it, and a hareke above the hemze, and it is pronounced as "A"

A division sign, the slash "/" is an elif that has no functionality, although it is preserved in text. It acts as hemze, too, if there is a vowel after it. The hemze is implicit in it. Simply neglect it, when reading the text - unless for special purposes, such as counting, or calculating with the consonants.

Hemze may, as well, not get written, if a / exists. A vowel, after a blank, or at the very start, is with hemze. If it is capital case, E, A, I, U, it has an /, too. Therefore, both the hemze, and / are not needed, for a vowel at start of a block. Implicit.
be b .
te t, ö t is the consonant "te"

ö is the consonant "tamarbuta." When the ö, is not available, as with 7-bit ASCII, then we may write it, equivalently, as ":o" . Here, the colon, as a prefix, acts as a dead-char, such as customary with circumflex, before vowels.
se (lisping) c It is the Arabic lisping-s with triple dots on it. It is the s that is pronounced with the tip of the tongue. In English, it is most like the "th" sound.
I represent the tip-of-tongue sounds with the (tongue-shape) letter c in ASCII. This is the smaller c, the later (lisping) zel will be shown with a capital C.
jim j It is the usual j sound like in jump.
Its Arabic shape is with a dot at below/middle of "Ha".
Ha H The harsher h sound. For ASCII coding, I use a capital H, because it is a heavier sound than the h that will come at the end.
kha X The harshest h sound. Its Arabic shape is with a dot on top of "Ha".

This is a "kh" sound, as in khan. The letter 'X' (chi) in Greek is voiced that way. Therefore, I picked it for the ASCII coding of kha.
del d The usual d sound.
zel (lisping) C It is the lisping-z that is pronounced with the tip of the tongue. I show it with a (tongue-shape) capital C. You may find this easier, if you keep in mind that the letter c is tongue-shaped, and it is a lisping sound, that is, pronounced with the tip of the tongue.

Its shape, in Arabic, is with a dot on top of "del"

In both the Arabic alphabet, and the ASCII, the "zel" is later than "se." Therefore, the capital "C" is "zel," and small-case "c" is "se."

This is the only letter that I show with an ASCII capital letter, although it is not a heavy letter. In this case, both of the "c, C" are lisping; neither is thick. The shape tips their voice, and we differentiate the two with their alphabet order.
ra R, r The usual r sound. Ra, Ru, ri. (It is thick, other than with an "i")
ze z The usual z sound as in zap.
Its Arabic shape is with a dot on top of "ra"
sin s The usual s sound.
shin $ The "sh" sound as in she. I show it with the dollar-sign. (If you consider the "$" sign as a letter, then you should know that, as in the case of "c," the "$" is not thick, either. It is a sign, any way. You do not read it, in English texts. I reserved it for such.)
Sud S The heavier s sound in Arabic
Dud D The heavier d sound in Arabic
Tu(h) T The heavier t sound in Arabic
Zu(h) Z The heavier z sound in Arabic
Uyn O The larynx-consonant, which larynx'izes the sound of the vowel-sign that is pronounced with it, as compared with the "elif" having the same vowel-signs.

In Arabic, there are no "O" sounds that exist in English. So, I employ the ASCII "capital O" for the consonant "uyn," not a vowel. In any case, the wide-spread conception of that consonant as more similar to a vowel, than a consonant, a larynx version of elif (or, a hareke), fits this, superbly. i.e: For example, "Oa" is a larynx "a" sound.
Guyn G A (larynx?) sound somewhat like g as in go. The sound is like the French pronunciation of 'r' as in Paris.
Its Arabic shape is with a dot on top of "Uyn"
fe f The usual f sound, as in free.
Quf Q The usual q sound. (The name of it has the u being pronounced, as in sun, not like "quite," as usually "u" is pronounced in English, when it is after a "q")
kef k The usual k sound
lum [L,] l, \ The usual l sound as in lesson.

When pronouncing, a "la" is with a relatively heavier (thicker) sound, when there was a heavy sound before it (i.e: the previous syllable). e.g: The "la" in "Qala" is heavier than the "la" in "bela" No need to show this in a special/different way. You simply notice the previous syllable. That is how it is in Arabic, too, any way.

The backslash "\" stands for an "L" without any sound.

The capital "L" is the lam-elif ligature, especially as the superscript, "+L" at the end of many/most ayets.
mim m .
noon n If there is an equals-sign after "n" then it is not "noon" the consonant "n" Instead, it is a tenwin'ful "n"
wuw w, W The small case w is pronounced, with the sound as in what, where.

The capital case W is a meta-character, not pronounced here, (in that qiraat, that you prefer transcribing with), but it lengthens the duration of the vowel before it. Both are counted as consonant-letters, though.
he h The usual h sound, like in history
ya y, Y The small case y is pronounced, with the sound as in you. The capital case Y is a meta-character, not pronounced here (in the qiraat that you prefer transcribing with), but it lengthens the duration of the vowel before it. For other studies/counting, both are counted as consonant-letters, though.
info-only [ ... ] Anything enclosed within a pair of square-brackets is for info only. It is sometimes readable, too. If you do not know about it, simply neglect any such, when vocalizing. They are helpful, though, for understanding Arabic texts (especially for novices?)
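As a first step toward a typing-helper, the unambiguous consonant rows of the table can be turned into a lookup. This is only a sketch, not the planned helper: the names ASCII_TO_ARABIC and consonant_skeleton are hypothetical, and the Unicode code points are my assumption (the table above gives letter names, not code points). It leaves out elif, hemze, the silent backslash, the tamarbuta, and all vowel signs, so it yields the bare consonant skeleton only.

```python
# ASCII code -> Arabic letter (Unicode code points are an assumption of this sketch).
ASCII_TO_ARABIC = {
    "b": "\u0628",                 # be
    "t": "\u062a",                 # te
    "c": "\u062b",                 # se (lisping)
    "j": "\u062c",                 # jim
    "H": "\u062d",                 # Ha
    "X": "\u062e",                 # kha
    "d": "\u062f",                 # del
    "C": "\u0630",                 # zel (lisping)
    "r": "\u0631", "R": "\u0631",  # ra
    "z": "\u0632",                 # ze
    "s": "\u0633",                 # sin
    "$": "\u0634",                 # shin
    "S": "\u0635",                 # Sud
    "D": "\u0636",                 # Dud
    "T": "\u0637",                 # Tu(h)
    "Z": "\u0638",                 # Zu(h)
    "O": "\u0639",                 # Uyn
    "G": "\u063a",                 # Guyn
    "f": "\u0641",                 # fe
    "Q": "\u0642",                 # Quf
    "k": "\u0643",                 # kef
    "l": "\u0644",                 # lum
    "m": "\u0645",                 # mim
    "n": "\u0646",                 # noon
    "w": "\u0648", "W": "\u0648",  # wuw
    "h": "\u0647",                 # he
    "y": "\u064a", "Y": "\u064a",  # ya
}

def consonant_skeleton(text):
    """Map the encoded consonants to Arabic letters; drop everything else."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "+":
            i += 2  # skip a reading mark such as +L or +T, letter included
            continue
        if ch == " ":
            out.append(" ")
        elif ch in ASCII_TO_ARABIC and nxt != "=":
            # A letter followed by "=" is a tenwin "n" or a shedde copy: not written.
            out.append(ASCII_TO_ARABIC[ch])
        i += 1
    return "".join(out)

print(consonant_skeleton("lem* yelid*+L"))
print(consonant_skeleton("Qul* huwe"))
```

The output is in logical (typing) order; showing it right-to-left with joined letter shapes is left to the display font and renderer.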






Q: Is this system perfect? Do these rules suffice?


A: Notes and disclaimers


notes about more-phonetic, less informative variants

The example with two alternative writings of the 112th surah is important. The latter presentation will let you vocalize (pronounce) the Quran through Latin/ASCII coding. This corresponds to listening to someone vocalizing the Quran on some radio, or CD, etc.

When presented with a text, encoded that way, when studying it further, if there is a need for more, non-phonetical, information, you may consult the Arabic text of Quran. For a vocal treatment, with a given qiraat, such a presentation should suffice. (I use the ASIM qiraat, in the examples.)

In Arabic, the vowels and the transcription signs are optional, and even in reading the Quran there are revealed varieties. As such, the more I may encode it phonetically, the more it would represent a single qiraat, to the exclusion of the others. I hope to remain informative about it, though. I even include the no-sound-influence characters.

I continue studying these. The full-success, for this study, is the point where the Latin-encoding contains the full information, and it is readable, too, by a human, without major distraction. This, even when achieved, needs further work, to verify. (Human-performance research, interface improvements.) i.e: Perfection is w.r.t. its requirements, and needs testing.

In any case, though, even if this is not perfect, or not announceable as perfect, yet, it is well developed, and good enough for the purposes of this site.


irrelevant to shouting, or silence

Internet netiquette suggests not writing texts in all-capital letters, because it is suggestive of shouting. But this is irrelevant in our case, where a text may appear in all-caps, or mostly capital letters, only because the Arabic letters in it are that way. It is not about shouting, or not shouting. Employ an exclamation-mark, if you need shouting, any way.




Last-Revised (text) on June 26, 2004 . . . that was http://www.geocities.com/ferzenr/arabic_text.htm
mirror for zilqarneyn.com, on Mar. 13, 2009
Written by: Ahmed Ferzan/Ferzen R Midyat-Zila (or, Earth)
Copyright (c) [2002,] 2003, 2004, 2009 Ferzan Midyat. All rights reserved.