Esperanto is written in an alphabet of twenty-eight letters. Twenty-two of these are identical in form to letters of the English alphabet (q, w, x, and y being omitted). The remaining six are accented letters: ĉ, ĝ, ĥ, ĵ, ŝ (c, g, h, j, and s circumflex), and ŭ (u breve). The full alphabet appears as follows:
a b c ĉ d e f g ĝ h ĥ i j ĵ k l m n o p r s ŝ t u ŭ v z
With the exception of c (= [ts]) and the accented letters, the values of the letters are approximately those of the IPA (see Esperanto pronunciation). The alphabet has a nearly one-to-one correspondence of letter to sound. The only significant exceptions are the sequence kz, which is frequently pronounced [gz], as in ekzemple, and borrowed words such as ŭato that use initial ŭ for [w], which is normally an allophone of v.
The script is modeled after the Czech alphabet. However, the use of a circumflex rather than a háček on the letters ĉ, ĝ, ĥ, ĵ, ŝ avoids the appearance of any national version of the Roman alphabet, and the non-Slavic bases g, j of the letters ĝ and ĵ (instead of dž, ž) help preserve the printed appearance of Latinate and Germanic vocabulary. The letter ŭ of the diphthongs aŭ, eŭ appears to be taken from the Belarusian Łacinka alphabet, historically associated with the Polish-Lithuanian Commonwealth, but Łacinka was otherwise closer to Polish than to Esperanto until Czech-inspired orthographic reforms two decades after Zamenhof went public.
The six Esperanto accented characters are included in the international form of Morse code.
In handwritten Esperanto, the accented letters cause no problems. However, since none of them appear on standard alphanumeric keyboards, various methods have been devised for representing them in printed and typed text using more standard characters. The original method is what is now referred to as the "h-system," but a so-called "x-system" has become just as popular. These are described below. With the advent of Unicode, the need for such systems is lessening.
The names of the letters
Zamenhof simply tacked an -o onto each consonant to create the name of the letter, with the vowels representing themselves. This works for initialisms such as kotopo ("k.t.p.") for et cetera [from kaj tiel plu (and so forth)], but becomes a problem when spelling out words or names. Several consonantal distinctions are difficult for certain nationalities, who depend on the fact that ordinary Esperanto vocabulary seldom uses these pairs to distinguish words. (That is, they don't form many minimal pairs.) Thus the pairs of letter names ĵo ĝo, ĥo ko (or ĥo ho), co ĉo (or co so), and ŭo vo are problematic. Moreover, over a noisy telephone connection it quickly becomes apparent that voicing distinctions can be difficult to make out: noise confounds the pairs po bo, to do, ĉo ĝo, ko go, fo vo, so zo, ŝo ĵo, as well as the nasals mo no. Finally, lo ro is a difficult distinction for many Asians, Africans, and Pacific Islanders.
There have been several proposals to resolve this problem. The closest to international norms (and thus the easiest to remember) that also clarifies all the above distinctions is a modification of a proposal first put forward by KALOCSAY Kálmán. It uses the vowel e after the letter by default, but places e before the letter for sonorants and voiceless fricatives; uses a as the vowel for <h> and the voiceless plosives, after the international names ha and ka for <h> and <k>; and uses the French name ĵi for <ĵ>, the Greek ĥi (chi) for <ĥ>, and the English ar for <r>. The diphthong offglide <ŭ> is named eŭ, the only real possibility given Esperanto phonotactics besides aŭ, which as the word for "or" would be inappropriate. <M> is called om, as this alliterates well in the sequence l, m, n, o, p. In the full ABC rhyme, the accented letters are placed at the end, where w, x, y are found in English, so as not to disrupt the flow of the letters most of us learned as children:
- a, be, ce, de, e, ef, ge, ha,
- i, je, ka, el, om, en, o, pa,
- ar, es, ta, u, ve, ĉa, ĝe,
- ĥi kaj ĵi, eŝ, eŭ kaj ze,
- plus ku, ikso, ipsilono,
- jen la abece-kolono.
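The two naming schemes can be compared with a short Python sketch. The table RHYME_NAMES is copied from the rhyme above; the function names zamenhof_name and spell are illustrative choices, not part of any standard library, and the handling of non-alphabet characters (they are skipped) is an assumption.

```python
# The 28 letters of the Esperanto alphabet, in alphabetical order.
ALPHABET = "abcĉdefgĝhĥijĵklmnoprsŝtuŭvz"

# Letter names taken from the rhyme (the modified Kalocsay proposal).
RHYME_NAMES = {
    "a": "a", "b": "be", "c": "ce", "ĉ": "ĉa", "d": "de", "e": "e",
    "f": "ef", "g": "ge", "ĝ": "ĝe", "h": "ha", "ĥ": "ĥi", "i": "i",
    "j": "je", "ĵ": "ĵi", "k": "ka", "l": "el", "m": "om", "n": "en",
    "o": "o", "p": "pa", "r": "ar", "s": "es", "ŝ": "eŝ", "t": "ta",
    "u": "u", "ŭ": "eŭ", "v": "ve", "z": "ze",
}


def zamenhof_name(letter):
    """Zamenhof's rule: vowels name themselves, consonants take -o."""
    return letter if letter in "aeiou" else letter + "o"


def spell(word, names=None):
    """Spell a word letter by letter.

    With names=None the Zamenhof names are used; otherwise `names`
    is a mapping such as RHYME_NAMES. Characters outside the
    Esperanto alphabet are skipped (an assumption of this sketch).
    """
    result = []
    for ch in word.lower():
        if ch not in ALPHABET:
            continue
        result.append(names[ch] if names else zamenhof_name(ch))
    return result


print("".join(spell("k.t.p.")))        # Zamenhof names run together
print(" ".join(spell("ktp", RHYME_NAMES)))
```

With the Zamenhof names, po and bo differ only in voicing; with the rhyme names the same letters become pa and be, which also differ in their vowel.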
The original method of representing accented letters is due to the initiator of Esperanto, L. L. Zamenhof, who recommended using u in place of ŭ, and putting an h after a letter to indicate that the letter should have a circumflex. For example, the consonant ŝ is represented as sh, as in the words shi (ŝi, meaning she) and shanco (ŝanco, meaning chance).
Unfortunately this method suffers from two problems:
- h is already a consonant in the language, so its use for another purpose would make the pronunciation and sometimes the meaning of words ambiguous.
- Simplistic ASCII-based rules for sorting English words fail badly for sorting Esperanto ones, because lexicographically words starting with ĉ should follow words starting with c and precede words starting with d. For example ĉu should be sorted after ci lexicographically, but written in the h-system, chu would be incorrectly sorted before ci.
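Both problems can be reproduced in a few lines of Python. This is only a sketch: the helper names (to_h_system, from_h_naive, esperanto_key) are made up for illustration, and the compound flughaveno ("airport", from flug- and haveno) is used as an assumed example of a word in which g and h merely happen to be adjacent.

```python
ALPHABET = "abcĉdefgĝhĥijĵklmnoprsŝtuŭvz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

# h-system: circumflex letters take a following h, ŭ becomes plain u.
H_MAP = {
    "ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u",
    "Ĉ": "Ch", "Ĝ": "Gh", "Ĥ": "Hh", "Ĵ": "Jh", "Ŝ": "Sh", "Ŭ": "U",
}


def to_h_system(text):
    """Rewrite accented letters using the h-system."""
    return "".join(H_MAP.get(ch, ch) for ch in text)


def from_h_naive(text):
    """Naive reverse conversion (lowercase only): every ch, gh, hh,
    jh, sh is assumed to be an accented letter. This is exactly the
    ambiguity described in the first bullet."""
    for digraph, letter in [("ch", "ĉ"), ("gh", "ĝ"), ("hh", "ĥ"),
                            ("jh", "ĵ"), ("sh", "ŝ")]:
        text = text.replace(digraph, letter)
    return text


def esperanto_key(word):
    """Sort key that follows Esperanto alphabetical order."""
    return [RANK.get(ch, len(ALPHABET)) for ch in word.lower()]


words = ["ĉu", "ci", "cent", "dek"]
print(sorted(to_h_system(w) for w in words))   # plain code-point sort
print(sorted(words, key=esperanto_key))        # Esperanto order
print(from_h_naive("flughaveno"))              # the g and h are merged
```

The plain sort places chu before ci, whereas the alphabet-aware key places ĉu after every word beginning with c.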
The most common system for typing in Esperanto today is the "x-system," which uses x after a letter to indicate that the letter should have an accent. For example, the consonant ŝ is represented as sx, as in the words sxi (ŝi) and sxanco (ŝanco).
This method solves both of the problems inherent in the h-system:
- x is not a consonant in the language, so its use introduces no ambiguity into the pronunciation or meaning.
- Words starting with cx now correctly follow words starting with c. Similarly, other accented letters are sorted after their unaccented counterparts. The sorting only fails when a word with cz or similar is encountered, but such words are relatively uncommon.
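Because x never occurs in native Esperanto words, conversion between the x-system and the accented letters is mechanical in both directions. The following Python sketch illustrates this; the function names x_to_unicode and unicode_to_x are illustrative, and accepting an uppercase X as the marker is an assumption.

```python
import re

# Base letter -> accented letter.
X_MAP = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}
# Accented letter -> x-system digraph.
REVERSE_MAP = {accented: base + "x" for base, accented in X_MAP.items()}

X_PATTERN = re.compile(r"([cghjsuCGHJSU])[xX]")


def x_to_unicode(text):
    """Replace cx, gx, hx, jx, sx, ux with the accented letters."""
    return X_PATTERN.sub(lambda m: X_MAP[m.group(1)], text)


def unicode_to_x(text):
    """Replace accented letters with their x-system digraphs."""
    return "".join(REVERSE_MAP.get(ch, ch) for ch in text)


print(x_to_unicode("sxanco"))
print(unicode_to_x("ĝi"))
# A plain sort keeps cx-words after the other c-words.
print(sorted(["cxu", "ci", "cent", "dek"]))
```

A plain code-point sort of x-system text places cxu after ci and before dek, matching the dictionary order of ĉu.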
One problem with the x-system arises when it is used alongside French text, because many French words end in ux. For example, aux (aŭ in Esperanto) is a word in both languages. This is most serious when one wants to automatically convert an x-system text file that also contains French text to Unicode; any automatic replacement will alter the French text as well. A few English words like "luxury" can also suffer from such search-and-replace routines. A few people have proposed using "vx" instead of "ux" for ŭ, but this variant of the system is rarely used.
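The effect is easy to reproduce. The Python sketch below first shows a blind replacement damaging foreign words, then one possible mitigation: skipping any word found in a caller-supplied list. The guarded function and its protected parameter are hypothetical, not a feature of any existing converter, and the approach cannot tell French aux from Esperanto aux.

```python
import re

X_MAP = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}
X_PATTERN = re.compile(r"([cghjsuCGHJSU])[xX]")


def x_to_unicode(text):
    """Blind x-system conversion: every matching digraph is replaced."""
    return X_PATTERN.sub(lambda m: X_MAP[m.group(1)], text)


def x_to_unicode_guarded(text, protected):
    """Convert word by word, leaving words in `protected` untouched.

    `protected` is a set of lowercase words supplied by the caller
    (a hypothetical safeguard for mixed-language text).
    """
    def convert_word(match):
        word = match.group(0)
        if word.lower() in protected:
            return word
        return x_to_unicode(word)

    return re.sub(r"\w+", convert_word, text)


print(x_to_unicode("luxury"))   # the English word is damaged
print(x_to_unicode_guarded("luxury kaj sxanco", {"luxury"}))
```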
Use of the caret
Another, less popular, system is the use of the caret character (^) to represent the accents, either before or after the letter to be accented. For example, ŝanco becomes ^sanco or s^anco. This shares the advantage of unambiguity with the x-system, and also has the advantage that the character itself resembles a circumflex accent, so that people unfamiliar with the system are likely to grasp what is meant. However, the system has not caught on in many places.
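A converter for the caret system has to know which of the two placements a text uses. The Python sketch below makes that an explicit parameter; the function name caret_to_unicode is illustrative, and treating u with a caret as ŭ is an assumption, since the description above does not say how the breve is written.

```python
import re

CARET_MAP = {
    "c": "ĉ", "g": "ĝ", "h": "ĥ", "j": "ĵ", "s": "ŝ", "u": "ŭ",
    "C": "Ĉ", "G": "Ĝ", "H": "Ĥ", "J": "Ĵ", "S": "Ŝ", "U": "Ŭ",
}


def caret_to_unicode(text, prefix=False):
    """Convert caret notation to accented letters.

    prefix=False expects the caret after the letter (s^anco);
    prefix=True expects it before the letter (^sanco).
    """
    if prefix:
        pattern = r"\^([cghjsuCGHJSU])"
    else:
        pattern = r"([cghjsuCGHJSU])\^"
    return re.sub(pattern, lambda m: CARET_MAP[m.group(1)], text)


print(caret_to_unicode("s^anco"))
print(caret_to_unicode("^sanco", prefix=True))
```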
Many new Esperantists perceive the accented letters as a problem, and often propose "new" methods to transliterate Esperanto, sometimes with substantial modifications. Most of these proposals are ignored or shunned by the community, as such suggestions often come from people who do not know the language well.
The transliteration of Esperanto into ASCII is a topic known to cause flame wars and little constructive discussion, and the reduction of such behaviour is sometimes indicated as one of the main reasons to use Unicode and the proper accented letters.
The entire Esperanto alphabet is part of the Latin-3 and Unicode character sets, so the above systems are no longer necessary on web pages. Nonetheless, the x-system remains common on Usenet and in e-mail where encoding support is rare and the limited availability of keyboard configurations makes it difficult for many to type the special characters.
The HTML entities for the special Esperanto characters in Unicode are:
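- C-circumflex: Ĉ = &#264;
- c-circumflex: ĉ = &#265;
- G-circumflex: Ĝ = &#284;
- g-circumflex: ĝ = &#285;
- H-circumflex: Ĥ = &#292;
- h-circumflex: ĥ = &#293;
- J-circumflex: Ĵ = &#308;
- j-circumflex: ĵ = &#309;
- S-circumflex: Ŝ = &#348;
- s-circumflex: ŝ = &#349;
- U-breve: Ŭ = &#364;
- u-breve: ŭ = &#365;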
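The numeric character reference for each of these letters is simply its Unicode code point in decimal, so the references can be generated rather than memorised. A minimal Python sketch, using only the standard library; the function name to_numeric_entities is an illustrative choice.

```python
import html

ACCENTED = "ĈĉĜĝĤĥĴĵŜŝŬŭ"


def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal reference
    of the form &#NNN;, leaving ASCII characters untouched."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Print each letter with its code point and numeric reference.
for letter in ACCENTED:
    print(letter, f"U+{ord(letter):04X}", to_numeric_entities(letter))

# html.unescape reverses the conversion.
assert html.unescape(to_numeric_entities("ŝanco")) == "ŝanco"
```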
Practical Unicode for Esperanto
Adjusting a keyboard to type Unicode is actually relatively easy (all Windows variants of the Microsoft Windows NT family, such as 2000 and XP, for example, support Unicode; Windows 9x does not natively support Unicode).
Microsoft Windows: The tool Keyman (free for personal use) can be used in conjunction with a special (free) "keyword file" for Esperanto. It can be configured to run automatically at startup. The advantage of using Keyman is that you can easily deactivate it, so that your "abbreviations" (such as "cx," which are automatically converted to the corresponding Esperanto letter as you type) are not converted by accident.
You can also use a keyboard layout manager to define special keys: the most elementary approach is to associate AltGr+g with ĝ, and likewise for the other letters. The program has a simple and intuitive interface, but it may be necessary to define a new keyboard to avoid interference from Windows' system-file protection, which may not permit modification of important system files such as keyboard drivers.
Many popular e-mail clients support Unicode, so you can happily use the tools described above to write e-mails using the Esperanto alphabet.
On Linux systems, one must first activate Unicode by setting the environment variable LC_CTYPE=en_US.UTF-8; there are also Unicode locales other than "en_US", and they function in the same way. There is even a special eo_XX.UTF-8 available at Bertil Wennergren's home page, along with a thorough explanation of how one implements Unicode and the keyboard in Linux.
On Mac OS X systems, Esperanto characters can be entered by activating the "U.S. Extended" keyboard layout in the "Input Menu" pane of the "International" system preferences. When the U.S. Extended layout is active, Esperanto characters can be entered as follows:
- C-circumflex: Ĉ = option+6 shift-c
- c-circumflex: ĉ = option+6 c
- G-circumflex: Ĝ = option+6 shift-g
- g-circumflex: ĝ = option+6 g
- H-circumflex: Ĥ = option+6 shift-h
- h-circumflex: ĥ = option+6 h
- J-circumflex: Ĵ = option+6 shift-j
- j-circumflex: ĵ = option+6 j
- S-circumflex: Ŝ = option+6 shift-s
- s-circumflex: ŝ = option+6 s
- U-breve: Ŭ = option+b shift-u
- u-breve: ŭ = option+b u
The option characters can be remembered by mnemonics: the 6 key contains the caret character, so option-6 places a caret over the following character. Option-b stands for breve.
An Esperanto locale would use "." as the thousands separator and "," as a decimal point. Time and date format among Esperantists is not so standardized as number format, but 24-hour time with colon between hour and minutes, and for dates, either yyyy-mm-dd or dd-mm-yyyy, would be international and unambiguous.
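These conventions can be illustrated without installing a dedicated locale. The Python sketch below formats a number by swapping the English separators and formats a date and time with the patterns mentioned above; the function name format_number and the sample values are illustrative assumptions.

```python
from datetime import datetime


def format_number(value, decimals=2):
    """Format with "." as thousands separator and "," as decimal point."""
    english = f"{value:,.{decimals}f}"          # e.g. 1,234,567.89
    # Swap the two separators in a single pass.
    return english.translate(str.maketrans(",.", ".,"))


moment = datetime(2005, 3, 7, 14, 5)            # arbitrary sample value

print(format_number(1234567.891))
print(moment.strftime("%Y-%m-%d"))              # yyyy-mm-dd
print(moment.strftime("%d-%m-%Y"))              # dd-mm-yyyy
print(moment.strftime("%H:%M"))                 # 24-hour time
```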
- eoconv, a tool to convert text between various Esperanto orthographies and character encodings
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.