Science Fair Project Encyclopedia
KOI8-U
KOI8-U is an 8-bit character encoding, designed to cover Ukrainian, which uses the Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight graphic characters with four Ukrainian letters Ґ, Є, І, and Ї in both upper case and lower case.
KOI8 remains much more commonly used than ISO 8859-5, which never really caught on. Another common Cyrillic character encoding is Windows-1251. Both may eventually give way to Unicode.
In Russian, KOI8 stands for Kod Obmena Informatsiey, 8 bit (Код Обмена Информацией, 8 бит) which means "Code for Information Exchange, 8 bit".
The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the natural Cyrillic alphabetical order as in ISO 8859-5. Although this may seem unnatural, it has the useful property that if the 8th bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-U becomes rUSSKIJ tEKST ("Russian Text") if the 8th bit is stripped.
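The bit-stripping property can be tried out with a short Python sketch. It assumes the `koi8_u` codec that ships with Python's standard library; the helper name `strip_high_bit` is made up for this illustration.

```python
def strip_high_bit(text: str, encoding: str = "koi8_u") -> str:
    """Encode text, clear the 8th bit of every byte, and read the result as ASCII."""
    raw = text.encode(encoding)
    # Masking with 0x7F drops the top bit, which is what a 7-bit channel would do.
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

Characters that are already ASCII, such as the space, pass through unchanged because their 8th bit is already zero.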
| KOI8-U | x0 | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | xA | xB | xC | xD | xE | xF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0x | unused | | | | | | | | | | | | | | | |
| 1x | unused | | | | | | | | | | | | | | | |
| 2x | SP | ! | " | # | $ | % | & | ' | ( | ) | * | + | , | - | . | / |
| 3x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; | < | = | > | ? |
| 4x | @ | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O |
| 5x | P | Q | R | S | T | U | V | W | X | Y | Z | [ | \ | ] | ^ | _ |
| 6x | ` | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o |
| 7x | p | q | r | s | t | u | v | w | x | y | z | { | \| | } | ~ | |
| 8x | ─ | │ | ┌ | ┐ | └ | ┘ | ├ | ┤ | ┬ | ┴ | ┼ | ▀ | ▄ | █ | ▌ | ▐ |
| 9x | ░ | ▒ | ▓ | ⌠ | ■ | ∙ | √ | ≈ | ≤ | ≥ | NBSP | ⌡ | ° | ² | · | ÷ |
| Ax | ═ | ║ | ╒ | ё | є | ╔ | і | ї | ╗ | ╘ | ╙ | ╚ | ╛ | ґ | ╝ | ╞ |
| Bx | ╟ | ╠ | ╡ | Ё | Є | ╣ | І | Ї | ╦ | ╧ | ╨ | ╩ | ╪ | Ґ | ╬ | © |
| Cx | ю | а | б | ц | д | е | ф | г | х | и | й | к | л | м | н | о |
| Dx | п | я | р | с | т | у | ж | в | ь | ы | з | ш | э | щ | ч | ъ |
| Ex | Ю | А | Б | Ц | Д | Е | Ф | Г | Х | И | Й | К | Л | М | Н | О |
| Fx | П | Я | Р | С | Т | У | Ж | В | Ь | Ы | З | Ш | Э | Щ | Ч | Ъ |
In the table above, 0x20 is the regular SPACE character, and 0x9A is the NO-BREAK SPACE.
KOI8-U differs from KOI8-R at eight positions: 0xA4, 0xA6, 0xA7, and 0xAD (lower case є, і, ї, ґ) and 0xB4, 0xB6, 0xB7, and 0xBD (upper case Є, І, Ї, Ґ). These hold letters that do not exist in Russian; KOI8-R has graphic characters there instead.
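One way to check the list of differing positions is to decode every high byte with both encodings and compare the results. The sketch below assumes the `koi8_r` and `koi8_u` codecs from Python's standard library; `differing_positions` is a hypothetical helper, and its result reflects that library's mapping tables rather than any other implementation's.

```python
def differing_positions(enc_a: str = "koi8_r", enc_b: str = "koi8_u") -> list:
    """Return the byte values in 0x80..0xFF that the two codecs decode differently."""
    diffs = []
    for byte in range(0x80, 0x100):
        single = bytes([byte])
        if single.decode(enc_a) != single.decode(enc_b):
            diffs.append(byte)
    return diffs


for byte in differing_positions():
    single = bytes([byte])
    code_r = ord(single.decode("koi8_r"))
    code_u = ord(single.decode("koi8_u"))
    # Code points are printed instead of glyphs so the output stays plain ASCII.
    print(f"0x{byte:02X}: KOI8-R U+{code_r:04X}, KOI8-U U+{code_u:04X}")
```

Only the upper half of the byte range is scanned because both encodings agree with ASCII below 0x80.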
Although RFC 2319 says that character 0x95 should be U+2219 (∙), it may also be mapped to U+2022 (•) to match the bullet character in Windows-1251.
Some references have a typo and incorrectly state that character 0xB4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
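Because mapping tables disagree on these two positions, it can be worth checking what a particular decoder actually produces. The following sketch inspects Python's built-in `koi8_u` codec; the `describe` helper is an assumption made for this example, and other libraries may give different answers for 0x95.

```python
import unicodedata


def describe(byte: int, encoding: str = "koi8_u") -> str:
    """Report the Unicode code point and name that one byte decodes to."""
    ch = bytes([byte]).decode(encoding)
    name = unicodedata.name(ch, "<unnamed>")
    return f"0x{byte:02X} -> U+{ord(ch):04X} {name}"


# The two positions where published mapping tables are known to disagree.
for byte in (0x95, 0xB4):
    print(describe(byte))
```

A result of U+0404 for 0xB4 indicates the decoder follows the corrected table rather than the typo in Appendix A.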
The content of this article is licensed from www.wikipedia.org under the GNU Free Documentation License.


