UTF-32
UTF-32 is a method of encoding Unicode characters that uses a fixed 32 bits (four bytes) for each code point. It can be regarded as the simplest Unicode encoding form, because all other Unicode Transformation Formats use a variable number of code units per character. A notable drawback is size: UTF-32 text can require two to four times the storage space of the same text in traditional encodings, and it is generally less efficient in memory usage and memory bandwidth than UTF-16 or UTF-8. For this reason it is rarely used for external storage and is mostly used internally, where character handling needs to be as simple as possible.
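The size trade-off and the fixed-width property can be illustrated with a short Python sketch. The helper names `encoded_sizes` and `code_point_at` and the sample strings are assumptions made for illustration only; the sketch relies only on Python's built-in `utf-8`, `utf-16-le` and `utf-32-le` codecs.

```python
# Compare encoded sizes of the same text in three Unicode encoding forms.
# Helper names and sample strings are illustrative assumptions.

def encoded_sizes(text: str) -> dict:
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

def code_point_at(data: bytes, index: int) -> int:
    """Read the index-th code point from little-endian UTF-32 bytes.

    Every code point occupies exactly 4 bytes, so the position is
    computed directly and no scanning of earlier characters is needed.
    """
    return int.from_bytes(data[4 * index : 4 * index + 4], "little")

# Plain ASCII text: UTF-32 needs four times the bytes of UTF-8.
ascii_sizes = encoded_sizes("hello")

# Mixed text: 'a', U+00E9, U+20AC and U+1F600 (one supplementary character).
mixed = "a\u00e9\u20ac\U0001F600"
mixed_sizes = encoded_sizes(mixed)
last = code_point_at(mixed.encode("utf-32-le"), 3)
```

For ASCII-only text the ratio to UTF-8 is the full factor of four; for text with many non-ASCII characters the gap narrows, while indexed access stays a simple multiplication in every case.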
UCS-4
ISO 10646 defines a 32-bit encoding form called UCS-4, in which each encoded character in the Universal Character Set (UCS) is represented by a 32-bit code value in the code space of integers from 0 to hexadecimal 7FFFFFFF.
UCS-4 is sufficient to represent all of the Unicode code space, which has 1114112 (= 2^20 + 2^16) code points and therefore requires code values only up to hexadecimal 10FFFF. Some people consider it wasteful to reserve such a large code space for mapping a relatively small set of code points, so a new encoding form, UTF-32, was proposed. UTF-32 is a subset of UCS-4 that uses 32-bit code values only in the code space from 0 to 10FFFF.
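The arithmetic behind these figures can be checked with a few lines of Python. The constant and function names below are assumptions chosen for this sketch; the numeric limits are the ones stated in the text.

```python
# Code space sizes for Unicode/UTF-32 and UCS-4, as stated in the text.
# All names are illustrative assumptions.

PLANE_SIZE = 2**16                     # code points per plane
UNICODE_CODE_POINTS = 2**20 + 2**16    # 1114112 code points, i.e. 17 planes
MAX_CODE_POINT = UNICODE_CODE_POINTS - 1   # hexadecimal 10FFFF
UCS4_LIMIT = 0x7FFFFFFF                # largest value in the UCS-4 code space

def in_utf32_range(value: int) -> bool:
    """True if value lies in the 0 to 10FFFF code space used by UTF-32."""
    return 0 <= value <= MAX_CODE_POINT

def in_ucs4_range(value: int) -> bool:
    """True if value lies in the original 31-bit UCS-4 code space."""
    return 0 <= value <= UCS4_LIMIT
```

A value such as hexadecimal 110000 passes `in_ucs4_range` but fails `in_utf32_range`, which is the sense in which UTF-32 is a subset of UCS-4.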
UTF-32 and UCS-4
UTF-32 was originally a subset of the UCS-4 standard. However, the Principles and Procedures document of JTC1/SC2/WG2 states that all future assignments of characters will be constrained to the Basic Multilingual Plane (BMP) or the first 14 supplementary planes, and the former provisions for private-use code positions in groups 60 to 7F and in planes E0 to FF have been removed.
Accordingly, UCS-4 and UTF-32 can now be taken to be identical, except that the UTF-32 standard has additional Unicode semantics that must be observed.
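As a rough illustration of what observing such constraints might look like in practice, the Python sketch below checks a big-endian UTF-32 byte sequence. The function name `validate_utf32be` is hypothetical, and the two rules it applies (values must not exceed hexadecimal 10FFFF, and surrogate code points D800 to DFFF are rejected) are an assumed, partial reading of the requirements rather than a complete statement of them.

```python
import struct

# Hypothetical validator: a partial sketch, not a full conformance check.

def validate_utf32be(data: bytes) -> list:
    """Decode big-endian UTF-32 bytes into a list of code points.

    Raises ValueError if the input is not a whole number of 32-bit
    units, if a value lies above hexadecimal 10FFFF, or if a value is
    a surrogate code point (D800 to DFFF).
    """
    if len(data) % 4 != 0:
        raise ValueError("length is not a multiple of 4 bytes")
    code_points = []
    for (value,) in struct.iter_unpack(">I", data):
        if value > 0x10FFFF:
            raise ValueError(f"value {value:#x} is outside the UTF-32 range")
        if 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"value {value:#x} is a surrogate code point")
        code_points.append(value)
    return code_points

def is_valid_utf32be(data: bytes) -> bool:
    """Convenience wrapper returning True or False instead of raising."""
    try:
        validate_utf32be(data)
    except ValueError:
        return False
    return True
```

For example, the eight bytes `00 00 00 41 00 01 F6 00` decode to the code points 41 and 1F600, whereas `00 11 00 00` is rejected because it exceeds 10FFFF.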
External Links
- The Unicode Standard 4.1, chapter 3 - formally defines UTF-32 in §3.10, D43-D45
- Unicode Standard Annex #19 - formally defined UTF-32 for Unicode 3.x (March 2001; last updated March 2002)
- Registration of new charsets: UTF-32, UTF-32BE, UTF-32LE - announcement of UTF-32 being added to the IANA charset registry (April 2002)
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.


