UNICODE < Transform

Introduction

Unicode is an effort to create a single character set that includes every character needed by any written human language on the planet.

An important goal of Unicode is to distinguish between a character and its encoding in bits on a computer. ASCII and ISO 8859, by contrast, are character encodings: they fix the bit pattern used for each character. Unicode assigns every character in the character set a number, called a code point, which uniquely identifies that character. The Unicode character set itself is thus not a character encoding. Because code points are abstract numbers rather than fixed-width bit patterns, the number of characters Unicode can contain is not tied to the size of a byte or word, which leaves ample room for characters added in the future.
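As a rough illustration of the difference between a character and its code point, the following Python sketch looks up the code point and the official character name for a few characters. The helper name `describe` is made up for this example; `ord`, `chr`, and `unicodedata.name` are part of the Python standard library.

```python
import unicodedata


def describe(ch):
    """Return the code point (in U+XXXX notation) and the Unicode name of ch."""
    code_point = ord(ch)  # the abstract number Unicode assigns to the character
    return f"U+{code_point:04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # "A", e with acute accent, and the euro sign, written as escapes
    for ch in ["A", "\u00e9", "\u20ac"]:
        print(describe(ch))
        # chr() maps the code point back to the same character
        assert chr(ord(ch)) == ch
```

Note that nothing here says how the character is stored in memory or on disk; the code point is only a number.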

There are several ways to represent Unicode code points in bits. Well-known encodings are UTF-8, UTF-16, and UTF-32.
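To make this concrete, here is a minimal Python sketch that encodes one and the same code point in each of the three encodings and shows the resulting bytes as hexadecimal. The function name `encoded_forms` is hypothetical, and the big-endian variants of UTF-16 and UTF-32 are chosen here only to keep the output free of byte order marks.

```python
def encoded_forms(ch):
    """Map each encoding name to the hex string of the bytes it produces for ch."""
    encodings = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: ch.encode(name).hex() for name in encodings}


if __name__ == "__main__":
    # The euro sign, code point U+20AC
    for name, hex_bytes in encoded_forms("\u20ac").items():
        print(f"{name:10} {hex_bytes}")
    # utf-8      e282ac
    # utf-16-be  20ac
    # utf-32-be  000020ac
```

The code point stays the same in all three cases; only the byte sequence that represents it differs.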

Resources

Unicode at Wikipedia
What is Unicode
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Unicode Technical Report #17: Character Encoding Model

Revision: r1.2 - 30 Oct 2003 - 09:34 - MartinBravenboer
