UTF-8 stands for Unicode Transformation Format-8. It is an octet (8-bit)
lossless encoding of Unicode characters.
UTF-8 encodes each Unicode character as a sequence of 1 to 4
octets, where the number of octets depends on the integer value (code point) assigned
to the Unicode character. It is an efficient encoding of Unicode
documents that use mostly US-ASCII characters because it represents each
character in the range U+0000 through U+007F as a single octet. UTF-8
is the default encoding for XML.
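
The relationship between a character's integer value and its octet count can be sketched in a few lines of Python. This is only an illustration, not part of any standard API: the helper name utf8_length is hypothetical, the range boundaries are those given in RFC 3629, and the sample characters are arbitrary choices. The loop cross-checks the helper against Python's built-in str.encode.

```python
def utf8_length(code_point: int) -> int:
    """Return how many octets UTF-8 uses for one Unicode code point.

    Hypothetical helper for illustration; boundaries follow RFC 3629.
    """
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:       # US-ASCII range: one octet
        return 1
    if code_point <= 0x7FF:      # two octets
        return 2
    if code_point <= 0xFFFF:     # three octets
        return 3
    return 4                     # U+10000 through U+10FFFF: four octets


# Sample characters (arbitrary): U+0041, U+00E9, U+20AC, U+1F600.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = ch.encode("utf-8")
    # The built-in encoder should agree with the range table above.
    assert len(encoded) == utf8_length(ord(ch))
    # Lossless: decoding the octets gives back the original character.
    assert encoded.decode("utf-8") == ch
    octets = " ".join(f"{b:02X}" for b in encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} octet(s): {octets}")
```

A document made mostly of characters in the first range therefore occupies about one octet per character, which is the efficiency noted above.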
Standards
RFC 3629: UTF-8, a transformation format of ISO 10646. November 2003.
In particular, see the informal description of UTF-8 in sections 2.5 and
2.6, pages 30-32, and a much more formal definition in sections 3.9 and
3.10, pages 77-81.