Unicode Objects and Codecs

Unicode Objects

Since the implementation of PEP 393 in Python 3.3, Unicode objects internally use a variety of representations, in order to allow handling the complete range of Unicode characters while staying memory efficient. There are special cases for strings where all code points are below 128, 256, or 65536; otherwise, code points must be below 1114112 (which is the full Unicode range).
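The thresholds above can be sketched as a width-selection function. This is only an illustration of the rule described in the paragraph, not CPython's actual implementation; the helper name `bytes_per_code_point` is hypothetical and does not exist in the C API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of the CPython API): given the largest
   code point in a string, return how many bytes each code point would
   occupy in the narrowest representation that can hold it. */
static int
bytes_per_code_point(uint32_t max_code_point)
{
    /* Valid code points are below 1114112 (0x110000). */
    assert(max_code_point < 1114112);

    if (max_code_point < 256) {
        /* Covers both the "below 128" (ASCII) and "below 256" cases:
           each code point fits in one 8-bit unit. */
        return 1;
    }
    if (max_code_point < 65536) {
        return 2;   /* fits in one 16-bit unit */
    }
    return 4;       /* needs a full 32-bit unit */
}
```

Under this sketch, a string whose largest character is U+00E9 would use 1 byte per code point, one containing U+20AC would use 2, and one containing U+1F600 would use 4.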

UTF-8 representation is created on demand and cached in the Unicode object.

Note

The Py_UNICODE representation was removed in Python 3.12, together with the deprecated APIs that relied on it. See PEP 623 for more information.

Unicode Type

These are the basic Unicode object types used for the Unicode implementation in Python:

type Py_UCS4
type Py_UCS2
type Py_UCS1
Part of the Stable ABI.

These types are typedefs for unsigned integer types wide enough to contain characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with single Unicode characters, use Py_UCS4.

Added in version 3.3.
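As a rough illustration of why the 32-bit type is the safe choice for a single character, the sketch below checks whether a code point survives narrowing to 16 bits. The typedefs `ucs1_t`, `ucs2_t` and `ucs4_t` and the helper `fits_in_ucs2` are stand-ins invented for this example so that it compiles without Python.h; real extension code would use Py_UCS1, Py_UCS2 and Py_UCS4 instead.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in typedefs for illustration only; they mirror the widths of
   Py_UCS1, Py_UCS2 and Py_UCS4 but are not the real definitions. */
typedef uint8_t  ucs1_t;
typedef uint16_t ucs2_t;
typedef uint32_t ucs4_t;

/* Hypothetical helper: returns 1 if ch keeps its value after a round
   trip through the 16-bit type, 0 if narrowing would truncate it. */
static int
fits_in_ucs2(ucs4_t ch)
{
    assert(ch < 1114112);   /* must be a valid code point */
    return (ucs4_t)(ucs2_t)ch == ch;
}
```

Here U+20AC fits in the 16-bit type, while U+1F600 does not, so a variable meant to hold an arbitrary single character needs the 32-bit type.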