`codecs`: Codec registry and base classes

Source code: Lib/codecs.py

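This module defines base classes for standard Python codecs (encoders and decoders) and provides access to the internal Python codec registry, which manages the codec and error handling lookup process. Most standard codecs are text encodings, which encode text to bytes (and decode bytes to text), but there are also codecs that encode text to text and bytes to bytes. Custom codecs may encode and decode between arbitrary types, but some module features are restricted to use specifically with text encodings or with codecs that encode to bytes.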
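A short sketch, using only `codecs.encode()`, of the three kinds of codec described above. The codecs chosen here (`utf-8`, `hex`, `rot13`) are illustrative picks from the standard set, not the only options.

```python
import codecs

# Text encoding: str -> bytes
text_to_bytes = codecs.encode("héllo", "utf-8")

# Binary transform: bytes -> bytes
bytes_to_bytes = codecs.encode(b"hi", "hex")

# Text transform: str -> str
text_to_text = codecs.encode("Hello", "rot13")

print(text_to_bytes)   # b'h\xc3\xa9llo'
print(bytes_to_bytes)  # b'6869'
print(text_to_text)    # Uryyb
```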

The module defines the following functions for encoding and decoding with any codec:

codecs.encode(obj, encoding='utf-8', errors='strict')

Encodes `obj` using the codec registered for `encoding`.

`errors` may be given to set the desired error handling scheme. The default error handler is `'strict'`, meaning that encoding errors raise `ValueError` (or a more codec-specific subclass, such as `UnicodeEncodeError`). Refer to Codec Base Classes for more information on codec error handling.
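A minimal sketch of `codecs.encode()` with the default UTF-8 codec and with `'strict'` error handling on ASCII; the sample string `"naïve"` is arbitrary. Because `UnicodeEncodeError` subclasses `ValueError`, catching `ValueError` is sufficient.

```python
import codecs

# Default encoding is UTF-8, default error handler is 'strict'.
encoded = codecs.encode("naïve")

# 'ï' cannot be encoded in ASCII, so 'strict' raises.
try:
    codecs.encode("naïve", "ascii")
except ValueError as exc:
    caught = exc

print(encoded)                   # b'na\xc3\xafve'
print(type(caught).__name__)     # UnicodeEncodeError
print(caught.start, caught.end)  # 2 3  (position of the offending character)
```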

codecs.decode(obj, encoding='utf-8', errors='strict')

Decodes `obj` using the codec registered for `encoding`.

`errors` may be given to set the desired error handling scheme. The default error handler is `'strict'`, meaning that decoding errors raise `ValueError` (or a more codec-specific subclass, such as `UnicodeDecodeError`). Refer to Codec Base Classes for more information on codec error handling.
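A matching sketch for `codecs.decode()`. The input `b"caf\xe9"` is an assumed example: it is valid Latin-1 but not valid UTF-8, so the outcome depends on both the codec and `errors`.

```python
import codecs

data = b"caf\xe9"  # 'café' in Latin-1; the lone 0xE9 byte is invalid UTF-8

latin = codecs.decode(data, "latin-1")   # matching codec succeeds

# Default UTF-8 codec with 'strict' errors raises a ValueError subclass.
try:
    codecs.decode(data)
except ValueError as exc:
    caught = exc

# A non-strict handler substitutes U+FFFD instead of raising.
lenient = codecs.decode(data, "utf-8", errors="replace")

print(latin)                    # café
print(type(caught).__name__)    # UnicodeDecodeError
print(repr(lenient))            # 'caf\ufffd'
```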

The full details for each codec can also be looked up directly:

codecs.lookup(encoding)

Looks up the codec info in the Python codec registry and returns a CodecInfo object as defined below.

Encodings are first looked up in the registry’s cache. If not found, the list of registered search functions is scanned. If no CodecInfo object is found, a LookupError is raised. Otherwise, the CodecInfo object is stored in the cache and returned to the caller.
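A small sketch of `lookup()`. The name `"no-such-codec"` is a hypothetical unregistered name, and the identity check relies on the caching behaviour described above for repeated lookups of the same spelling.

```python
import codecs

info = codecs.lookup("UTF8")   # names are normalized, so case does not matter
print(info.name)               # utf-8

# Repeating the same lookup is answered from the registry's cache.
cached = codecs.lookup("UTF8") is info

# An unknown name makes every search function return None -> LookupError.
try:
    codecs.lookup("no-such-codec")
except LookupError:
    missing_raises = True
else:
    missing_raises = False
```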

class codecs.CodecInfo(encode, decode, streamreader=None, streamwriter=None, incrementalencoder=None, incrementaldecoder=None, name=None)

Codec details returned when looking up the codec registry. The constructor arguments are stored in attributes of the same name:

name: The name of the encoding.

encode
decode: The stateless encoding and decoding functions. These must be functions or methods which have the same interface as the encode() and decode() methods of Codec instances (see Codec Interface). The functions or methods are expected to work in a stateless mode.

incrementalencoder
incrementaldecoder: Incremental encoder and decoder classes or factory functions. These have to provide the interface defined by the base classes IncrementalEncoder and IncrementalDecoder, respectively. Incremental codecs can maintain state.

streamwriter
streamreader: Stream writer and reader classes or factory functions. These have to provide the interface defined by the base classes StreamWriter and StreamReader, respectively. Stream codecs can maintain state.
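A sketch inspecting the `CodecInfo` returned for UTF-8. The stateless functions return an `(output, length_consumed)` pair. For this particular codec the incremental and stream entries happen to be subclasses of the corresponding base classes; other codecs may supply factory functions instead.

```python
import codecs

info = codecs.lookup("utf-8")

# Stateless functions: (output, amount of input consumed).
enc_result = info.encode("é")          # (b'\xc3\xa9', 1)
dec_result = info.decode(b"\xc3\xa9")  # ('é', 2)

# Stateful entries for UTF-8 derive from the documented base classes.
checks = [
    issubclass(info.incrementalencoder, codecs.IncrementalEncoder),
    issubclass(info.incrementaldecoder, codecs.IncrementalDecoder),
    issubclass(info.streamwriter, codecs.StreamWriter),
    issubclass(info.streamreader, codecs.StreamReader),
]
print(enc_result, dec_result, checks)
```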

To simplify access to the various codec components, the module provides these additional functions which use lookup() for the codec lookup:

codecs.getencoder(encoding)

Look up the codec for the given encoding and return its encoder function.

Raises LookupError in case the encoding cannot be found.
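A minimal sketch of calling the function returned by `getencoder()`: it takes `(input, errors)` and returns `(bytes, characters_consumed)`. The ASCII codec and `'replace'` handler are arbitrary choices for illustration.

```python
import codecs

encode_ascii = codecs.getencoder("ascii")

# Returns the encoded bytes and how many characters were consumed.
result = encode_ascii("naïve", "replace")
print(result)  # (b'na?ve', 5)
```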

codecs.getdecoder(encoding)

Look up the codec for the given encoding and return its decoder function.

Raises LookupError in case the encoding cannot be found.
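A sketch showing that the decoder returned by `getdecoder()` is stateless: it accepts bytes-like input and treats it as complete, so a truncated multi-byte sequence is an error rather than being buffered. The sample bytes are arbitrary.

```python
import codecs

decode_utf8 = codecs.getdecoder("utf-8")

# Complete input: (str, bytes consumed); a bytearray works too.
complete = decode_utf8(bytearray(b"caf\xc3\xa9"))  # ('café', 5)

# Truncated input: the stateless decoder cannot wait for more bytes.
try:
    decode_utf8(b"caf\xc3")
except UnicodeDecodeError as exc:
    truncated = exc

print(complete, truncated.start)
```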

codecs.getincrementalencoder(encoding)

Look up the codec for the given encoding and return its incremental encoder class or factory function.

Raises LookupError in case the encoding cannot be found or the codec doesn’t support an incremental encoder.
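A sketch of why incremental encoders keep state, using UTF-16: the byte order mark (BOM) is written only with the first chunk. Byte order depends on the platform, so the sketch compares lengths rather than exact bytes.

```python
import codecs

enc = codecs.getincrementalencoder("utf-16")()

first = enc.encode("a")               # BOM (2 bytes) + 'a' (2 bytes)
second = enc.encode("b", final=True)  # 'b' only, no second BOM

print(len(first), len(second))            # 4 2
print((first + second).decode("utf-16"))  # ab

# Encoding each chunk on its own would repeat the BOM.
naive_len = len("a".encode("utf-16") + "b".encode("utf-16"))
print(naive_len)                          # 8
```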

codecs.getincrementaldecoder(encoding)

Look up the codec for the given encoding and return its incremental decoder class or factory function.

Raises LookupError in case the encoding cannot be found or the codec doesn’t support an incremental decoder.
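A sketch of an incremental decoder handling a character split across chunks. `'é'` is two bytes in UTF-8 (`b"\xc3\xa9"`); the lead byte is buffered until the next call supplies the rest.

```python
import codecs

dec = codecs.getincrementaldecoder("utf-8")()

part1 = dec.decode(b"caf\xc3")           # lone lead byte is held back
part2 = dec.decode(b"\xa9", final=True)  # completes the character

print(repr(part1), repr(part2))  # 'caf' 'é'
```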

codecs.getreader(encoding)

Look up the codec for the given encoding and return its StreamReader class or factory function.

Raises LookupError in case the encoding cannot be found.
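A minimal sketch wrapping an in-memory byte stream (`io.BytesIO`, standing in for a real file) with the reader class returned by `getreader()`, so that reads yield `str`.

```python
import codecs
import io

raw = io.BytesIO("héllo\nwörld\n".encode("utf-8"))
reader = codecs.getreader("utf-8")(raw)

first_line = reader.readline()  # 'héllo\n'
rest = reader.read()            # 'wörld\n'
print(repr(first_line), repr(rest))
```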

codecs.getwriter(encoding)

Look up the codec for the given encoding and return its StreamWriter class or factory function.

Raises LookupError in case the encoding cannot be found.
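The writer side, again using `io.BytesIO` as a stand-in for a binary file: each `write()` call accepts `str` and writes encoded bytes to the underlying stream.

```python
import codecs
import io

buf = io.BytesIO()
writer = codecs.getwriter("utf-8")(buf)
writer.write("héllo ")
writer.write("wörld")

result = buf.getvalue()
print(result)  # b'h\xc3\xa9llo w\xc3\xb6rld'
```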

Custom codecs are made available by registering a suitable codec search function:

codecs.register(search_function)

Register a codec search function. Search functions are expected to take one argument, being the encoding name in all lower case letters with hyphens and spaces converted to underscores, and return a CodecInfo object. In case a search function cannot find a given encoding, it should return None .

Changed in version 3.9: Hyphens and spaces are converted to underscore.
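A minimal sketch of a search function. The codec name `my_utf8` and the function `search` are hypothetical; the sketch simply re-exposes the built-in UTF-8 functions under a new name. It also records the names it receives, to show the normalization described above (Python 3.9+). Registration is process-wide; Python 3.10 added `codecs.unregister()` to undo it.

```python
import codecs

seen = []  # normalized names passed to the search function

def search(name):
    """Hypothetical search function exposing UTF-8 as 'my_utf8'."""
    seen.append(name)
    if name != "my_utf8":
        return None  # not ours: let other search functions try
    utf8 = codecs.lookup("utf-8")
    return codecs.CodecInfo(
        encode=utf8.encode,
        decode=utf8.decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8.incrementaldecoder,
        streamreader=utf8.streamreader,
        streamwriter=utf8.streamwriter,
        name="my_utf8",
    )

codecs.register(search)

# "My-UTF8" is lowercased and its hyphen becomes an underscore.
encoded = "héllo".encode("My-UTF8")
print(encoded)                        # b'h\xc3\xa9llo'
print(codecs.lookup("My-UTF8").name)  # my_utf8
```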

The following error handlers can be used with all Python Standard Encodings codecs:

Value	Meaning
`'strict'`	Raise `UnicodeError` (or a subclass); this is the default. Implemented in `strict_errors()`.
`'ignore'`	Ignore the malformed data and continue without further notice. Implemented in `ignore_errors()`.
`'replace'`	Replace with a replacement marker. On encoding, use `?` (ASCII character). On decoding, use `�` (U+FFFD, the official REPLACEMENT CHARACTER). Implemented in `replace_errors()`.
`'backslashreplace'`	Replace with backslashed escape sequences. On encoding, use the hexadecimal form of the Unicode code point with formats `\xhh`, `\uxxxx`, `\Uxxxxxxxx`. On decoding, use the hexadecimal form of the byte value with format `\xhh`. Implemented in `backslashreplace_errors()`.
`'surrogateescape'`	On decoding, replace each undecodable byte with an individual surrogate code point in the range `U+DC80` to `U+DCFF`. That code point is turned back into the same byte when the `'surrogateescape'` error handler is used to encode the data. (See PEP 383 for more.)
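A sketch applying the handlers from the table to the same inputs; `"naïve"` and `b"caf\xe9"` are arbitrary samples. The last line checks that a handler name resolves to the module-level function named in the table.

```python
import codecs

text = "naïve"
data = b"caf\xe9"  # valid Latin-1, but not valid UTF-8

ignored = text.encode("ascii", "ignore")            # b'nave'
replaced = text.encode("ascii", "replace")          # b'na?ve'
escaped = text.encode("ascii", "backslashreplace")  # b'na\\xefve'
shown = data.decode("utf-8", "backslashreplace")    # 'caf\\xe9'

# surrogateescape maps the bad byte to U+DCE9 and back again losslessly.
smuggled = data.decode("utf-8", "surrogateescape")  # 'caf\udce9'
restored = smuggled.encode("utf-8", "surrogateescape")

# Handler names are registered with the module-level implementations.
same_handler = codecs.lookup_error("ignore") is codecs.ignore_errors

print(ignored, replaced, escaped, shown, restored == data, same_handler)
```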

The following error handlers are only applicable to encoding (within text encodings):

Value	Meaning
`'xmlcharrefreplace'`	Replace with an XML/HTML numeric character reference, which is the decimal form of the Unicode code point with format `&#num;`. Implemented in `xmlcharrefreplace_errors()`.
`'namereplace'`	Replace with `\N{...}` escape sequences; what appears in the braces is the Name property from the Unicode Character Database. Implemented in `namereplace_errors()`.
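A sketch of the two encode-only handlers on the same sample string. Passing one of them while decoding makes the handler itself raise `TypeError`, since it only understands encoding errors.

```python
text = "naïve"

xml = text.encode("ascii", "xmlcharrefreplace")  # b'na&#239;ve'
named = text.encode("ascii", "namereplace")
# named == b'na\\N{LATIN SMALL LETTER I WITH DIAERESIS}ve'

# Decoding errors cannot be handled by these handlers.
try:
    b"caf\xe9".decode("ascii", "xmlcharrefreplace")
except TypeError:
    decode_rejected = True
else:
    decode_rejected = False
```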

Range	Encoding
`U-00000000` … `U-0000007F`	0xxxxxxx
`U-00000080` … `U-000007FF`	110xxxxx 10xxxxxx
`U-00000800` … `U-0000FFFF`	1110xxxx 10xxxxxx 10xxxxxx
`U-00010000` … `U-0010FFFF`	11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
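A quick check of the UTF-8 table: the sketch encodes the boundary code points of each range and counts the bytes, then prints the bit pattern of U+20AC (EURO SIGN), which falls in the three-byte range.

```python
# Boundary code points of each range in the table.
boundaries = [0x0, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF]
lengths = [len(chr(cp).encode("utf-8")) for cp in boundaries]
print(lengths)  # [1, 1, 2, 2, 3, 3, 4, 4]

# U+20AC: 1110xxxx 10xxxxxx 10xxxxxx
bits = " ".join(format(b, "08b") for b in "\u20ac".encode("utf-8"))
print(bits)  # 11100010 10000010 10101100
```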

Codec	Aliases	Languages
ascii	646, us-ascii	English
big5	big5-tw, csbig5	Traditional Chinese
big5hkscs	big5-hkscs, hkscs	Traditional Chinese
cp037	IBM037, IBM039	English
cp273	273, IBM273, csIBM273	German (added in version 3.4)
cp424	EBCDIC-CP-HE, IBM424	Hebrew
cp437	437, IBM437	English
cp500	EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500	Western Europe
cp720		Arabic
cp737		Greek
cp775	IBM775	Baltic languages
cp850	850, IBM850	Western Europe
cp852	852, IBM852	Central and Eastern Europe
cp855	855, IBM855	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp856		Hebrew
cp857	857, IBM857	Turkish
cp858	858, IBM858	Western Europe
cp860	860, IBM860	Portuguese
cp861	861, CP-IS, IBM861	Icelandic
cp862	862, IBM862	Hebrew
cp863	863, IBM863	Canadian
cp864	IBM864	Arabic
cp865	865, IBM865	Danish, Norwegian
cp866	866, IBM866	Russian
cp869	869, CP-GR, IBM869	Greek
cp874		Thai
cp875		Greek
cp932	932, ms932, mskanji, ms-kanji, windows-31j	Japanese
cp949	949, ms949, uhc	Korean
cp950	950, ms950	Traditional Chinese
cp1006		Urdu
cp1026	ibm1026	Turkish
cp1125	1125, ibm1125, cp866u, ruscii	Ukrainian (added in version 3.4)
cp1140	ibm1140	Western Europe
cp1250	windows-1250	Central and Eastern Europe
cp1251	windows-1251	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
cp1252	windows-1252	Western Europe
cp1253	windows-1253	Greek
cp1254	windows-1254	Turkish
cp1255	windows-1255	Hebrew
cp1256	windows-1256	Arabic
cp1257	windows-1257	Baltic languages
cp1258	windows-1258	Vietnamese
euc_jp	eucjp, ujis, u-jis	Japanese
euc_jis_2004	jisx0213, eucjis2004	Japanese
euc_jisx0213	eucjisx0213	Japanese
euc_kr	euckr, korean, ksc5601, ks_c-5601, ks_c-5601-1987, ksx1001, ks_x-1001	Korean
gb2312	chinese, csiso58gb231280, euc-cn, euccn, eucgb2312-cn, gb2312-1980, gb2312-80, iso-ir-58	Simplified Chinese
gbk	936, cp936, ms936	Unified Chinese
gb18030	gb18030-2000	Unified Chinese
hz	hzgb, hz-gb, hz-gb-2312	Simplified Chinese
iso2022_jp	csiso2022jp, iso2022jp, iso-2022-jp	Japanese
iso2022_jp_1	iso2022jp-1, iso-2022-jp-1	Japanese
iso2022_jp_2	iso2022jp-2, iso-2022-jp-2	Japanese, Korean, Simplified Chinese, Western Europe, Greek
iso2022_jp_2004	iso2022jp-2004, iso-2022-jp-2004	Japanese
iso2022_jp_3	iso2022jp-3, iso-2022-jp-3	Japanese
iso2022_jp_ext	iso2022jp-ext, iso-2022-jp-ext	Japanese
iso2022_kr	csiso2022kr, iso2022kr, iso-2022-kr	Korean
latin_1	iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1	Western Europe
iso8859_2	iso-8859-2, latin2, L2	Central and Eastern Europe
iso8859_3	iso-8859-3, latin3, L3	Esperanto, Maltese
iso8859_4	iso-8859-4, latin4, L4	Baltic languages
iso8859_5	iso-8859-5, cyrillic	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
iso8859_6	iso-8859-6, arabic	Arabic
iso8859_7	iso-8859-7, greek, greek8	Greek
iso8859_8	iso-8859-8, hebrew	Hebrew
iso8859_9	iso-8859-9, latin5, L5	Turkish
iso8859_10	iso-8859-10, latin6, L6	Nordic languages
iso8859_11	iso-8859-11, thai	Thai
iso8859_13	iso-8859-13, latin7, L7	Baltic languages
iso8859_14	iso-8859-14, latin8, L8	Celtic languages
iso8859_15	iso-8859-15, latin9, L9	Western Europe
iso8859_16	iso-8859-16, latin10, L10	South-Eastern Europe
johab	cp1361, ms1361	Korean
koi8_r		Russian
koi8_t		Tajik (added in version 3.5)
koi8_u		Ukrainian
kz1048	kz_1048, strk1048_2002, rk1048	Kazakh (added in version 3.5)
mac_cyrillic	maccyrillic	Bulgarian, Byelorussian, Macedonian, Russian, Serbian
mac_greek	macgreek	Greek
mac_iceland	maciceland	Icelandic
mac_latin2	maclatin2, maccentraleurope, mac_centeuro	Central and Eastern Europe
mac_roman	macroman, macintosh	Western Europe
mac_turkish	macturkish	Turkish
ptcp154	csptcp154, pt154, cp154, cyrillic-asian	Kazakh
shift_jis	csshiftjis, shiftjis, sjis, s_jis	Japanese
shift_jis_2004	shiftjis2004, sjis_2004, sjis2004	Japanese
shift_jisx0213	shiftjisx0213, sjisx0213, s_jisx0213	Japanese
utf_32	U32, utf32	all languages
utf_32_be	UTF-32BE	all languages
utf_32_le	UTF-32LE	all languages
utf_16	U16, utf16	all languages
utf_16_be	UTF-16BE	all languages
utf_16_le	UTF-16LE	all languages
utf_7	U7, unicode-1-1-utf-7	all languages
utf_8	U8, UTF, utf8, cp65001	all languages
utf_8_sig		all languages
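A sketch of alias resolution and of the `utf_8_sig` codec from the table. Aliases resolve to the canonical codec, whose `CodecInfo.name` reports the canonical spelling; `utf_8_sig` writes a BOM and strips it again when decoding.

```python
import codecs

# Aliases resolve to the canonical codec name.
names = [codecs.lookup(alias).name for alias in ("latin1", "us-ascii", "U8")]
print(names)  # ['iso8859-1', 'ascii', 'utf-8']

signed = "abc".encode("utf-8-sig")
print(signed)                          # b'\xef\xbb\xbfabc'
print(signed.decode("utf-8-sig"))      # abc
print(repr(signed.decode("utf-8")))    # '\ufeffabc'  (plain UTF-8 keeps the BOM)
```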

Codec	Aliases	Meaning
idna		Implements RFC 3490; see also `encodings.idna`. Only `errors='strict'` is supported.
mbcs	ansi, dbcs	Windows only: Encode the operand according to the ANSI codepage (CP_ACP).
oem		Windows only: Encode the operand according to the OEM codepage (CP_OEMCP). Added in version 3.6.
palmos		Encoding of PalmOS 3.5.
punycode		Implements RFC 3492. Stateful codecs are not supported.
raw_unicode_escape		Latin-1 encoding with `\uXXXX` and `\UXXXXXXXX` for other code points. Existing backslashes are not escaped in any way. It is used in the Python pickle protocol.
undefined		Raise an exception for all conversions, even empty strings. The error handler is ignored.
unicode_escape		Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped. Decode from Latin-1 source code. Beware that Python source code actually uses UTF-8 by default.
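A sketch exercising several of the Python-specific codecs above on arbitrary sample strings. Note the difference between `unicode_escape` (escapes every non-ASCII character) and `raw_unicode_escape` (keeps Latin-1 characters and control characters as raw bytes).

```python
puny = "bücher".encode("punycode")             # b'bcher-kva'
idna_name = "bücher.example".encode("idna")    # b'xn--bcher-kva.example'
escaped = "é\n€".encode("unicode_escape")      # b'\\xe9\\n\\u20ac'
raw = "é\n€".encode("raw_unicode_escape")      # b'\xe9\n\\u20ac'

# 'undefined' refuses every conversion, even an empty string.
try:
    "".encode("undefined")
except UnicodeError:
    undefined_failed = True
else:
    undefined_failed = False
```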

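The following codecs provide binary transforms: bytes-like object to bytes mappings. They are not supported by `bytes.decode()` (which only produces `str` output).

Codec	Aliases	Meaning	Encoder / decoder
base64_codec [1]	base64, base_64	Convert the operand to multiline MIME base64 (the result always includes a trailing `'\n'`). Changed in version 3.4: accepts any bytes-like object as input for encoding and decoding.	`base64.encodebytes()` / `base64.decodebytes()`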
bz2_codec	bz2	Compress the operand using bz2.	`bz2.compress()` / `bz2.decompress()`
hex_codec	hex	Convert the operand to hexadecimal representation, with two digits per byte.	`binascii.b2a_hex()` / `binascii.a2b_hex()`
quopri_codec	quopri, quotedprintable, quoted_printable	Convert the operand to MIME quoted printable.	`quopri.encode()` with `quotetabs=True` / `quopri.decode()`
uu_codec	uu	Convert the operand using uuencode.
zlib_codec	zip, zlib	Compress the operand using zlib.	`zlib.compress()` / `zlib.decompress()`
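A sketch of binary transforms through `codecs.encode()` / `codecs.decode()`, with `b"hello"` as an arbitrary payload. The last part shows the restriction stated above: `bytes.decode()` rejects a binary transform with `LookupError`.

```python
import codecs

payload = b"hello"

b64 = codecs.encode(payload, "base64")   # b'aGVsbG8=\n' (trailing newline)

# zlib_codec round-trips via zlib.compress() / zlib.decompress().
packed = codecs.encode(payload, "zlib")
round_trip = codecs.decode(packed, "zlib")

# bytes.decode() only produces str, so binary transforms are refused.
try:
    b"68656c6c6f".decode("hex")
except LookupError:
    rejected = True
else:
    rejected = False

print(b64, round_trip == payload, rejected)
```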
