pickle — Python object serialization

Source code: Lib/pickle.py


The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling” [1], or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

Warning

The pickle module is not secure. Only unpickle data you trust.

It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

Consider signing data with hmac if you need to ensure that it has not been tampered with.
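
The signing advice above can be sketched as follows. This is a minimal illustration, not a complete security design: the names SECRET_KEY, sign_pickle and load_signed_pickle are hypothetical, and the key is hard-coded only to keep the example self-contained. The digest is checked before any unpickling happens.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; a real application would load it from
# secure configuration rather than hard-coding it.
SECRET_KEY = b"example-secret-key"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def sign_pickle(obj):
    """Pickle obj and prepend an HMAC-SHA256 digest of the payload."""
    payload = pickle.dumps(obj)
    digest = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return digest + payload


def load_signed_pickle(blob):
    """Verify the digest first; unpickle only if it matches."""
    digest, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking timing information.
    if not hmac.compare_digest(digest, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = sign_pickle({"user": "alice", "roles": ["admin"]})
restored = load_signed_pickle(blob)
```

A signature only shows that the data was produced by someone holding the key and was not altered afterwards; it does not make pickle data from an untrusted producer safe.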

Safer serialization formats such as json may be more appropriate if you are processing untrusted data. See Comparison with json.

Relationship to other Python modules

Comparison with marshal

Python has a more primitive serialization module called marshal, but in general pickle should always be the preferred way to serialize Python objects. marshal exists primarily to support Python’s .pyc files.

The pickle module differs from marshal in several significant ways:

  • The pickle module keeps track of the objects it has already serialized, so that later references to the same object won’t be serialized again. marshal doesn’t do this.

    This has implications both for recursive objects and object sharing. Recursive objects are objects that contain references to themselves. These are not handled by marshal, and in fact, attempting to marshal recursive objects will crash your Python interpreter. Object sharing happens when there are multiple references to the same object in different places in the object hierarchy being serialized. pickle stores such objects only once, and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects.

  • marshal cannot be used to serialize user-defined classes and their instances. pickle can save and restore class instances transparently; however, the class definition must be importable and live in the same module as when the object was stored.

  • The marshal serialization format is not guaranteed to be portable across Python versions. Because its primary job in life is to support .pyc files, the Python implementers reserve the right to change the serialization format in non-backwards compatible ways should the need arise. The pickle serialization format is guaranteed to be backwards compatible across Python releases, provided that a compatible pickle protocol is chosen and that the pickling and unpickling code deals with Python 2 to Python 3 type differences if your data crosses that one-time breaking language boundary.
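
The object-tracking behaviour described in the first point can be observed directly. The following sketch uses illustrative variable names; it pickles a container holding two references to one list, and a list that contains itself, then checks identity after the round trip.

```python
import pickle

shared = [1, 2, 3]
container = {"a": shared, "b": shared}  # two references to the same list

recursive = []
recursive.append(recursive)  # a list that contains itself

restored = pickle.loads(pickle.dumps(container))
restored_recursive = pickle.loads(pickle.dumps(recursive))

# Sharing is preserved: both keys still refer to a single list object.
still_shared = restored["a"] is restored["b"]

# The self-reference is preserved as well.
still_recursive = restored_recursive[0] is restored_recursive

# Mutating through one reference is visible through the other.
restored["a"].append(4)
visible_through_b = restored["b"][-1] == 4
```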

Comparison with json

There are fundamental differences between the pickle protocols and JSON (JavaScript Object Notation):

  • JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to utf-8), while pickle is a binary serialization format;

  • JSON is human-readable, while pickle is not;

  • JSON is interoperable and widely used outside of the Python ecosystem, while pickle is Python-specific;

  • JSON, by default, can only represent a subset of the Python built-in types, and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, by clever usage of Python’s introspection facilities; complex cases can be tackled by implementing specific object APIs);

  • Unlike pickle, deserializing untrusted JSON does not in itself create an arbitrary code execution vulnerability.
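
A few of these differences are easy to see in a short session. The sketch below uses illustrative data only: it compares the output types of the two modules and shows a built-in type (set) that json rejects by default but pickle round-trips.

```python
import json
import pickle

data = {"name": "example", "values": [1, 2, 3]}

json_text = json.dumps(data)       # a str: text, human-readable
pickle_bytes = pickle.dumps(data)  # a bytes object: binary

tags = {"a", "b"}  # sets are outside the subset json handles by default
try:
    json.dumps(tags)
    json_handles_set = True
except TypeError:
    json_handles_set = False

restored_tags = pickle.loads(pickle.dumps(tags))
```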

See also

The json module: a standard library module allowing JSON serialization and deserialization.

Data stream format

The data format used by pickle is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON (which can’t represent pointer sharing); however it means that non-Python programs may not be able to reconstruct pickled Python objects.

By default, the pickle data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data.

The pickletools module contains tools for analyzing data streams generated by pickle. The pickletools source code has extensive comments about opcodes used by pickle protocols.
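
As a rough sketch of compressing pickled data, the example below passes the pickle through zlib. The choice of zlib and the sample records are assumptions made for illustration; other standard-library compressors such as gzip, bz2 or lzma could be used the same way.

```python
import pickle
import zlib

# Illustrative, highly repetitive sample data.
records = [{"id": i, "label": "item"} for i in range(1000)]

raw = pickle.dumps(records)
compressed = zlib.compress(raw)

# Decompress first, then unpickle.
restored = pickle.loads(zlib.decompress(compressed))

raw_size, compressed_size = len(raw), len(compressed)
```

How much is saved depends heavily on the data; repetitive structures like the one above tend to compress well.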
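
For instance, pickletools.dis() prints a symbolic disassembly of a pickle, and pickletools.genops() iterates over its opcodes. The sample tuple and the choice of protocol 2 below are arbitrary; this is only a brief sketch of how the tools can be used.

```python
import io
import pickle
import pickletools

data = pickle.dumps((1, "two"), protocol=2)

# Write the disassembly to a string buffer instead of stdout.
out = io.StringIO()
pickletools.dis(data, out=out)
listing = out.getvalue()

# genops() yields (opcode, argument, position) triples.
opcode_names = [opcode.name for opcode, arg, pos in pickletools.genops(data)]
```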

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

  • Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.

  • Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.

  • Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.

  • Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.

  • Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.

  • Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.
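
The protocol is selected with the protocol argument of dumps(). The sketch below, using an arbitrary sample value, pickles the same object under every protocol the running interpreter supports and records the resulting sizes. It also inspects the start of a protocol 2 pickle, which begins with the PROTO opcode (byte 0x80) followed by the protocol number.

```python
import pickle

value = {"key": b"bytes value", "numbers": list(range(10))}

sizes = {}
round_trips = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(value, protocol=protocol)
    sizes[protocol] = len(data)
    # Every protocol can be read back by the interpreter that wrote it.
    round_trips[protocol] = pickle.loads(data) == value

# Pickles written with protocol 2 or later start with PROTO + version.
header = pickle.dumps(value, protocol=2)[:2]
```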

Note

Serialization is a more primitive notion than persistence; although pickle reads and writes file objects, it does not handle the issue of naming persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects. The pickle module can transform a complex object into a byte stream and it can transform the byte stream into an object with the same internal structure. Perhaps the most obvious thing to do with these byte streams is to write them onto a file, but it is also conceivable to send them across a network or store them in a database. The shelve module provides a simple interface to pickle and unpickle objects on DBM-style database files.
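
As a small illustration of the shelve interface mentioned above, the following sketch stores a value under a key and reads it back after reopening the shelf. The temporary directory and the file name example_shelf are placeholders chosen for the example.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_shelf")  # hypothetical file name

    # Values are pickled transparently when assigned to a key.
    with shelve.open(path) as db:
        db["config"] = {"retries": 3, "hosts": ["a", "b"]}

    # Reopening the shelf unpickles the stored value on access.
    with shelve.open(path) as db:
        loaded = db["config"]
```

Because shelve is built on pickle, the same security warning applies: only open shelf files from sources you trust.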

Module interface

To serialize an object hierarchy, you simply call the dumps() function. Similarly, to de-serialize a data stream, you call the loads() function. However, if you want more control over serialization and de-serialization, you can create a Pickler or an Unpickler object, respectively.
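
Both styles can be sketched briefly. In the example below, the in-memory io.BytesIO buffer stands in for any binary file object, and the sample objects are arbitrary.

```python
import io
import pickle

obj = {"numbers": [1, 2, 3], "nested": {"flag": True}}

# Function interface: object -> bytes -> object.
data = pickle.dumps(obj)
copy = pickle.loads(data)

# Object interface: a Pickler writes to a binary file-like object,
# and an Unpickler reads from one.
buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
pickler.dump(obj)
pickler.dump("second object")  # several pickles can share one stream

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
```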

The pickle module provides the following constants:

pickle.HIGHEST_PROTOCOL

An integer, the highest protocol version available. This value can be passed as a protocol value to functions dump() and dumps() as well as the Pickler constructor.

pickle.DEFAULT_PROTOCOL

An integer, the default protocol version used for pickling. May be less than HIGHEST_PROTOCOL. Currently the default protocol is 4, first introduced in Python 3.4 and incompatible with previous versions.

Changed in version 3.0: The default protocol is 3.

Changed in version 3.8: The default protocol is 4.
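
The two constants can be used as in the sketch below. The actual values depend on the running Python version, so the example compares them with each other rather than with fixed numbers; the sample object is arbitrary.

```python
import pickle

obj = ["example", 1, 2.5]

# Omitting the protocol is the same as passing DEFAULT_PROTOCOL.
default_data = pickle.dumps(obj)
explicit_default = pickle.dumps(obj, protocol=pickle.DEFAULT_PROTOCOL)

# Request the newest protocol this interpreter supports.
newest_data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

# For protocol 2 and later, the second byte of the stream holds the
# protocol number.
default_version = default_data[1]
newest_version = newest_data[1]
```

Choosing HIGHEST_PROTOCOL may produce pickles that older Python versions cannot read, so the default is usually the safer choice when data is exchanged between interpreters.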