What can be pickled and unpickled?
The following types can be pickled:
-
built-in constants (
None
,
True
,
False
,
Ellipsis
, and
NotImplemented
);
-
integers, floating-point numbers, complex numbers;
-
strings, bytes, bytearrays;
-
tuples, lists, sets, and dictionaries containing only picklable objects;
-
functions (built-in and user-defined) accessible from the top level of a module (using
def
, not
lambda
);
-
classes accessible from the top level of a module;
-
instances of such classes for which the result of calling
__getstate__()
is picklable (see section
Pickling Class Instances
for details).
Attempts to pickle unpicklable objects will raise the
PicklingError
exception; when this happens, an unspecified number of bytes may have already been written to the underlying file. Trying to pickle a highly recursive data structure may exceed the maximum recursion depth; a
RecursionError
will be raised in this case. You can carefully raise this limit with
sys.setrecursionlimit()
.
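As a small illustration of the rules above, the following sketch checks whether an object can be pickled. The helper name can_pickle is hypothetical and not part of the pickle module. It catches AttributeError and TypeError in addition to PicklingError, on the assumption that the exact exception raised for unpicklable objects can differ between object kinds and Python versions.

```python
import pickle

def can_pickle(obj):
    """Return True if pickle.dumps() accepts obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

# A lambda has no importable name, so it cannot be pickled.
square = lambda x: x * x

print(can_pickle([1, 2.0, "three", (None, True)]))  # True
print(can_pickle(square))                           # False
```

A container is only picklable if everything inside it is, so can_pickle([square]) is also False.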
Note that functions (built-in and user-defined) are pickled by fully
qualified name
, not by value.
This means that only the function name is pickled, along with the name of the containing module and classes. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised.
Similarly, classes are pickled by fully qualified name, so the same restrictions in the unpickling environment apply. Note that none of the class’s code or data is pickled, so in the following example the class attribute
attr
is not restored in the unpickling environment:
import pickle

class Foo:
    attr = 'A class attribute'

picklestring = pickle.dumps(Foo)
These restrictions are why picklable functions and classes must be defined at the top level of a module.
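To make pickling by name concrete, here is a minimal sketch that pickles a built-in function and a standard library class and then inspects the resulting bytes. The choice of len and collections.OrderedDict is only for illustration; any importable top-level function or class should behave the same way.

```python
import collections
import pickle

func_pickle = pickle.dumps(len)
class_pickle = pickle.dumps(collections.OrderedDict)

# Only the module name and the object name are stored, not any code.
print(b"builtins" in func_pickle, b"len" in func_pickle)
print(b"collections" in class_pickle, b"OrderedDict" in class_pickle)

# Unpickling imports the module and looks the name up again,
# so the very same objects come back.
print(pickle.loads(func_pickle) is len)                       # True
print(pickle.loads(class_pickle) is collections.OrderedDict)  # True
```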
Similarly, when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data are pickled. This is done on purpose, so you can fix bugs in a class or add methods to the class and still load objects that were created with an earlier version of the class. If you plan to have long-lived objects that will see many versions of a class, it may be worthwhile to put a version number in the objects so that suitable conversions can be made by the class’s
__setstate__()
method.
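One possible way to apply the version-number advice is sketched below. The Account class, the "_version" key, and the version 1 layout (a float stored under "balance") are all hypothetical, invented only to show how a conversion could be performed when old state is loaded.

```python
import pickle

class Account:
    VERSION = 2  # hypothetical current layout: integer cents

    def __init__(self, owner, balance_cents):
        self.owner = owner
        self.balance_cents = balance_cents

    def __getstate__(self):
        # Store a version marker next to the instance attributes.
        state = self.__dict__.copy()
        state["_version"] = self.VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        version = state.pop("_version", 1)
        if version < 2:
            # Assumed version 1 layout: a float amount named "balance".
            state["balance_cents"] = round(state.pop("balance") * 100)
        self.__dict__.update(state)

# Simulate state that was written by the older class version.
old = Account.__new__(Account)
old.__setstate__({"owner": "ada", "balance": 12.5})
print(old.balance_cents)  # 1250

# Objects written by the current version round-trip unchanged.
new = pickle.loads(pickle.dumps(Account("bob", 300)))
print(new.balance_cents)  # 300
```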
Pickling Class Instances
In this section, we describe the general mechanisms available to you to define, customize, and control how class instances are pickled and unpickled.
In most cases, no additional code is needed to make instances picklable. By default, pickle will retrieve the class and the attributes of an instance via introspection. When a class instance is unpickled, its
__init__()
method is usually
not
invoked. The default behaviour first creates an uninitialized instance and then restores the saved attributes. The following code shows an implementation of this behaviour:
def save(obj):
    return (obj.__class__, obj.__dict__)

def restore(cls, attributes):
    obj = cls.__new__(cls)
    obj.__dict__.update(attributes)
    return obj
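The following sketch checks the claim that __init__() is not invoked during unpickling. The Session class and its init_calls counter are hypothetical and exist only to count constructor calls.

```python
import pickle

class Session:
    init_calls = 0  # class-level counter, incremented by every __init__() call

    def __init__(self, user):
        Session.init_calls += 1
        self.user = user

original = Session("ada")
clone = pickle.loads(pickle.dumps(original))

print(Session.init_calls)  # 1: unpickling did not call __init__()
print(clone.user)          # ada
print(clone is original)   # False
```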
Classes can alter the default behaviour by providing one or several special methods:
-
object.__getnewargs_ex__()
-
In protocols 2 and newer, classes that implement the
__getnewargs_ex__()
method can dictate the values passed to the
__new__()
method upon unpickling. The method must return a pair
(args, kwargs)
where
args
is a tuple of positional arguments and
kwargs
a dictionary of named arguments for constructing the object. Those will be passed to the
__new__()
method upon unpickling.
You should implement this method if the
__new__()
method of your class requires keyword-only arguments. Otherwise, it is recommended for compatibility to implement
__getnewargs__()
.
Changed in version 3.6:
__getnewargs_ex__()
is now used in protocols 2 and 3.
-
object.__getnewargs__()
-
This method serves a similar purpose as
__getnewargs_ex__()
, but supports only positional arguments. It must return a tuple of arguments
args
which will be passed to the
__new__()
method upon unpickling.
__getnewargs__()
will not be called if
__getnewargs_ex__()
is defined.
Changed in version 3.6:
Before Python 3.6,
__getnewargs__()
was called instead of
__getnewargs_ex__()
in protocols 2 and 3.
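A minimal sketch of __getnewargs_ex__() follows, assuming a hypothetical Point class whose __new__() method requires keyword-only arguments. Without the method, unpickling would call Point.__new__(Point) with no arguments and fail.

```python
import pickle

class Point:
    def __new__(cls, *, x, y):
        # Keyword-only arguments: a bare cls.__new__(cls) call would fail.
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self

    def __getnewargs_ex__(self):
        # (args, kwargs) handed to __new__() when the object is unpickled.
        return (), {"x": self.x, "y": self.y}

p = Point(x=1, y=2)
q = pickle.loads(pickle.dumps(p, protocol=pickle.HIGHEST_PROTOCOL))
print((q.x, q.y))  # (1, 2)
```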
-
object.__getstate__()
-
Classes can further influence how their instances are pickled by overriding the method
__getstate__()
. It is called and the returned object is pickled as the contents for the instance, instead of a default state. There are several cases:
-
For a class that has no instance
__dict__
and no
__slots__
, the default state is
None
.
-
For a class that has an instance
__dict__
and no
__slots__
, the default state is
self.__dict__
.
-
For a class that has an instance
__dict__
and
__slots__
, the default state is a tuple consisting of two dictionaries:
self.__dict__
, and a dictionary mapping slot names to slot values. Only slots that have a value are included in the latter.
-
For a class that has
__slots__
and no instance
__dict__
, the default state is a tuple whose first item is
None
and whose second item is a dictionary mapping slot names to slot values described in the previous bullet.
Changed in version 3.11:
Added the default implementation of the
__getstate__()
method in the
object
class.
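The four default-state cases listed above can be observed with the sketch below. Because object.__getstate__() only exists from Python 3.11 on, the hypothetical helper default_state reads the state from the third item of the default reduce value instead; this is an assumption made so that the example also runs on older versions. The class names are invented for illustration.

```python
def default_state(obj):
    """Return the state item of obj's default reduce value (protocol 2)."""
    return obj.__reduce_ex__(2)[2]

class DictOnly:
    pass

class SlotsOnly:
    __slots__ = ("x",)

class Both:
    __slots__ = ("x", "__dict__")

d = DictOnly()
d.a = 1

s = SlotsOnly()
s.x = 1

b = Both()
b.x = 1   # stored in the slot
b.y = 2   # stored in the instance __dict__

print(default_state(object()))  # None
print(default_state(d))         # {'a': 1}
print(default_state(b))         # ({'y': 2}, {'x': 1})
print(default_state(s))         # (None, {'x': 1})
```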
-
object.__setstate__(state)
-
Upon unpickling, if the class defines
__setstate__()
, it is called with the unpickled state. In that case, there is no requirement for the state object to be a dictionary. Otherwise, the pickled state must be a dictionary and its items are assigned to the new instance’s dictionary.
Refer to the section
Handling Stateful Objects
for more information about how to use the methods
__getstate__()
and
__setstate__()
.
Note
At unpickling time, some methods like
__getattr__()
,
__getattribute__()
, or
__setattr__()
may be called upon the instance. In case those methods rely on some internal invariant being true, the type should implement
__new__()
to establish such an invariant, as
__init__()
is not called when unpickling an instance.
As we shall see, pickle does not directly use the methods described above. In fact, these methods are part of the copy protocol which implements the
__reduce__()
special method. The copy protocol provides a unified interface for retrieving the data necessary for pickling and copying objects.
Although powerful, implementing
__reduce__()
directly in your classes is error prone. For this reason, class designers should use the high-level interface (i.e.,
__getnewargs_ex__()
,
__getstate__()
and
__setstate__()
) whenever possible. We will show, however, cases where using
__reduce__()
is the only option or leads to more efficient pickling or both.
-
object.__reduce__()
-
The interface is currently defined as follows. The
__reduce__()
method takes no argument and shall return either a string or preferably a tuple (the returned object is often referred to as the “reduce value”).
If a string is returned, the string should be interpreted as the name of a global variable. It should be the object’s local name relative to its module; the pickle module searches the module namespace to determine the object’s module. This behaviour is typically useful for singletons.
When a tuple is returned, it must be between two and six items long. Optional items can either be omitted, or
None
can be provided as their value. The semantics of each item are in order:
-
A callable object that will be called to create the initial version of the object.
-
A tuple of arguments for the callable object. An empty tuple must be given if the callable does not accept any argument.
-
Optionally, the object’s state, which will be passed to the object’s
__setstate__()
method as previously described. If the object has no such method, then the value must be a dictionary and it will be added to the object’s
__dict__
attribute.
-
Optionally, an iterator (and not a sequence) yielding successive items. These items will be appended to the object either using
obj.append(item)
or, in batch, using
obj.extend(list_of_items)
. This is primarily used for list subclasses, but may be used by other classes as long as they have
append and extend methods
with the appropriate signature. (Whether
append()
or
extend()
is used depends on which pickle protocol version is used as well as the number of items to append, so both must be supported.)
-
Optionally, an iterator (not a sequence) yielding successive key-value pairs. These items will be stored to the object using
obj[key] =
value
. This is primarily used for dictionary subclasses, but may be used by other classes as long as they implement
__setitem__()
.
-
Optionally, a callable with a
(obj, state)
signature. This callable allows the user to programmatically control the state-updating behavior of a specific object, instead of using
obj
’s static
__setstate__()
method. If not
None
, this callable will have priority over
obj
’s
__setstate__()
.
Added in version 3.8:
The optional sixth tuple item,
(obj, state)
, was added.
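The first three tuple items can be illustrated with a short sketch. The Interval class is hypothetical; its reduce value rebuilds the object through the constructor (so validation and derived attributes are recomputed) and passes one extra attribute as state.

```python
import pickle

class Interval:
    def __init__(self, lo, hi):
        if lo > hi:
            raise ValueError("lo must not exceed hi")
        self.lo = lo
        self.hi = hi
        self.width = hi - lo  # derived value, recomputed by __init__()

    def __reduce__(self):
        # (callable, args, state): call Interval(lo, hi) on unpickling, then
        # merge the state dictionary into the new object's __dict__, since
        # this class defines no __setstate__().
        state = {"label": getattr(self, "label", None)}
        return (Interval, (self.lo, self.hi), state)

iv = Interval(2, 5)
iv.label = "warm-up"
restored = pickle.loads(pickle.dumps(iv))
print((restored.lo, restored.hi, restored.width, restored.label))
# (2, 5, 3, 'warm-up')
```

Note that, unlike the default behaviour, this reduce value does cause __init__() to run during unpickling, because the class itself is the callable.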
-
object.__reduce_ex__(protocol)
-
Alternatively, a
__reduce_ex__()
method may be defined. The only difference is this method should take a single integer argument, the protocol version. When defined, pickle will prefer it over the
__reduce__()
method. In addition,
__reduce__()
automatically becomes a synonym for the extended version. The main use for this method is to provide backwards-compatible reduce values for older Python releases.
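The sketch below shows that the protocol number requested by the caller is what __reduce_ex__() receives. The Tagged class and its seen_protocols list are hypothetical and only record the argument; a real implementation would typically branch on it to choose a reduce value.

```python
import pickle

class Tagged:
    seen_protocols = []  # records every protocol number passed in

    def __init__(self, tag):
        self.tag = tag

    def __reduce_ex__(self, protocol):
        Tagged.seen_protocols.append(protocol)
        return (Tagged, (self.tag,))

requested = [0, 2, pickle.HIGHEST_PROTOCOL]
for proto in requested:
    restored = pickle.loads(pickle.dumps(Tagged("t"), protocol=proto))

print(Tagged.seen_protocols == requested)  # True
print(restored.tag)                        # t
```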
Persistence of External Objects
For the benefit of object persistence, the
pickle
module supports the notion of a reference to an object outside the pickled data stream. Such objects are referenced by a persistent ID, which should be either a string of alphanumeric characters (for protocol 0)
or just an arbitrary object (for any newer protocol).
The resolution of such persistent IDs is not defined by the
pickle
module; it will delegate this resolution to the user-defined methods on the pickler and unpickler,
persistent_id()
and
persistent_load()
, respectively.
To pickle objects that have an external persistent ID, the pickler must have a custom
persistent_id()
method that takes an object as an argument and returns either
None
or the persistent ID for that object. When
None
is returned, the pickler simply pickles the object as normal. When a persistent ID string is returned, the pickler will pickle that object, along with a marker so that the unpickler will recognize it as a persistent ID.
To unpickle external objects, the unpickler must have a custom
persistent_load()
method that takes a persistent ID object and returns the referenced object.
Here is a comprehensive example presenting how persistent ID can be used to pickle external objects by reference.
# Simple example presenting how persistent ID can be used to pickle
# external objects by reference.

import pickle
import sqlite3
from collections import namedtuple

# Simple class representing a record in our database.
MemoRecord = namedtuple("MemoRecord", "key, task")

class DBPickler(pickle.Pickler):

    def persistent_id(self, obj):
        # Instead of pickling MemoRecord as a regular class instance, we emit a
        # persistent ID.
        if isinstance(obj, MemoRecord):
            # Here, our persistent ID is simply a tuple, containing a tag and a
            # key, which refers to a specific record in the database.
            return ("MemoRecord", obj.key)
        else:
            # If obj does not have a persistent ID, return None. This means obj
            # needs to be pickled as usual.
            return None


class DBUnpickler(pickle.Unpickler):

    def __init__(self, file, connection):
        super().__init__(file)
        self.connection = connection

    def persistent_load(self, pid):
        # This method is invoked whenever a persistent ID is encountered.
        # Here, pid is the tuple returned by DBPickler.
        cursor = self.connection.cursor()
        type_tag, key_id = pid
        if type_tag == "MemoRecord":
            # Fetch the referenced record from the database and return it.
            cursor.execute("SELECT * FROM memos WHERE key=?", (str(key_id),))
            key, task = cursor.fetchone()
            return MemoRecord(key, task)
        else:
            # Always raise an error if you cannot return the correct object.
            # Otherwise, the unpickler will think None is the object referenced
            # by the persistent ID.
            raise pickle.UnpicklingError("unsupported persistent object")


def main():
    import io
    import pprint

    # Initialize and populate our database.
    conn = sqlite3.connect(":memory:")
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE memos(key INTEGER PRIMARY KEY, task TEXT)")
    tasks = (
        'give food to fish',
        'prepare group meeting',
        'fight with a zebra',
        )
    for task in tasks:
        cursor.execute("INSERT INTO memos VALUES(NULL, ?)", (task,))

    # Fetch the records to be pickled.
    cursor.execute("SELECT * FROM memos")
    memos = [MemoRecord(key, task) for key, task in cursor]
    # Save the records using our custom DBPickler.
    file = io.BytesIO()
    DBPickler(file).dump(memos)

    print("Pickled records:")
    pprint.pprint(memos)

    # Update a record, just for good measure.
    cursor.execute("UPDATE memos SET task='learn italian' WHERE key=1")

    # Load the records from the pickle data stream.
    file.seek(0)
    memos = DBUnpickler(file, conn).load()

    print("Unpickled records:")
    pprint.pprint(memos)


if __name__ == '__main__':
    main()
Dispatch Tables
If one wants to customize pickling of some classes without disturbing any other code which depends on pickling, then one can create a pickler with a private dispatch table.
The global dispatch table managed by the
copyreg
module is available as
copyreg.dispatch_table
. Therefore, one may choose to use a modified copy of
copyreg.dispatch_table
as a private dispatch table.
For example
f = io.BytesIO()
p = pickle.Pickler(f)
p.dispatch_table = copyreg.dispatch_table.copy()
p.dispatch_table[SomeClass] = reduce_SomeClass
creates an instance of
pickle.Pickler
with a private dispatch table which handles the
SomeClass
class specially. Alternatively, the code
class MyPickler(pickle.Pickler):
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[SomeClass] = reduce_SomeClass

f = io.BytesIO()
p = MyPickler(f)
does the same but all instances of
MyPickler
will by default share the private dispatch table. On the other hand, the code
copyreg.pickle(SomeClass, reduce_SomeClass)
f = io.BytesIO()
p = pickle.Pickler(f)
modifies the global dispatch table shared by all users of the
copyreg
module.
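The snippets above leave SomeClass and reduce_SomeClass undefined. A self-contained sketch of the private dispatch table variant might look as follows; the class body, the reducer, and the reducer_calls list are assumptions added purely for illustration.

```python
import copyreg
import io
import pickle

class SomeClass:
    def __init__(self, value):
        self.value = value

reducer_calls = []  # records each object handled by the custom reducer

def reduce_SomeClass(obj):
    # A reduction function returns the same kind of value as __reduce__().
    reducer_calls.append(obj.value)
    return SomeClass, (obj.value,)

f = io.BytesIO()
p = pickle.Pickler(f)
p.dispatch_table = copyreg.dispatch_table.copy()
p.dispatch_table[SomeClass] = reduce_SomeClass
p.dump(SomeClass(42))

restored = pickle.loads(f.getvalue())
print(restored.value)                       # 42
print(reducer_calls)                        # [42]
print(SomeClass in copyreg.dispatch_table)  # False: the global table is untouched
```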
Handling Stateful Objects
Here’s an example that shows how to modify pickling behavior for a class. The
TextReader
class below opens a text file, and returns the line number and line contents each time its
readline()
method is called. If a
TextReader
instance is pickled, all attributes
except
the file object member are saved. When the instance is unpickled, the file is reopened, and reading resumes from the last location. The
__setstate__()
and
__getstate__()
methods are used to implement this behavior.
class TextReader:
    """Print and number lines in a text file."""

    def __init__(self, filename):
        self.filename = filename
        self.file = open(filename)
        self.lineno = 0

    def readline(self):
        self.lineno += 1
        line = self.file.readline()
        if not line:
            return None
        if line.endswith('\n'):
            line = line[:-1]
        return "%i: %s" % (self.lineno, line)

    def __getstate__(self):
        # Copy the object's state from self.__dict__ which contains
        # all our instance attributes. Always use the dict.copy()
        # method to avoid modifying the original state.
        state = self.__dict__.copy()
        # Remove the unpicklable entries.
        del state['file']
        return state

    def __setstate__(self, state):
        # Restore instance attributes (i.e., filename and lineno).
        self.__dict__.update(state)
        # Restore the previously opened file's state. To do so, we need to
        # reopen it and read from it until the line count is restored.
        file = open(self.filename)
        for _ in range(self.lineno):
            file.readline()
        # Finally, save the file.
        self.file = file
A sample usage might be something like this:
>>> reader = TextReader("hello.txt")
>>> reader.readline()
'1: Hello world!'
>>> reader.readline()
'2: I am line number two.'
>>> new_reader = pickle.loads(pickle.dumps(reader))
>>> new_reader.readline()
'3: Goodbye!'
Custom Reduction for Types, Functions, and Other Objects
Added in version 3.8.
Sometimes,
dispatch_table
may not be flexible enough. In particular we may want to customize pickling based on another criterion than the object’s type, or we may want to customize the pickling of functions and classes.
For those cases, it is possible to subclass from the
Pickler
class and implement a
reducer_override()
method. This method can return an arbitrary reduction tuple (see
__reduce__()
). It can alternatively return
NotImplemented
to fall back to the traditional behavior.
If both the
dispatch_table
and
reducer_override()
are defined, then the
reducer_override()
method takes priority.
Note
For performance reasons,
reducer_override()
may not be called for the following objects:
None
,
True
,
False
, and exact instances of
int
,
float
,
bytes
,
str
,
dict
,
set
,
frozenset
,
list
and
tuple
.
Here is a simple example where we allow pickling and reconstructing a given class:
import io
import pickle

class MyClass:
    my_attribute = 1

class MyPickler(pickle.Pickler):
    def reducer_override(self, obj):
        """Custom reducer for MyClass."""
        if getattr(obj, "__name__", None) == "MyClass":
            return type, (obj.__name__, obj.__bases__,
                          {'my_attribute': obj.my_attribute})
        else:
            # For any other object, fall back to the usual reduction
            return NotImplemented

f = io.BytesIO()
p = MyPickler(f)
p.dump(MyClass)

del MyClass

unpickled_class = pickle.loads(f.getvalue())

assert isinstance(unpickled_class, type)
assert unpickled_class.__name__ == "MyClass"
assert unpickled_class.my_attribute == 1
Out-of-band Buffers
Added in version 3.8.
In some contexts, the
pickle
module is used to transfer massive amounts of data. Therefore, it can be important to minimize the number of memory copies, to preserve performance and limit resource consumption. However, normal operation of the
pickle
module, as it transforms a graph-like structure of objects into a sequential stream of bytes, intrinsically involves copying data to and from the pickle stream.
This constraint can be eschewed if both the
provider
(the implementation of the object types to be transferred) and the
consumer
(the implementation of the communications system) support the out-of-band transfer facilities provided by pickle protocol 5 and higher.
Provider API
The large data objects to be pickled must implement a
__reduce_ex__()
method specialized for protocol 5 and higher, which returns a
PickleBuffer
instance (instead of e.g. a
bytes
object) for any large data.
A
PickleBuffer
object
signals
that the underlying buffer is eligible for out-of-band data transfer. Those objects remain compatible with normal usage of the
pickle
module. However, consumers can also opt-in to tell
pickle
that they will handle those buffers by themselves.
Consumer API
A communications system can enable custom handling of the
PickleBuffer
objects generated when serializing an object graph.
On the sending side, it needs to pass a
buffer_callback
argument to
Pickler
(or to the
dump()
or
dumps()
function), which will be called with each
PickleBuffer
generated while pickling the object graph. Buffers accumulated by the
buffer_callback
will not see their data copied into the pickle stream, only a cheap marker will be inserted.
On the receiving side, it needs to pass a
buffers
argument to
Unpickler
(or to the
load()
or
loads()
function), which is an iterable of the buffers which were passed to
buffer_callback
. That iterable should produce buffers in the same order as they were passed to
buffer_callback
. Those buffers will provide the data expected by the reconstructors of the objects whose pickling produced the original
PickleBuffer
objects.
Between the sending side and the receiving side, the communications system is free to implement its own transfer mechanism for out-of-band buffers. Potential optimizations include the use of shared memory or datatype-dependent compression.
Example
Here is a trivial example where we implement a
bytearray
subclass able to participate in out-of-band buffer pickling:
import pickle
from pickle import PickleBuffer

class ZeroCopyByteArray(bytearray):

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            return type(self)._reconstruct, (PickleBuffer(self),), None
        else:
            # PickleBuffer is forbidden with pickle protocols <= 4.
            return type(self)._reconstruct, (bytearray(self),)

    @classmethod
    def _reconstruct(cls, obj):
        with memoryview(obj) as m:
            # Get a handle over the original buffer object
            obj = m.obj
            if type(obj) is cls:
                # Original buffer object is a ZeroCopyByteArray, return it
                # as-is.
                return obj
            else:
                return cls(obj)
The reconstructor (the
_reconstruct
class method) returns the buffer’s providing object if it has the right type. This is an easy way to simulate zero-copy behaviour on this toy example.
On the consumer side, we can pickle those objects the usual way, which when unserialized will give us a copy of the original object:
b = ZeroCopyByteArray(b"abc")
data = pickle.dumps(b, protocol=5)
new_b = pickle.loads(data)
print(b == new_b) # True
print(b is new_b) # False: a copy was made
But if we pass a
buffer_callback
and then give back the accumulated buffers when unserializing, we are able to get back the original object:
b = ZeroCopyByteArray(b"abc")
buffers = []
data = pickle.dumps(b, protocol=5, buffer_callback=buffers.append)
new_b = pickle.loads(data, buffers=buffers)
print(b == new_b) # True
print(b is new_b) # True: no copy was made
This example is limited by the fact that
bytearray
allocates its own memory: you cannot create a
bytearray
instance that is backed by another object’s memory. However, third-party datatypes such as NumPy arrays do not have this limitation, and allow use of zero-copy pickling (or making as few copies as possible) when transferring between distinct processes or systems.
See also
PEP 574
– Pickle protocol 5 with out-of-band data
Restricting Globals
By default, unpickling will import any class or function that it finds in the pickle data. For many applications, this behaviour is unacceptable as it permits the unpickler to import and invoke arbitrary code. Just consider what this hand-crafted pickle data stream does when loaded:
>>> import pickle
>>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
hello world
0
In this example, the unpickler imports the
os.system()
function and then applies it to the string argument “echo hello world”. Although this example is inoffensive, it is not difficult to imagine one that could damage your system.
For this reason, you may want to control what gets unpickled by customizing
Unpickler.find_class()
. Contrary to what its name suggests,
Unpickler.find_class()
is called whenever a global (i.e., a class or a function) is requested. Thus it is possible to either completely forbid globals or restrict them to a safe subset.
Here is an example of an unpickler allowing only a few safe classes from the
builtins
module to be loaded:
import builtins
import io
import pickle

safe_builtins = {
    'range',
    'complex',
    'set',
    'frozenset',
    'slice',
}

class RestrictedUnpickler(pickle.Unpickler):

    def find_class(self, module, name):
        # Only allow safe classes from builtins.
        if module == "builtins" and name in safe_builtins:
            return getattr(builtins, name)
        # Forbid everything else.
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" %
                                     (module, name))

def restricted_loads(s):
    """Helper function analogous to pickle.loads()."""
    return RestrictedUnpickler(io.BytesIO(s)).load()
A sample usage of our unpickler working as intended:
>>> restricted_loads(pickle.dumps([1, 2, range(15)]))
[1, 2, range(0, 15)]
>>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.")
Traceback (most recent call last):
...
pickle.UnpicklingError: global 'os.system' is forbidden
>>> restricted_loads(b'cbuiltins\neval\n'
... b'(S\'getattr(__import__("os"), "system")'
... b'("echo hello world")\'\ntR.')
Traceback (most recent call last):
...
pickle.UnpicklingError: global 'builtins.eval' is forbidden
As our examples show, you have to be careful with what you allow to be unpickled. Therefore, if security is a concern, you may want to consider alternatives such as the marshalling API in
xmlrpc.client
or third-party solutions.
Examples
For the simplest code, use the
dump()
and
load()
functions.
import pickle

# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3+4j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}

with open('data.pickle', 'wb') as f:
    # Pickle the 'data' dictionary using the highest protocol available.
    pickle.dump(data, f, pickle.HIGHEST_PROTOCOL)
The following example reads the resulting pickled data.
import pickle

with open('data.pickle', 'rb') as f:
    # The protocol version used is detected automatically, so we do not
    # have to specify it.
    data = pickle.load(f)
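When no file is involved, the same round trip can be done in memory with dumps() and loads(). The sketch below is a minimal illustration using an arbitrary dictionary.

```python
import pickle

data = {"a": [1, 2.0, 3 + 4j], "b": ("character string", b"byte string")}

# dumps() returns the pickle as a bytes object instead of writing to a file.
payload = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == data)        # True: equal contents
print(restored is data)        # False: a new object was built
```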
See also
- Module copyreg: Pickle interface constructor registration for extension types.
- Module pickletools: Tools for working with and analyzing pickled data.
- Module shelve: Indexed databases of objects; uses pickle.
- Module copy: Shallow and deep object copying.
- Module marshal: High-performance serialization of built-in types.
Footnotes