Isolating Extension Modules

Who should read this

This guide is written for maintainers of C-API extensions who would like to make those extensions safer to use in applications where Python itself is used as a library.

Background

An interpreter is the context in which Python code runs. It contains configuration (e.g. the import path) and runtime state (e.g. the set of imported modules).

Python supports running multiple interpreters in one process. There are two cases to consider. Users may run interpreters:

in sequence, with several Py_InitializeEx() / Py_FinalizeEx() cycles, and
in parallel, managing “sub-interpreters” using Py_NewInterpreter() / Py_EndInterpreter().

Both cases (and combinations of them) would be most useful when embedding Python within a library. Libraries generally shouldn’t make assumptions about the application that uses them, which includes assuming a process-wide “main Python interpreter”.

Historically, Python extension modules don’t handle this use case well. Many extension modules (and even some stdlib modules) use per-process global state, because C static variables are extremely easy to use. Thus, data that should be specific to an interpreter ends up being shared between interpreters. Unless the extension developer is careful, it is very easy to introduce edge cases that lead to crashes when a module is loaded in more than one interpreter in the same process.

Unfortunately, per-interpreter state is not easy to achieve. Extension authors tend to not keep multiple interpreters in mind when developing, and it is currently cumbersome to test the behavior.

Enter Per-Module State

Instead of focusing on per-interpreter state, Python’s C API is evolving to better support the more granular per-module state. This means that C-level data should be attached to a module object. Each interpreter creates its own module object, keeping the data separate. For testing the isolation, multiple module objects corresponding to a single extension can even be loaded in a single interpreter.
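As a rough analogy only, pure-Python modules already behave this way: their globals live in the module object, so two module objects built from the same source keep separate data. The sketch below is not the C API. SOURCE, make_module and the counter are hypothetical names invented for the illustration.

```python
import types

# Hypothetical module source: "counter" plays the role of C-level state.
SOURCE = """
counter = 0

def bump():
    global counter
    counter += 1
    return counter
"""

def make_module(name):
    """Build a new module object and run SOURCE inside its namespace."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    return module

# Two module objects created from the same source.
first = make_module("demo")
second = make_module("demo")

first.bump()
first.bump()
second.bump()

# Each module object holds its own counter; nothing is shared.
print(first.counter, second.counter)
```

An extension that keeps its data in per-module state aims for the same property: incrementing the counter through one module object leaves the other untouched.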

Per-module state provides an easy way to think about lifetime and resource ownership: the extension module initializes its state when a module object is created, and cleans up when that object is freed. In this regard, a module is just like any other PyObject *; there are no “on interpreter shutdown” hooks to think about, or to forget.
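The lifetime rule can be observed from Python, since a module object is freed like any other object once the last reference to it goes away. This is a minimal sketch that uses a plain types.ModuleType as a stand-in for an extension's module object, and weakref.finalize as a stand-in for the cleanup an extension would perform. The events list is a hypothetical name used only to record what happens.

```python
import gc
import types
import weakref

events = []

# Stand-in for a module object produced by an extension's init code.
demo = types.ModuleType("demo")
events.append("created")

# Stand-in for the extension's cleanup: runs when the module object is freed.
weakref.finalize(demo, events.append, "freed")

del demo      # drop the last reference to the module object
gc.collect()  # also covers implementations without reference counting

print(events)
```

The cleanup runs because the object was freed, not because an interpreter shut down, which is the ownership model described above.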

Note that there are use cases for different kinds of “globals”: per-process, per-interpreter, per-thread or per-task state. With per-module state as the default, these are still possible, but you should treat them as exceptional cases: if you need them, you should give them additional care and testing. (Note that this guide does not cover them.)

Isolated Module Objects

The key point to keep in mind when developing an extension module is that several module objects can be created from a single shared library. For example:

>>> import sys
>>> import binascii
>>> old_binascii = binascii
>>> del sys.modules['binascii']
>>> import binascii  # create a new module object
>>> old_binascii == binascii
False

As a rule of thumb, the two modules should be completely independent. All objects and state specific to the module should be encapsulated within the module object, not shared with other module objects, and cleaned up when the module object is deallocated. Since this is just a rule of thumb, exceptions are possible (see Managing Global State), but they will need more thought and attention to edge cases.
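One way to check this independence in a test is to create a second module object without touching sys.modules. The sketch below uses importlib.util for that and assumes that binascii supports being loaded more than once, as in the session above. load_fresh and the marker attribute are hypothetical names chosen for the illustration.

```python
import binascii
import importlib.util

def load_fresh(name):
    """Create and initialize a new module object, leaving sys.modules alone."""
    spec = importlib.util.find_spec(name)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

fresh = load_fresh("binascii")

# Attach something to the new module object only.
fresh.marker = "only on the fresh copy"

print(fresh is binascii)             # expected to be False for an isolated module
print(hasattr(binascii, "marker"))   # the original module object is unaffected
print(fresh.hexlify(b"\x01"))        # the fresh copy is fully functional
```

Keeping sys.modules untouched means the rest of the program continues to see the original module, so the check does not disturb other code running in the same interpreter.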

While some modules could do with less stringent restrictions, isolated modules make it easier to set clear expectations and guidelines that work across a variety of use cases.
