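It is quite easy to add new built-in modules to Python, if you know how to program in C. Such extension modules can do two things that can't be done directly in Python: they can implement new built-in object types, and they can call C library functions and system calls.
To support extensions, the Python API (Application Programmers Interface) defines a set of functions, macros and variables that provide access to most aspects of the Python run-time system. The Python API is incorporated in a C source file by including the header "Python.h".
The compilation of an extension module depends on its intended use as well as on your system setup; details are given in later chapters.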
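Note
The C extension interface is specific to CPython, and extension modules do not work on other Python implementations. In many cases, it is possible to avoid writing C extensions and preserve portability to other implementations. For example, if your use case is calling C library functions or system calls, you should consider using the ctypes module or the cffi library rather than writing custom C code. These modules let you write Python code to interface with C code and are more portable between implementations of Python than writing and compiling a C extension module.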
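As a rough illustration of the ctypes route mentioned in the note, the sketch below calls the C library function system() without any custom C code. It assumes a Unix-like platform where ctypes.util.find_library("c") can locate the C runtime; run_command is a hypothetical helper name, not part of any library.

```python
import ctypes
import ctypes.util

# Locate and load the C runtime. Assumption: a Unix-like host where
# find_library("c") resolves to the platform's libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C prototype: int system(const char *command);
libc.system.argtypes = [ctypes.c_char_p]
libc.system.restype = ctypes.c_int


def run_command(command):
    """Hypothetical helper: run a shell command and return system()'s status."""
    return libc.system(command.encode())


status = run_command("exit 0")
```

The value returned is whatever system() reports, so a command that exits successfully yields 0 and a failing one yields a nonzero status.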
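Let's create an extension module called spam (the favorite food of Monty Python fans ...) and let's say we want to create a Python interface to the C library function system() [1]. This function takes a null-terminated character string as argument and returns an integer. We want this function to be callable from Python as follows: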
>>> import spam
>>> status = spam.system("ls -l")
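Begin by creating a file spammodule.c. (Historically, if a module is called spam, the C file containing its implementation is called spammodule.c; if the module name is very long, like spammify, the file name can be just spammify.c.)
The first line of our file can be: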
#include <Python.h>
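which pulls in the Python API (you can add a comment describing the purpose of the module and a copyright notice if you like).
Note
Since Python may define some pre-processor definitions which affect the standard headers on some systems, you must include Python.h before any standard headers are included.
All user-visible symbols defined by Python.h have a prefix of Py or PY, except those defined in standard header files. For convenience, and since they are used extensively by the Python interpreter, "Python.h" includes a few standard header files: <stdio.h>, <string.h>, <errno.h>, and <stdlib.h>. If the latter header file does not exist on your system, it declares the functions malloc(), free() and realloc() directly.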
The next thing we add to our module file is the C function that will be called when the Python expression spam.system(string) is evaluated (we'll see shortly how it ends up being called):
static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    return PyLong_FromLong(sts);
}
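There is a straightforward translation from the argument list in Python (for example, the single expression "ls -l") to the arguments passed to the C function. The C function always has two arguments, conventionally named self and args.
The self argument points to the module object for module-level functions; for a method it would point to the object instance.
The args argument will be a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call's argument list. The arguments are Python objects; in order to do anything with them in our C function we have to convert them to C values. The function PyArg_ParseTuple() in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values. More about this later.
PyArg_ParseTuple() returns true (nonzero) if all arguments have the right type and its components have been stored in the variables whose addresses are passed. It returns false (zero) if an invalid argument list was passed. In the latter case it also raises an appropriate exception so the calling function can return NULL immediately (as we saw in the example).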
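An important convention throughout the Python interpreter is the following: when a function fails, it should set an exception condition and return an error value (usually a NULL pointer). Exceptions are stored in a static global variable inside the interpreter; if this variable is NULL no exception has occurred. A second global variable stores the "associated value" of the exception (the second argument to raise). A third variable contains the stack traceback in case the error originated in Python code. These three variables are the C equivalents of the result in Python of sys.exc_info() (see the section on module sys in the Python Library Reference). It is important to know about them to understand how errors are passed around.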
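For readers who have not used sys.exc_info(), here is a minimal Python-side sketch of the triple the paragraph above refers to. The function name current_exception_state is an illustrative choice, not an API.

```python
import sys


def current_exception_state():
    """Return the (type, value, traceback) triple of the exception being handled."""
    try:
        1 / 0
    except ZeroDivisionError:
        # Inside the handler, sys.exc_info() exposes the exception type,
        # its associated value, and the traceback object.
        return sys.exc_info()


exc_type, exc_value, exc_traceback = current_exception_state()
```

The first element is the exception class, the second is the associated value (the instance), and the third is the traceback, which is present here because the error originated in Python code.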
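The Python API defines a number of functions to set various types of exceptions.
The most common one is PyErr_SetString(). Its arguments are an exception object and a C string. The exception object is usually a predefined object like PyExc_ZeroDivisionError. The C string indicates the cause of the error and is converted to a Python string object and stored as the "associated value" of the exception.
Another useful function is PyErr_SetFromErrno(), which only takes an exception argument and constructs the associated value by inspection of the global variable errno. The most general function is PyErr_SetObject(), which takes two object arguments, the exception and its associated value. You don't need to Py_INCREF() the objects passed to any of these functions.
You can test non-destructively whether an exception has been set with PyErr_Occurred(). This returns the current exception object, or NULL if no exception has occurred. You normally don't need to call PyErr_Occurred() to see whether an error occurred in a function call, since you should be able to tell from the return value.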
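When a function f that calls another function g detects that the latter fails, f should itself return an error value (usually NULL or -1). It should not call one of the PyErr_*() functions; one has already been called by g. f's caller is then supposed to also return an error indication to its caller, again without calling PyErr_*(), and so on; the most detailed cause of the error was already reported by the function that first detected it. Once the error reaches the Python interpreter's main loop, this aborts the currently executing Python code and tries to find an exception handler specified by the Python programmer.
(There are situations where a module can actually give a more detailed error message by calling another PyErr_*() function, and in such cases it is fine to do so. As a general rule, however, this is not necessary, and can cause information about the cause of the error to be lost: most operations can fail for a variety of reasons.)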
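To ignore an exception set by a function call that failed, the exception condition must be cleared explicitly by calling PyErr_Clear(). The only time C code should call PyErr_Clear() is if it doesn't want to pass the error on to the interpreter but wants to handle it completely by itself (possibly by trying something else, or pretending nothing went wrong).
Every failing malloc() call must be turned into an exception: the direct caller of malloc() (or realloc()) must call PyErr_NoMemory() and return a failure indicator itself. All the object-creating functions (for example, PyLong_FromLong()) already do this, so this note is only relevant to those who call malloc() directly.
Also note that, with the important exception of PyArg_ParseTuple() and friends, functions that return an integer status usually return a positive value or zero for success and -1 for failure, like Unix system calls.
Finally, be careful to clean up garbage (by making Py_XDECREF() or Py_DECREF() calls for objects you have already created) when you return an error indicator!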
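The choice of which exception to raise is entirely yours. There are predeclared C objects corresponding to all built-in Python exceptions, such as PyExc_ZeroDivisionError, which you can use directly. Of course, you should choose exceptions wisely: don't use PyExc_TypeError to mean that a file couldn't be opened (that should probably be PyExc_IOError). If something's wrong with the argument list, the PyArg_ParseTuple() function usually raises PyExc_TypeError. If you have an argument whose value must be in a particular range or must satisfy other conditions, PyExc_ValueError is appropriate.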
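The same split between TypeError and ValueError can be observed from Python in functions that are implemented in C. The sketch below uses the built-in chr() purely as a convenient example; classify_failure is a hypothetical helper written for this illustration.

```python
def classify_failure(func, *args):
    """Call func(*args) and report which exception class it raised, if any."""
    try:
        func(*args)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__
    return None


wrong_type = classify_failure(chr, "a")    # argument is not an integer
out_of_range = classify_failure(chr, -1)   # an integer, but outside the valid range
accepted = classify_failure(chr, 65)       # valid call, nothing raised
```

An argument of the wrong kind is reported as a TypeError, while an argument of the right kind with an unacceptable value is reported as a ValueError, which is the distinction an extension function should preserve.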
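You can also define a new exception that is unique to your module. For this, you usually declare a static object variable at the beginning of your file: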
static PyObject *SpamError;
and initialize it in your module's initialization function (PyInit_spam()) with an exception object (leaving out the error checking for now):
PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    SpamError = PyErr_NewException("spam.error", NULL, NULL);
    Py_INCREF(SpamError);
    PyModule_AddObject(m, "error", SpamError);
    return m;
}
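Note that the Python name for the exception object is spam.error. The PyErr_NewException() function may create a class with the base class being Exception (unless another class is passed in instead of NULL), described in Built-in Exceptions.
Note also that the SpamError variable retains a reference to the newly created exception class; this is intentional! Since the exception could be removed from the module by external code, an owned reference to the class is needed to ensure that it will not be discarded, causing SpamError to become a dangling pointer. Should it become a dangling pointer, C code which raises the exception could cause a core dump or other unintended side effects.
We discuss the use of PyMODINIT_FUNC as a function return type later in this sample.
The spam.error exception can be raised in your extension module using a call to PyErr_SetString() as shown below:
static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = system(command);
    if (sts < 0) {
        PyErr_SetString(SpamError, "System command failed");
        return NULL;
    }
    return PyLong_FromLong(sts);
}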
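Going back to our example function, you should now be able to understand this statement:
if (!PyArg_ParseTuple(args, "s", &command))
    return NULL;
It returns NULL (the error indicator for functions returning object pointers) if an error is detected in the argument list, relying on the exception set by PyArg_ParseTuple(). Otherwise the string value of the argument has been copied to the local variable command. This is a pointer assignment and you are not supposed to modify the string to which it points (so in Standard C, the variable command should properly be declared as const char *command).
The next statement is a call to the Unix function system(), passing it the string we just got from PyArg_ParseTuple():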
sts = system(command);
Our spam.system() function must return the value of sts as a Python object. This is done using the function PyLong_FromLong().
return PyLong_FromLong(sts);
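In this case, it will return an integer object. (Yes, even integers are objects on the heap in Python!)
If you have a C function that returns no useful argument (a function returning void), the corresponding Python function must return None. You need this idiom to do so (which is implemented by the Py_RETURN_NONE macro):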
Py_INCREF(Py_None);
return Py_None;
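Py_None is the C name for the special Python object None. It is a genuine Python object rather than a NULL pointer, which means "error" in most contexts, as we have seen.
I promised to show how spam_system() is called from Python programs. First, we need to list its name and address in a "method table":
static PyMethodDef SpamMethods[] = {
    ...
    {"system",  spam_system, METH_VARARGS,
     "Execute a shell command."},
    ...
    {NULL, NULL, 0, NULL}        /* Sentinel */
};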
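Note the third entry (METH_VARARGS). This is a flag telling the interpreter the calling convention to be used for the C function. It should normally always be METH_VARARGS or METH_VARARGS | METH_KEYWORDS; a value of 0 means that an obsolete variant of PyArg_ParseTuple() is used.
When using only METH_VARARGS, the function should expect the Python-level parameters to be passed in as a tuple acceptable for parsing via PyArg_ParseTuple(); more information on this function is provided below.
The METH_KEYWORDS bit may be set in the third field if keyword arguments should be passed to the function. In this case, the C function should accept a third PyObject * parameter which will be a dictionary of keywords. Use PyArg_ParseTupleAndKeywords() to parse the arguments to such a function.
The method table must be referenced in the module definition structure:
static struct PyModuleDef spammodule = {
    PyModuleDef_HEAD_INIT,
    "spam",   /* name of module */
    spam_doc, /* module documentation, may be NULL */
    -1,       /* size of per-interpreter state of the module,
                 or -1 if the module keeps state in global variables. */
    SpamMethods
};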
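This structure, in turn, must be passed to the interpreter in the module's initialization function. The initialization function must be named PyInit_name(), where name is the name of the module, and should be the only non-static item defined in the module file:
PyMODINIT_FUNC
PyInit_spam(void)
{
    return PyModule_Create(&spammodule);
}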
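Note that PyMODINIT_FUNC declares the function as PyObject * return type, declares any special linkage declarations required by the platform, and for C++ declares the function as extern "C".
When the Python program imports module spam for the first time, PyInit_spam() is called. (See below for comments about embedding Python.) It calls PyModule_Create(), which returns a module object, and inserts built-in function objects into the newly created module based upon the table (an array of PyMethodDef structures) found in the module definition. PyModule_Create() returns a pointer to the module object that it creates. It may abort with a fatal error for certain errors, or return NULL if the module could not be initialized satisfactorily. The init function must return the module object to its caller, so that it then gets inserted into sys.modules.
When embedding Python, the PyInit_spam() function is not called automatically unless there's an entry in the PyImport_Inittab table. To add the module to the initialization table, use PyImport_AppendInittab(), optionally followed by an import of the module:
int
main(int argc, char *argv[])
{
    wchar_t *program = Py_DecodeLocale(argv[0], NULL);
    if (program == NULL) {
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }

    /* Add a built-in module, before Py_Initialize */
    PyImport_AppendInittab("spam", PyInit_spam);

    /* Pass argv[0] to the Python interpreter */
    Py_SetProgramName(program);

    /* Initialize the Python interpreter.  Required. */
    Py_Initialize();

    /* Optionally import the module; alternatively,
       import can be deferred until the embedded script
       imports it. */
    PyImport_ImportModule("spam");

    ...

    PyMem_RawFree(program);
    return 0;
}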
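Note
Removing entries from sys.modules or importing compiled modules into multiple interpreters within a process (or following a fork() without an intervening exec()) can create problems for some extension modules. Extension module authors should exercise caution when initializing internal data structures.
A more substantial example module is included in the Python source distribution as Modules/xxmodule.c. This file may be used as a template or simply read as an example.
Note
Unlike our spam example, xxmodule uses multi-phase initialization (new in Python 3.5), where a PyModuleDef structure is returned from PyInit_spam, and creation of the module is left to the import machinery. For details on multi-phase initialization, see PEP 489.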
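There are two more things to do before you can use your new extension: compiling and linking it with the Python system. If you use dynamic loading, the details may depend on the style of dynamic loading your system uses; see the chapters about building extension modules (chapter Building C and C++ Extensions) and additional information that pertains only to building on Windows (chapter Building C and C++ Extensions on Windows) for more information about this.
If you can't use dynamic loading, or if you want to make your module a permanent part of the Python interpreter, you will have to change the configuration setup and rebuild the interpreter. Luckily, this is very simple on Unix: just place your file (spammodule.c for example) in the Modules/ directory of an unpacked source distribution, add a line to the file Modules/Setup.local describing your file: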
spam spammodule.o
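and rebuild the interpreter by running make in the toplevel directory. You can also run make in the Modules/ subdirectory, but then you must first rebuild Makefile there by running 'make Makefile'. (This is necessary each time you change the Setup file.)
If your module requires additional libraries to link with, these can be listed on the line in the configuration file as well, for instance: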
spam spammodule.o -lX11
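So far we have concentrated on making C functions callable from Python. The reverse is also useful: calling Python functions from C. This is especially the case for libraries that support so-called "callback" functions. If a C interface makes use of callbacks, the equivalent Python often needs to provide a callback mechanism to the Python programmer; the implementation will require calling the Python callback functions from a C callback. Other uses are also imaginable.
Fortunately, the Python interpreter is easily called recursively, and there is a standard interface to call a Python function. (I won't dwell on how to call the Python parser with a particular string as input; if you are interested, have a look at the implementation of the -c command line option in Modules/main.c from the Python source code.)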
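To make the idea of a C library calling back into Python concrete before turning to the extension API, here is a small sketch that uses ctypes rather than a hand-written extension. It assumes a Unix-like platform whose C library can be found with ctypes.util.find_library("c"), and it hands a Python function to the C function qsort() as the comparison callback. The names CMPFUNC and py_compare are illustrative.

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like host where find_library("c") locates libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C type of the comparison callback: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))


def py_compare(a, b):
    """Python callback invoked from C for every comparison qsort() makes."""
    return (a[0] > b[0]) - (a[0] < b[0])


# void qsort(void *base, size_t nmemb, size_t size, compar);
libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
callback = CMPFUNC(py_compare)  # keep a reference while C may still call it
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), callback)
sorted_values = list(values)
```

Here ctypes does the work that an extension module would otherwise do by hand: it keeps the Python function object alive and calls it whenever the C code invokes the callback pointer.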
Calling a Python function is easy. First, the Python program must somehow pass you the Python function object. You should provide a function (or some other interface) to do this. When this function is called, save a pointer to the Python function object (be careful to
Py_INCREF()
it!) in a global variable — or wherever you see fit. For example, the following function might be part of a module definition:
static PyObject *my_callback = NULL;

static PyObject *
my_set_callback(PyObject *dummy, PyObject *args)
{
    PyObject *result = NULL;
    PyObject *temp;

    if (PyArg_ParseTuple(args, "O:set_callback", &temp)) {
        if (!PyCallable_Check(temp)) {
            PyErr_SetString(PyExc_TypeError, "parameter must be callable");
            return NULL;
        }
        Py_XINCREF(temp);         /* Add a reference to new callback */
        Py_XDECREF(my_callback);  /* Dispose of previous callback */
        my_callback = temp;       /* Remember new callback */
        /* Boilerplate to return "None" */
        Py_INCREF(Py_None);
        result = Py_None;
    }
    return result;
}
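This function must be registered with the interpreter using the METH_VARARGS flag; this is described in section The Module's Method Table and Initialization Function. The PyArg_ParseTuple() function and its arguments are documented in section Extracting Parameters in Extension Functions.
The macros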
Py_XINCREF()
and
Py_XDECREF()
increment/decrement the reference count of an object and are safe in the presence of
NULL
pointers (but note that
temp
will not be
NULL
in this context). More info on them in section
Reference Counts
.
Later, when it is time to call the function, you call the C function
PyObject_CallObject()
. This function has two arguments, both pointers to arbitrary Python objects: the Python function, and the argument list. The argument list must always be a tuple object, whose length is the number of arguments. To call the Python function with no arguments, pass in NULL, or an empty tuple; to call it with one argument, pass a singleton tuple.
Py_BuildValue()
returns a tuple when its format string consists of zero or more format codes between parentheses. For example:
int arg;
PyObject *arglist;
PyObject *result;
...
arg = 123;
...
/* Time to call the callback */
arglist = Py_BuildValue("(i)", arg);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);
PyObject_CallObject()
returns a Python object pointer: this is the return value of the Python function.
PyObject_CallObject()
is “reference-count-neutral” with respect to its arguments. In the example a new tuple was created to serve as the argument list, which is
Py_DECREF()
-ed immediately after the
PyObject_CallObject()
call.
The return value of
PyObject_CallObject()
is “new”: either it is a brand new object, or it is an existing object whose reference count has been incremented. So, unless you want to save it in a global variable, you should somehow
Py_DECREF()
the result, even (especially!) if you are not interested in its value.
Before you do this, however, it is important to check that the return value isn’t
NULL
. If it is, the Python function terminated by raising an exception. If the C code that called
PyObject_CallObject()
is called from Python, it should now return an error indication to its Python caller, so the interpreter can print a stack trace, or the calling Python code can handle the exception. If this is not possible or desirable, the exception should be cleared by calling
PyErr_Clear()
. For example:
if (result == NULL)
    return NULL; /* Pass error back */
...use result...
Py_DECREF(result);
Depending on the desired interface to the Python callback function, you may also have to provide an argument list to
PyObject_CallObject()
. In some cases the argument list is also provided by the Python program, through the same interface that specified the callback function. It can then be saved and used in the same manner as the function object. In other cases, you may have to construct a new tuple to pass as the argument list. The simplest way to do this is to call
Py_BuildValue()
. For example, if you want to pass an integral event code, you might use the following code:
PyObject *arglist;
...
arglist = Py_BuildValue("(l)", eventcode);
result = PyObject_CallObject(my_callback, arglist);
Py_DECREF(arglist);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);
Note the placement of
Py_DECREF(arglist)
immediately after the call, before the error check! Also note that strictly speaking this code is not complete:
Py_BuildValue()
may run out of memory, and this should be checked.
You may also call a function with keyword arguments by using
PyObject_Call()
, which supports arguments and keyword arguments. As in the above example, we use
Py_BuildValue()
to construct the dictionary.
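PyObject *dict;
PyObject *empty_args;
...
dict = Py_BuildValue("{s:i}", "name", val);
empty_args = PyTuple_New(0);  /* PyObject_Call() needs a tuple, not NULL */
result = PyObject_Call(my_callback, empty_args, dict);
Py_DECREF(empty_args);
Py_DECREF(dict);
if (result == NULL)
    return NULL; /* Pass error back */
/* Here maybe use the result */
Py_DECREF(result);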
The PyArg_ParseTuple() function is declared as follows:
int PyArg_ParseTuple(PyObject *arg, const char *format, ...);
The arg argument must be a tuple object containing an argument list passed from Python to a C function. The format argument must be a format string, whose syntax is explained in Parsing arguments and building values in the Python/C API Reference Manual. The remaining arguments must be addresses of variables whose type is determined by the format string.
Note that while
PyArg_ParseTuple()
checks that the Python arguments have the required types, it cannot check the validity of the addresses of C variables passed to the call: if you make mistakes there, your code will probably crash or at least overwrite random bits in memory. So be careful!
Note that any Python object references which are provided to the caller are borrowed references; do not decrement their reference count!
Some example calls:
#define PY_SSIZE_T_CLEAN  /* Make "s#" use Py_ssize_t rather than int. */
#include <Python.h>

int ok;
int i, j;
long k, l;
const char *s;
Py_ssize_t size;

ok = PyArg_ParseTuple(args, ""); /* No arguments */
    /* Python call: f() */

ok = PyArg_ParseTuple(args, "s", &s); /* A string */
    /* Possible Python call: f('whoops!') */

ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
    /* Possible Python call: f(1, 2, 'three') */

ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
    /* A pair of ints and a string, whose size is also returned */
    /* Possible Python call: f((1, 2), 'three') */

{
    const char *file;
    const char *mode = "r";
    int bufsize = 0;
    ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
    /* A string, and optionally another string and an integer */
    /* Possible Python calls:
       f('spam')
       f('spam', 'w')
       f('spam', 'wb', 100000) */
}

{
    int left, top, right, bottom, h, v;
    ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
             &left, &top, &right, &bottom, &h, &v);
    /* A rectangle and a point */
    /* Possible Python call:
       f(((0, 0), (400, 300)), (10, 10)) */
}

{
    Py_complex c;
    ok = PyArg_ParseTuple(args, "D:myfunction", &c);
    /* a complex, also providing a function name for errors */
    /* Possible Python call: myfunction(1+2j) */
}
The PyArg_ParseTupleAndKeywords() function is declared as follows:
int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict,
                                const char *format, char *kwlist[], ...);
The arg
and
format
parameters are identical to those of the
PyArg_ParseTuple()
function. The
kwdict
parameter is the dictionary of keywords received as the third parameter from the Python runtime. The
kwlist
parameter is a
NULL
-terminated list of strings which identify the parameters; the names are matched with the type information from
format
from left to right. On success,
PyArg_ParseTupleAndKeywords()
returns true, otherwise it returns false and raises an appropriate exception.
Note
Nested tuples cannot be parsed when using keyword arguments! Keyword parameters passed in which are not present in the
kwlist
will cause
TypeError
to be raised.
Here is an example module which uses keywords, based on an example by Geoff Philbrick (philbrick@hks.com):
#include "Python.h"

static PyObject *
keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds)
{
    int voltage;
    char *state = "a stiff";
    char *action = "voom";
    char *type = "Norwegian Blue";

    static char *kwlist[] = {"voltage", "state", "action", "type", NULL};

    if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist,
                                     &voltage, &state, &action, &type))
        return NULL;

    printf("-- This parrot wouldn't %s if you put %i Volts through it.\n",
           action, voltage);
    printf("-- Lovely plumage, the %s -- It's %s!\n", type, state);

    Py_RETURN_NONE;
}

static PyMethodDef keywdarg_methods[] = {
    /* The cast of the function is necessary since PyCFunction values
     * only take two PyObject* parameters, and keywdarg_parrot() takes
     * three.
     */
    {"parrot", (PyCFunction)keywdarg_parrot, METH_VARARGS | METH_KEYWORDS,
     "Print a lovely skit to standard output."},
    {NULL, NULL, 0, NULL}   /* sentinel */
};

static struct PyModuleDef keywdargmodule = {
    PyModuleDef_HEAD_INIT,
    "keywdarg",
    NULL,
    -1,
    keywdarg_methods
};

PyMODINIT_FUNC
PyInit_keywdarg(void)
{
    return PyModule_Create(&keywdargmodule);
}
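To see what the format string "i|sss" and the kwlist array amount to from the caller's point of view, it can help to write down a pure-Python function with roughly the same signature. This is only an analogue for illustration: it returns the two lines instead of printing them, and it does not enforce that voltage is an integer the way the "i" format unit does.

```python
def parrot(voltage, state="a stiff", action="voom", type="Norwegian Blue"):
    """Pure-Python analogue of keywdarg_parrot(); returns the lines it would print."""
    return [
        "-- This parrot wouldn't %s if you put %i Volts through it." % (action, voltage),
        "-- Lovely plumage, the %s -- It's %s!" % (type, state),
    ]


# One required argument, the optional ones selected by keyword.
lines = parrot(1000, action="VOOOOM")

# A keyword that is not in the parameter list is rejected with TypeError,
# which mirrors how names missing from kwlist are treated.
try:
    parrot(1000, colour="blue")
except TypeError:
    unknown_keyword_rejected = True
else:
    unknown_keyword_rejected = False
```

The required parameter corresponds to the format unit before the "|" and the three defaults correspond to the optional units after it, with the C variables' initial values playing the role of the Python default values.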
This function is the counterpart to PyArg_ParseTuple(). It is declared as follows:
PyObject *Py_BuildValue(const char *format, ...);
It recognizes a set of format units similar to the ones recognized by
PyArg_ParseTuple()
, but the arguments (which are input to the function, not output) must not be pointers, just values. It returns a new Python object, suitable for returning from a C function called from Python.
One difference with
PyArg_ParseTuple()
: while the latter requires its first argument to be a tuple (since Python argument lists are always represented as tuples internally),
Py_BuildValue()
does not always build a tuple. It builds a tuple only if its format string contains two or more format units. If the format string is empty, it returns
None
; if it contains exactly one format unit, it returns whatever object is described by that format unit. To force it to return a tuple of size 0 or one, parenthesize the format string.
Examples (to the left the call, to the right the resulting Python value):
Py_BuildValue("")                        None
Py_BuildValue("i", 123)                  123
Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
Py_BuildValue("s", "hello")              'hello'
Py_BuildValue("y", "hello")              b'hello'
Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
Py_BuildValue("s#", "hello", 4)          'hell'
Py_BuildValue("y#", "hello", 4)          b'hell'
Py_BuildValue("()")                      ()
Py_BuildValue("(i)", 123)                (123,)
Py_BuildValue("(ii)", 123, 456)          (123, 456)
Py_BuildValue("(i,i)", 123, 456)         (123, 456)
Py_BuildValue("[i,i]", 123, 456)         [123, 456]
Py_BuildValue("{s:i,s:i}",
              "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
Py_BuildValue("((ii)(ii)) (ii)",
              1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))
In languages like C or C++, the programmer is responsible for dynamic allocation and deallocation of memory on the heap. In C, this is done using the functions
malloc()
and
free()
. In C++, the operators
new
and
delete
are used with essentially the same meaning and we’ll restrict the following discussion to the C case.
Every block of memory allocated with
malloc()
should eventually be returned to the pool of available memory by exactly one call to
free()
. It is important to call
free()
at the right time. If a block’s address is forgotten but
free()
is not called for it, the memory it occupies cannot be reused until the program terminates. This is called a
memory leak
. On the other hand, if a program calls
free()
for a block and then continues to use the block, it creates a conflict with re-use of the block through another
malloc()
call. This is called
using freed memory
. It has the same bad consequences as referencing uninitialized data — core dumps, wrong results, mysterious crashes.
Common causes of memory leaks are unusual paths through the code. For instance, a function may allocate a block of memory, do some calculation, and then free the block again. Now a change in the requirements for the function may add a test to the calculation that detects an error condition and can return prematurely from the function. It’s easy to forget to free the allocated memory block when taking this premature exit, especially when it is added later to the code. Such leaks, once introduced, often go undetected for a long time: the error exit is taken only in a small fraction of all calls, and most modern machines have plenty of virtual memory, so the leak only becomes apparent in a long-running process that uses the leaking function frequently. Therefore, it’s important to prevent leaks from happening by having a coding convention or strategy that minimizes this kind of error.
Since Python makes heavy use of
malloc()
and
free()
, it needs a strategy to avoid memory leaks as well as the use of freed memory. The chosen method is called
reference counting
. The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and which is decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed.
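The counter can be watched from Python with sys.getrefcount(). The sketch below is CPython-specific and only looks at how the reported number changes, since the absolute value includes temporary references made by the call itself. The function name refcount_delta is illustrative.

```python
import sys


def refcount_delta():
    """Return how the reported count changes when one reference is stored, then deleted."""
    obj = object()
    holder = []
    before = sys.getrefcount(obj)
    holder.append(obj)             # a new reference is stored in the list
    stored = sys.getrefcount(obj)
    holder.pop()                   # that reference is deleted again
    after = sys.getrefcount(obj)
    return stored - before, after - before


delta_when_stored, delta_after_delete = refcount_delta()
```

Storing the object in the list raises the count by one, and removing it brings the count back to where it started.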
An alternative strategy is called
automatic garbage collection
. (Sometimes, reference counting is also referred to as a garbage collection strategy, hence my use of “automatic” to distinguish the two.) The big advantage of automatic garbage collection is that the user doesn’t need to call
free()
explicitly. (Another claimed advantage is an improvement in speed or memory usage — this is no hard fact however.) The disadvantage is that for C, there is no truly portable automatic garbage collector, while reference counting can be implemented portably (as long as the functions
malloc()
and
free()
are available — which the C Standard guarantees). Maybe some day a sufficiently portable automatic garbage collector will be available for C. Until then, we’ll have to live with reference counts.
While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.
The cycle detector is able to detect garbage cycles and can reclaim them.
The gc
module exposes a way to run the detector (the
collect()
function), as well as configuration interfaces and the ability to disable the detector at runtime. The cycle detector is considered an optional component; though it is included by default, it can be disabled at build time using the
--without-cycle-gc
option to the
configure
script on Unix platforms (including Mac OS X). If the cycle detector is disabled in this way, the
gc
module will not be available.
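A short sketch of the cycle detector at work, assuming a CPython build that includes the gc module: two objects that refer to each other stay alive after the last outside reference is gone, until the detector is run with gc.collect(). The class Node and the function make_garbage_cycle are names made up for this example.

```python
import gc
import weakref


class Node:
    """Plain object that can hold a reference to another Node."""


def make_garbage_cycle():
    """Create two objects that reference each other and drop all outside references."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a
    return weakref.ref(a)    # a weak reference does not keep the object alive


was_enabled = gc.isenabled()
gc.disable()                 # keep the automatic collector out of the way
probe = make_garbage_cycle()
alive_before_collect = probe() is not None
gc.collect()                 # run the cycle detector explicitly
alive_after_collect = probe() is not None
if was_enabled:
    gc.enable()
```

Reference counting alone never frees the pair, because each object still holds a reference to the other; the explicit collection is what reclaims them.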
There are two macros,
Py_INCREF(x)
and
Py_DECREF(x)
, which handle the incrementing and decrementing of the reference count.
Py_DECREF()
also frees the object when the count reaches zero. For flexibility, it doesn’t call
free()
directly — rather, it makes a call through a function pointer in the object’s
type object
. For this purpose (and others), every object also contains a pointer to its type object.
The big question now remains: when to use
Py_INCREF(x)
and
Py_DECREF(x)
? Let’s first introduce some terms. Nobody “owns” an object; however, you can
own a reference
to an object. An object’s reference count is now defined as the number of owned references to it. The owner of a reference is responsible for calling
Py_DECREF()
when the reference is no longer needed. Ownership of a reference can be transferred. There are three ways to dispose of an owned reference: pass it on, store it, or call
Py_DECREF()
. Forgetting to dispose of an owned reference creates a memory leak.
It is also possible to
borrow
[2]
a reference to an object. The borrower of a reference should not call
Py_DECREF()
. The borrower must not hold on to the object longer than the owner from which it was borrowed. Using a borrowed reference after the owner has disposed of it risks using freed memory and should be avoided completely
[3]
.
The advantage of borrowing over owning a reference is that you don’t need to take care of disposing of the reference on all possible paths through the code — in other words, with a borrowed reference you don’t run the risk of leaking when a premature exit is taken. The disadvantage of borrowing over owning is that there are some subtle situations where in seemingly correct code a borrowed reference can be used after the owner from which it was borrowed has in fact disposed of it.
A borrowed reference can be changed into an owned reference by calling
Py_INCREF()
. This does not affect the status of the owner from which the reference was borrowed — it creates a new owned reference, and gives full owner responsibilities (the new owner must dispose of the reference properly, as well as the previous owner).
Whenever an object reference is passed into or out of a function, it is part of the function’s interface specification whether ownership is transferred with the reference or not.
Most functions that return a reference to an object pass on ownership with the reference. In particular, all functions whose function it is to create a new object, such as
PyLong_FromLong()
and
Py_BuildValue()
, pass ownership to the receiver. Even if the object is not actually new, you still receive ownership of a new reference to that object. For instance,
PyLong_FromLong()
maintains a cache of popular values and can return a reference to a cached item.
Many functions that extract objects from other objects also transfer ownership with the reference, for instance
PyObject_GetAttrString()
. The picture is less clear, here, however, since a few common routines are exceptions:
PyTuple_GetItem()
,
PyList_GetItem()
,
PyDict_GetItem()
, and
PyDict_GetItemString()
all return references that you borrow from the tuple, list or dictionary.
The function
PyImport_AddModule()
also returns a borrowed reference, even though it may actually create the object it returns: this is possible because an owned reference to the object is stored in
sys.modules
.
When you pass an object reference into another function, in general, the function borrows the reference from you — if it needs to store it, it will use
Py_INCREF()
to become an independent owner. There are exactly two important exceptions to this rule:
PyTuple_SetItem()
and
PyList_SetItem()
. These functions take over ownership of the item passed to them — even if they fail! (Note that
PyDict_SetItem()
and friends don’t take over ownership — they are “normal.”)
When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns. Only when such a borrowed reference must be stored or passed on, it must be turned into an owned reference by calling
Py_INCREF()
.
The object reference returned from a C function that is called from Python must be an owned reference — ownership is transferred from the function to its caller.
There are a few situations where seemingly harmless use of a borrowed reference can lead to problems. These all have to do with implicit invocations of the interpreter, which can cause the owner of a reference to dispose of it.
The first and most important case to know about is using
Py_DECREF()
on an unrelated object while borrowing a reference to a list item. For instance:
void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0); /* BUG! */
}
This function first borrows a reference to
list[0]
, then replaces
list[1]
with the value
0
, and finally prints the borrowed reference. Looks harmless, right? But it’s not!
Let’s follow the control flow into
PyList_SetItem()
. The list owns references to all its items, so when item 1 is replaced, it has to dispose of the original item 1. Now let’s suppose the original item 1 was an instance of a user-defined class, and let’s further suppose that the class defined a
__del__()
method. If this class instance has a reference count of 1, disposing of it will call its
__del__()
method.
Since it is written in Python, the
__del__()
method can execute arbitrary Python code. Could it perhaps do something to invalidate the reference to
item
in
bug()
? You bet! Assuming that the list passed into
bug()
is accessible to the
__del__()
method, it could execute a statement to the effect of
del
list[0]
, and assuming this was the last reference to that object, it would free the memory associated with it, thereby invalidating
item
.
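The chain of events described above can be reproduced from Python alone. The following sketch relies on CPython's immediate, reference-counted destruction; Payload and Saboteur are hypothetical classes written for this illustration. It shows that replacing item 1 really does make the object at index 0 disappear.

```python
import weakref


class Payload:
    """Stand-in for the object stored at list[0]."""


class Saboteur:
    """Object whose __del__ removes the first item of the list it was given."""

    def __init__(self, target):
        self.target = target

    def __del__(self):
        del self.target[0]


items = [Payload(), None, "tail"]
first_item = weakref.ref(items[0])   # observe list[0] without owning it
items[1] = Saboteur(items)           # the list holds the only reference
items[1] = 0                         # replacing item 1 triggers __del__
```

After the last assignment the weak reference is dead, which means the Payload object has been freed; a C function that still held a borrowed pointer to it, as bug() does with item, would now be pointing at freed memory.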
The solution, once you know the source of the problem, is easy: temporarily increment the reference count. The correct version of the function reads:
void
no_bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    Py_INCREF(item);
    PyList_SetItem(list, 1, PyLong_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}
This is a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his
__del__()
methods would fail…
The second case of problems with a borrowed reference is a variant involving threads. Normally, multiple threads in the Python interpreter can’t get in each other’s way, because there is a global lock protecting Python’s entire object space. However, it is possible to temporarily release this lock using the macro
Py_BEGIN_ALLOW_THREADS
, and to re-acquire it using
Py_END_ALLOW_THREADS
. This is common around blocking I/O calls, to let other threads use the processor while waiting for the I/O to complete. Obviously, the following function has the same problem as the previous one:
void
bug(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);
    Py_BEGIN_ALLOW_THREADS
    ...some blocking I/O call...
    Py_END_ALLOW_THREADS
    PyObject_Print(item, stdout, 0); /* BUG! */
}
In general, functions that take object references as arguments do not expect you to pass them NULL pointers, and will dump core (or cause later core dumps) if you do so. Functions that return object references generally return NULL only to indicate that an exception occurred. The reason for not testing for NULL arguments is that functions often pass the objects they receive on to other functions; if each function were to test for NULL, there would be a lot of redundant tests and the code would run more slowly.
It is better to test for
NULL
only at the “source:” when a pointer that may be
NULL
is received, for example, from
malloc()
or from a function that may raise an exception.
The macros
Py_INCREF()
and
Py_DECREF()
do not check for
NULL
pointers — however, their variants
Py_XINCREF()
and
Py_XDECREF()
do.
The macros for checking for a particular object type (
Pytype_Check()
) don’t check for
NULL
pointers — again, there is much code that calls several of these in a row to test an object against various different expected types, and this would generate redundant tests. There are no variants with
NULL
checking.
The C function calling mechanism guarantees that the argument list passed to C functions (
args
in the examples) is never
NULL
— in fact it guarantees that it is always a tuple
[4]
.
It is a severe error to ever let a NULL pointer “escape” to the Python user.
It is possible to write extension modules in C++. Some restrictions apply. If the main program (the Python interpreter) is compiled and linked by the C compiler, global or static objects with constructors cannot be used. This is not a problem if the main program is linked by the C++ compiler. Functions that will be called by the Python interpreter (in particular, module initialization functions) have to be declared using extern "C". It is unnecessary to enclose the Python header files in extern "C" {...}; they use this form already if the symbol __cplusplus is defined (all recent C++ compilers define this symbol).
Many extension modules just provide new functions and types to be used from Python, but sometimes the code in an extension module can be useful for other extension modules. For example, an extension module could implement a type “collection” which works like lists without order. Just like the standard Python list type has a C API which permits extension modules to create and manipulate lists, this new collection type should have a set of C functions for direct manipulation from other extension modules.
At first sight this seems easy: just write the functions (without declaring them static, of course), provide an appropriate header file, and document the C API. And in fact this would work if all extension modules were always linked statically with the Python interpreter. When modules are used as shared libraries, however, the symbols defined in one module may not be visible to another module. The details of visibility depend on the operating system; some systems use one global namespace for the Python interpreter and all extension modules (Windows, for example), whereas others require an explicit list of imported symbols at module link time (AIX is one example), or offer a choice of different strategies (most Unices). And even if symbols are globally visible, the module whose functions one wishes to call might not have been loaded yet!
Portability therefore requires that no assumptions be made about symbol visibility. This means that all symbols in extension modules should be declared static, except for the module's initialization function, in order to avoid name clashes with other extension modules (as discussed in section The Module's Method Table and Initialization Function). And it means that symbols that should be accessible from other extension modules must be exported in a different way.
Python provides a special mechanism to pass C-level information (pointers) from one extension module to another one: Capsules. A Capsule is a Python data type which stores a pointer (void *). Capsules can only be created and accessed via their C API, but they can be passed around like any other Python object. In particular, they can be assigned to a name in an extension module's namespace. Other extension modules can then import this module, retrieve the value of this name, and then retrieve the pointer from the Capsule.
There are many ways in which Capsules can be used to export the C API of an extension module. Each function could get its own Capsule, or all C API pointers could be stored in an array whose address is published in a Capsule. And the various tasks of storing and retrieving the pointers can be distributed in different ways between the module providing the code and the client modules.
Whichever method you choose, it's important to name your Capsules properly. The function PyCapsule_New() takes a name parameter (const char *); you're permitted to pass in a NULL name, but we strongly encourage you to specify a name. Properly named Capsules provide a degree of runtime type-safety; there is no feasible way to tell one unnamed Capsule from another.
In particular, Capsules used to expose C APIs should be given a name following this convention:
modulename.attributename
The convenience function PyCapsule_Import() makes it easy to load a C API provided via a Capsule, but only if the Capsule's name matches this convention. This behavior gives C API users a high degree of certainty that the Capsule they load contains the correct C API.
The following example demonstrates an approach that puts most of the burden on the writer of the exporting module, which is appropriate for commonly used library modules. It stores all C API pointers (just one in the example!) in an array of void pointers which becomes the value of a Capsule. The header file corresponding to the module provides a macro that takes care of importing the module and retrieving its C API pointers; client modules only have to call this macro before accessing the C API.
The exporting module is a modification of the spam module from section A Simple Example. The function spam.system() does not call the C library function system() directly, but a function PySpam_System(), which would of course do something more complicated in reality (such as adding "spam" to every command). This function PySpam_System() is also exported to other extension modules.
The function PySpam_System() is a plain C function, declared static like everything else:
static int
PySpam_System(const char *command)
{
    return system(command);
}
The function spam_system() is modified in a trivial way:
static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL;
    sts = PySpam_System(command);
    return PyLong_FromLong(sts);
}
In the beginning of the module, right after the line
#include "Python.h"
two more lines must be added:
#define SPAM_MODULE
#include "spammodule.h"
The #define is used to tell the header file that it is being included in the exporting module, not a client module. Finally, the module's initialization function must take care of initializing the C API pointer array:
PyMODINIT_FUNC
PyInit_spam(void)
{
    PyObject *m;
    static void *PySpam_API[PySpam_API_pointers];
    PyObject *c_api_object;

    m = PyModule_Create(&spammodule);
    if (m == NULL)
        return NULL;

    /* Initialize the C API pointer array */
    PySpam_API[PySpam_System_NUM] = (void *)PySpam_System;

    /* Create a Capsule containing the API pointer array's address */
    c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL);

    /* PyModule_AddObject() steals the reference only on success, so
       release both objects if anything went wrong. */
    if (c_api_object == NULL ||
        PyModule_AddObject(m, "_C_API", c_api_object) < 0) {
        Py_XDECREF(c_api_object);
        Py_DECREF(m);
        return NULL;
    }

    return m;
}
Note that PySpam_API is declared static; otherwise the pointer array would disappear when PyInit_spam() terminates!
The bulk of the work is in the header file spammodule.h, which looks like this:
#ifndef Py_SPAMMODULE_H
#define Py_SPAMMODULE_H
#ifdef __cplusplus
extern "C" {
#endif

/* Header file for spammodule */

/* C API functions */
#define PySpam_System_NUM 0
#define PySpam_System_RETURN int
#define PySpam_System_PROTO (const char *command)

/* Total number of C API pointers */
#define PySpam_API_pointers 1


#ifdef SPAM_MODULE
/* This section is used when compiling spammodule.c */

static PySpam_System_RETURN PySpam_System PySpam_System_PROTO;

#else
/* This section is used in modules that use spammodule's API */

static void **PySpam_API;

#define PySpam_System \
 (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM])

/* Return -1 on error, 0 on success.
 * PyCapsule_Import will set an exception if there's an error.
 */
static int
import_spam(void)
{
    PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0);
    return (PySpam_API != NULL) ? 0 : -1;
}

#endif

#ifdef __cplusplus
}
#endif

#endif /* !defined(Py_SPAMMODULE_H) */
All that a client module must do in order to have access to the function PySpam_System() is to call the function (or rather macro) import_spam() in its initialization function:
PyMODINIT_FUNC
PyInit_client(void)
{
    PyObject *m;

    m = PyModule_Create(&clientmodule);
    if (m == NULL)
        return NULL;
    if (import_spam() < 0) {
        Py_DECREF(m);   /* do not leak the half-initialized module */
        return NULL;
    }
    /* additional initialization can happen here */
    return m;
}
The main disadvantage of this approach is that the file spammodule.h is rather complicated. However, the basic structure is the same for each function that is exported, so it has to be learned only once.
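The heart of the header is the macro that casts a table slot back to a function pointer. Stripped of the Python machinery, the pattern can be sketched in plain C as follows. All Demo_ names are hypothetical, and the table is handed over directly instead of travelling through a Capsule. Like the original header, the sketch assumes that a function pointer can be stored in a void *, which holds on common platforms but is not guaranteed by ISO C.

```c
#include <string.h>

/* Same three-macro pattern as in spammodule.h, for a hypothetical
   function Demo_Length. */
#define Demo_Length_NUM 0
#define Demo_Length_RETURN int
#define Demo_Length_PROTO (const char *text)
#define Demo_API_pointers 1

/* "Exporting" side: the real function and the table that holds it. */
static Demo_Length_RETURN
real_length Demo_Length_PROTO
{
    return (int)strlen(text);
}

static void *exported_table[Demo_API_pointers];

/* "Client" side: sees only a void ** and casts the slot back to the
   right function pointer type before calling through it. */
static void **Demo_API;

#define Demo_Length \
    (*(Demo_Length_RETURN (*)Demo_Length_PROTO) Demo_API[Demo_Length_NUM])

/* Stands in for import_spam(): fills the table and publishes it. */
static void
import_demo(void)
{
    exported_table[Demo_Length_NUM] = (void *)real_length;
    Demo_API = exported_table;
}

/* Client code calls Demo_Length as if it were an ordinary function. */
static int
length_via_table(const char *text)
{
    return Demo_Length(text);
}
```

Adding another exported function means adding one more _NUM, _RETURN and _PROTO triple, one more table slot, and one more cast macro, which is why the structure has to be learned only once.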
Finally it should be mentioned that Capsules offer additional functionality, which is especially useful for memory allocation and deallocation of the pointer stored in a Capsule. The details are described in the Python/C API Reference Manual in the section Capsules and in the implementation of Capsules (files Include/pycapsule.h and Objects/pycapsule.c in the Python source code distribution).
Footnotes
| [1] | An interface for this function already exists in the standard module os; it was chosen as a simple and straightforward example. |
| [2] | The metaphor of “borrowing” a reference is not completely correct: the owner still has a copy of the reference. |
| [3] | Checking that the reference count is at least 1 does not work — the reference count itself could be in freed memory and may thus be reused for another object! |
| [4] | These guarantees don’t hold when you use the “old” style calling convention — this is still found in much existing code. |