As usual, Python’s standard library received a number of enhancements and bug fixes. Here’s a partial list of the most notable changes, sorted alphabetically by module name. Consult the Misc/NEWS file in the source tree for a more complete list of changes, or look through the CVS logs for all the details.
-
The asyncore module’s loop() function now has a count parameter that lets you perform a limited number of passes through the polling loop. The default is still to loop forever.
-
The base64 module now has more complete RFC 3548 support for Base64, Base32, and Base16 encoding and decoding, including optional case folding and optional alternative alphabets. (Contributed by Barry Warsaw.)
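A minimal sketch of these helpers under current Python 3, where the functions take and return bytes rather than the str values used in 2.4. The sample data is arbitrary:

```python
import base64

data = b"hello"

# The three RFC 3548 alphabets.
b64 = base64.b64encode(data)   # Base64
b32 = base64.b32encode(data)   # Base32
b16 = base64.b16encode(data)   # Base16 (uppercase hexadecimal)

# Optional case folding: accept lowercase input when decoding.
decoded16 = base64.b16decode(b"68656c6c6f", casefold=True)
decoded32 = base64.b32decode(b"nbswy3dp", casefold=True)

# Optional alternative alphabet: substitute '-' and '_' for '+' and '/'.
standard = base64.b64encode(b"\xfb\xff")
alternative = base64.b64encode(b"\xfb\xff", altchars=b"-_")
```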
-
The bisect module now has an underlying C implementation for improved performance. (Contributed by Dmitry Vasiliev.)
-
The CJKCodecs collection of East Asian codecs, maintained by Hye-Shik Chang, was integrated into 2.4. The new encodings are:
-
Chinese (PRC): gb2312, gbk, gb18030, big5hkscs, hz
-
Chinese (ROC): big5, cp950
-
Japanese: cp932, euc-jis-2004, euc-jp, euc-jisx0213, iso-2022-jp, iso-2022-jp-1, iso-2022-jp-2, iso-2022-jp-3, iso-2022-jp-ext, iso-2022-jp-2004, shift-jis, shift-jisx0213, shift-jis-2004
-
Korean: cp949, euc-kr, johab, iso-2022-kr
-
Some other new encodings were added: HP Roman8, ISO_8859-11, ISO_8859-16, PTCP154, and TIS-620.
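A quick way to exercise some of these codecs is to round-trip short strings through them. This sketch assumes Python 3 (str.encode() and bytes.decode()); the sample words are arbitrary choices:

```python
# Short sample strings paired with codecs from the lists above.
samples = {
    "\u65e5\u672c\u8a9e": ["shift-jis", "euc-jp", "iso-2022-jp", "cp932"],  # Japanese
    "\u4e2d\u6587": ["gb2312", "gbk", "gb18030", "big5", "hz"],              # Chinese
    "\ud55c\uad6d\uc5b4": ["euc-kr", "cp949", "johab", "iso-2022-kr"],       # Korean
    "\u0e44\u0e17\u0e22": ["tis-620", "iso8859-11"],                         # Thai
}

roundtrips = {}
for text, codec_names in samples.items():
    for name in codec_names:
        encoded = text.encode(name)                      # str -> bytes in that encoding
        roundtrips[name] = encoded.decode(name) == text  # and back again
```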
-
The UTF-8 and UTF-16 codecs now cope better with receiving partial input. Previously the StreamReader class would try to read more data, making it impossible to resume decoding from the stream. The read() method will now return as much data as it can, and future calls will resume decoding where previous ones left off. (Implemented by Walter Dörwald.)
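The resumable behaviour can be seen with a stream that hands out data in pieces. In this Python 3 sketch, FeedableStream is a hypothetical stand-in for a pipe or socket, and the first piece deliberately ends in the middle of the two-byte UTF-8 encoding of "é":

```python
import codecs


class FeedableStream:
    """Hypothetical byte source where data arrives in pieces."""

    def __init__(self):
        self._pending = b""

    def feed(self, data):
        self._pending += data

    def read(self, size=-1):
        # Return whatever is currently available (possibly nothing).
        if size is None or size < 0:
            size = len(self._pending)
        data, self._pending = self._pending[:size], self._pending[size:]
        return data


stream = FeedableStream()
reader = codecs.getreader("utf-8")(stream)

stream.feed(b"caf\xc3")   # "é" is C3 A9; this piece stops between the two bytes
first = reader.read()     # the dangling C3 byte is buffered, not rejected
stream.feed(b"\xa9!")
second = reader.read()    # decoding resumes with the buffered byte
```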
-
There is a new collections module for various specialized collection datatypes. Currently it contains just one type, deque, a double-ended queue that supports efficiently adding and removing elements from either end:
>>> from collections import deque
>>> d = deque('ghi') # make a new deque with three items
>>> d.append('j') # add a new entry to the right side
>>> d.appendleft('f') # add a new entry to the left side
>>> d # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])
>>> d.pop() # return and remove the rightmost item
'j'
>>> d.popleft() # return and remove the leftmost item
'f'
>>> list(d) # list the contents of the deque
['g', 'h', 'i']
>>> 'h' in d # search the deque
True
Several modules, such as the Queue and threading modules, now take advantage of collections.deque for improved performance. (Contributed by Raymond Hettinger.)
-
The ConfigParser classes have been enhanced slightly. The read() method now returns a list of the files that were successfully parsed, and the set() method raises TypeError if passed a value argument that isn’t a string. (Contributed by John Belmonte and David Goodger.)
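A sketch of both changes in Python 3, where the module is spelled configparser. The file names are throwaway temporary paths:

```python
import configparser
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "app.ini")
missing_path = os.path.join(tmp_dir, "missing.ini")   # never created

with open(good_path, "w") as f:
    f.write("[server]\nport = 8080\n")

parser = configparser.ConfigParser()
# read() skips files it cannot open and returns the ones it parsed.
parsed = parser.read([good_path, missing_path])

# set() insists on string values.
try:
    parser.set("server", "port", 9090)
    rejected = False
except TypeError:
    rejected = True

parser.set("server", "port", "9090")   # a string value is accepted
```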
-
The curses module now supports the ncurses extension use_default_colors(). On platforms where the terminal supports transparency, this makes it possible to use a transparent background. (Contributed by Jörg Lehmann.)
-
The difflib module now includes an HtmlDiff class that creates an HTML table showing a side-by-side comparison of two versions of a text. (Contributed by Dan Gass.)
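A minimal Python 3 sketch of HtmlDiff. The input lines are made up; make_table() returns just the table markup, while make_file() wraps it in a complete HTML page:

```python
import difflib

old = ["one", "two", "three"]
new = ["one", "2", "three", "four"]

# Side-by-side comparison as an HTML <table>, with column captions.
table = difflib.HtmlDiff().make_table(old, new, fromdesc="before", todesc="after")

# The same comparison as a standalone HTML document, wrapping long lines at 40 columns.
page = difflib.HtmlDiff(wrapcolumn=40).make_file(old, new)
```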
-
The email package was updated to version 3.0, which dropped various deprecated APIs and removed support for Python versions earlier than 2.3. The 3.0 version of the package uses a new incremental parser for MIME messages, available in the email.FeedParser module. The new parser doesn’t require reading the entire message into memory, and doesn’t raise exceptions if a message is malformed; instead it records any problems in the defects attribute of the message. (Developed by Anthony Baxter, Barry Warsaw, Thomas Wouters, and others.)
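A Python 3 sketch of incremental parsing with the feed parser (the module is spelled email.feedparser there). The sample message is deliberately malformed: it declares a multipart body but no boundary, and the parser records that in msg.defects instead of raising:

```python
from email import errors
from email.feedparser import FeedParser

pieces = [
    "Subject: status report\r\n",
    "Content-Type: multipart/mixed\r\n",   # claims multipart, but no boundary parameter
    "\r\n",
    "This body has no MIME boundary.\r\n",
]

parser = FeedParser()
for piece in pieces:      # feed the message incrementally, e.g. as it arrives
    parser.feed(piece)
msg = parser.close()      # no exception: problems are recorded instead

found_no_boundary = any(
    isinstance(d, errors.NoBoundaryInMultipartDefect) for d in msg.defects
)
```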
-
The heapq module has been converted to C. The resulting tenfold improvement in speed makes the module suitable for handling high volumes of data. In addition, the module has two new functions, nlargest() and nsmallest(), that use heaps to find the N largest or smallest values in a dataset without the expense of a full sort. (Contributed by Raymond Hettinger.)
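For example, in Python 3 (the data is arbitrary):

```python
import heapq

readings = [42, 7, 19, 88, 3, 61, 25]

top_three = heapq.nlargest(3, readings)     # largest first
bottom_two = heapq.nsmallest(2, readings)   # smallest first
```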
-
The httplib module now contains constants for HTTP status codes defined in various HTTP-related RFC documents. Constants have names such as OK, CREATED, CONTINUE, and MOVED_PERMANENTLY; use pydoc to get a full list. (Contributed by Andrew Eland.)
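In Python 3, httplib became http.client, and the same constants are still available there. A short sketch:

```python
import http.client

codes = {
    "OK": http.client.OK,
    "CREATED": http.client.CREATED,
    "CONTINUE": http.client.CONTINUE,
    "MOVED_PERMANENTLY": http.client.MOVED_PERMANENTLY,
}

# The module also maps numeric codes to their standard reason phrases.
reason = http.client.responses[http.client.MOVED_PERMANENTLY]
```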
-
The imaplib module now supports IMAP’s THREAD command (contributed by Yves Dionne) and new deleteacl() and myrights() methods (contributed by Arnaud Mazin).
-
The itertools module gained a groupby(iterable[, func]) function. The iterable argument is anything that can be iterated over to return a stream of elements, and the optional func parameter is a function that takes an element and returns a key value; if omitted, the key is simply the element itself. groupby() then groups the elements into subsequences which have matching values of the key, and returns a series of 2-tuples containing the key value and an iterator over the subsequence.
Here’s an example to make this clearer. The key function simply returns whether a number is even or odd, so the result of groupby() is to return consecutive runs of odd or even numbers.
>>> import itertools
>>> L = [2, 4, 6, 7, 8, 9, 11, 12, 14]
>>> for key_val, it in itertools.groupby(L, lambda x: x % 2):
...     print key_val, list(it)
...
0 [2, 4, 6]
1 [7]
0 [8]
1 [9, 11]
0 [12, 14]
>>>
groupby() is typically used with sorted input. The logic for groupby() is similar to the Unix uniq filter, which makes it handy for eliminating, counting, or identifying duplicate elements:
>>> word = 'abracadabra'
>>> letters = sorted(word) # Turn string into a sorted list of letters
>>> letters
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'c', 'd', 'r', 'r']
>>> for k, g in itertools.groupby(letters):
...     print k, list(g)
...
a ['a', 'a', 'a', 'a', 'a']
b ['b', 'b']
c ['c']
d ['d']
r ['r', 'r']
>>> # List unique letters
>>> [k for k, g in itertools.groupby(letters)]
['a', 'b', 'c', 'd', 'r']
>>> # Count letter occurrences
>>> [(k, len(list(g))) for k, g in itertools.groupby(letters)]
[('a', 5), ('b', 2), ('c', 1), ('d', 1), ('r', 2)]
(Contributed by Hye-Shik Chang.)
-
itertools also gained a function named tee(iterator, N) that returns N independent iterators that replicate iterator. If N is omitted, the default is 2.
>>> L = [1,2,3]
>>> i1, i2 = itertools.tee(L)
>>> i1,i2
(<itertools.tee object at 0x402c2080>, <itertools.tee object at 0x402c2090>)
>>> list(i1) # Run the first iterator to exhaustion
[1, 2, 3]
>>> list(i2) # Run the second iterator to exhaustion
[1, 2, 3]
Note that tee() has to keep copies of the values returned by the iterator; in the worst case, it may need to keep all of them. This should therefore be used carefully if the leading iterator can run far ahead of the trailing iterator in a long stream of inputs. If the separation is large, then you might as well use list() instead. When the iterators track closely with one another, tee() is ideal. Possible applications include bookmarking, windowing, or lookahead iterators. (Contributed by Raymond Hettinger.)
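The windowing case can be sketched as follows in Python 3; pairs() is a hypothetical helper (Python 3.10 later added an equivalent itertools.pairwise()). Because the two copies are never more than one element apart, tee() only has to buffer a single value:

```python
import itertools


def pairs(iterable):
    """Hypothetical helper: yield overlapping (previous, current) pairs."""
    previous, current = itertools.tee(iterable)
    next(current, None)            # advance one copy by a single step
    return zip(previous, current)


temperatures = iter([12, 15, 14, 18])   # a one-shot iterator, not a list
deltas = [b - a for a, b in pairs(temperatures)]
```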
-
A number of functions were added to the locale module, such as bind_textdomain_codeset() to specify a particular encoding and a family of l*gettext() functions that return messages in the chosen encoding. (Contributed by Gustavo Niemeyer.)
-
Some keyword arguments were added to the logging package’s basicConfig() function to simplify log configuration. The default behavior is to log messages to standard error, but various keyword arguments can be specified to log to a particular file, change the logging format, or set the logging level. For example:
import logging
logging.basicConfig(filename='/var/log/application.log',
                    level=0,  # Log all messages
                    format='%(levelname)s:%(process)s:%(thread)s:%(message)s')
Other additions to the logging package include a log(level, msg) convenience method, as well as a TimedRotatingFileHandler class that rotates its log files at a timed interval. The module already had RotatingFileHandler, which rotated logs once the file exceeded a certain size. Both classes derive from a new BaseRotatingHandler class that can be used to implement other rotating handlers.
(Changes implemented by Vinay Sajip.)
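A sketch combining log(level, msg) with TimedRotatingFileHandler under Python 3. The logger name, file location, rotation schedule, and retention count are illustrative assumptions:

```python
import logging
import logging.handlers
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "app.log")

# Rotate at midnight and keep the seven most recent old files.
handler = logging.handlers.TimedRotatingFileHandler(log_path, when="midnight", backupCount=7)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("whatsnew.demo")
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
logger.propagate = False   # keep the demo records away from the root logger

# log() takes the level as an argument instead of encoding it in the method name.
logger.log(logging.WARNING, "disk usage at %d%%", 91)
handler.close()            # flush and close the log file

with open(log_path) as f:
    contents = f.read()
```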
-
The marshal module now shares interned strings on unpacking a data structure. This may shrink the size of certain pickle strings, but the primary effect is to make .pyc files significantly smaller. (Contributed by Martin von Löwis.)
-
The nntplib module’s NNTP class gained description() and descriptions() methods to retrieve newsgroup descriptions for a single group or for a range of groups. (Contributed by Jürgen A. Erhard.)
-
Two new functions were added to the operator module, attrgetter(attr) and itemgetter(index). Both functions return callables that take a single argument and return the corresponding attribute or item; these callables make excellent data extractors when used with map() or sorted(). For example:
>>> import operator
>>> L = [('c', 2), ('d', 1), ('a', 4), ('b', 3)]
>>> map(operator.itemgetter(0), L)
['c', 'd', 'a', 'b']
>>> map(operator.itemgetter(1), L)
[2, 1, 4, 3]
>>> sorted(L, key=operator.itemgetter(1)) # Sort list by second tuple item
[('d', 1), ('c', 2), ('b', 3), ('a', 4)]
(Contributed by Raymond Hettinger.)
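attrgetter() works the same way for attributes. In this Python 3 sketch, Employee is a hypothetical record class:

```python
import operator


class Employee:
    """Hypothetical record type used only for this illustration."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary


staff = [Employee("Ana", 52000), Employee("Bo", 48000), Employee("Cy", 61000)]

# Sort by one attribute, then extract another.
by_salary = sorted(staff, key=operator.attrgetter("salary"))
names = list(map(operator.attrgetter("name"), by_salary))
```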
-
The optparse module was updated in various ways. The module now passes its messages through gettext.gettext(), making it possible to internationalize Optik’s help and error messages. Help messages for options can now include the string '%default', which will be replaced by the option’s default value. (Contributed by Greg Ward.)
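A short Python 3 sketch of '%default' expansion. The --retries option is hypothetical, and the explicit formatter width just keeps the wrapped help text predictable:

```python
from optparse import IndentedHelpFormatter, OptionParser

parser = OptionParser(formatter=IndentedHelpFormatter(width=80))
parser.add_option("-n", "--retries", type="int", default=3,
                  help="number of retries [default: %default]")

help_text = parser.format_help()                      # '%default' becomes '3'
options, args = parser.parse_args(["--retries", "5"])
```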
-
The long-term plan is to deprecate the rfc822 module in some future Python release in favor of the email package. To this end, the email.Utils.formatdate() function has been changed to make it usable as a replacement for rfc822.formatdate(). You may want to write new e-mail processing code with this in mind. (Change implemented by Anthony Baxter.)
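For example, in Python 3 the function lives in email.utils:

```python
from email.utils import formatdate

epoch_numeric = formatdate(0)            # RFC 2822 date with a numeric zone
epoch_gmt = formatdate(0, usegmt=True)   # same instant, "GMT" zone name
now_local = formatdate(localtime=True)   # current time in the local zone
```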
-
A new urandom(n) function was added to the os module, returning a string containing n bytes of random data. This function provides access to platform-specific sources of randomness such as /dev/urandom on Linux or the Windows CryptoAPI. (Contributed by Trevor Perrin.)
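A minimal Python 3 sketch; note that os.urandom() returns bytes in Python 3, and the key length is arbitrary:

```python
import os

key = os.urandom(16)   # 16 bytes from the platform's randomness source
token = key.hex()      # printable form: 32 hexadecimal digits
```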
-
Another new function: os.path.lexists(path) returns true if the file specified by path exists, whether or not it’s a symbolic link. This differs from the existing os.path.exists(path) function, which returns false if path is a symlink that points to a destination that doesn’t exist. (Contributed by Beni Cherniavsky.)
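The difference shows up with a dangling symlink. This Python 3 sketch assumes a POSIX system where os.symlink() is permitted; the file names are arbitrary:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
target = os.path.join(tmp_dir, "target.txt")   # deliberately never created
link = os.path.join(tmp_dir, "dangling-link")
os.symlink(target, link)                       # a symlink to a missing file

follows_link = os.path.exists(link)   # False: the destination doesn't exist
sees_link = os.path.lexists(link)     # True: the link itself exists
```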
-
A new getsid() function was added to the posix module that underlies the os module. (Contributed by J. Raynor.)
-
The poplib module now supports POP over SSL. (Contributed by Hector Urtubia.)
-
The profile module can now profile C extension functions. (Contributed by Nick Bastin.)
-
The random module has a new method called getrandbits(N) that returns a long integer made of N random bits. The existing randrange() method now uses getrandbits() where appropriate, making generation of arbitrarily large random numbers more efficient. (Contributed by Raymond Hettinger.)
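A short Python 3 sketch; the seed is arbitrary and only makes the run repeatable:

```python
import random

rng = random.Random(2004)

bits = rng.getrandbits(128)   # an integer in the range [0, 2**128)
big = rng.randrange(10**40)   # arbitrarily large ranges also work
```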
-
The regular expression language accepted by the re module was extended with simple conditional expressions, written as (?(group)A|B). Here, group is either a numeric group ID or a group name defined with (?P<group>...) earlier in the expression. If the specified group matched, the regular expression pattern A will be tested against the string; if the group didn’t match, the pattern B will be used instead. (Contributed by Gustavo Niemeyer.)
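A sketch of a conditional pattern in Python 3. The address pattern is a hypothetical example; the numeric form (?(1)>|$) would behave the same way:

```python
import re

# An address may be wrapped in <...>, but only with balanced brackets:
# (?(open)>|$) requires ">" if the optional "open" group matched,
# and end-of-string otherwise.
address = re.compile(r"(?P<open><)?(\w+@\w+(?:\.\w+)+)(?(open)>|$)")

results = {
    text: bool(address.match(text))
    for text in ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
}
```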
-
The re module is also no longer recursive, thanks to a massive amount of work by Gustavo Niemeyer. In a recursive regular expression engine, certain patterns result in a large amount of C stack space being consumed, and it was possible to overflow the stack. For example, if you matched a 30000-byte string of 'a' characters against the expression (a|b)+, one stack frame was consumed per character. Python 2.3 tried to check for stack overflow and raise a RuntimeError exception, but certain patterns could sidestep the checking, and if you were unlucky, Python could segfault. Python 2.4’s regular expression engine can match this pattern without problems.
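The 30000-character case can be run directly; in a current Python 3 interpreter the match completes normally:

```python
import re

text = "a" * 30000
m = re.match(r"(a|b)+", text)   # one repetition of the group per character
```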
-
The signal module now performs tighter error-checking on the parameters to the signal.signal() function. For example, you can’t set a handler on the SIGKILL signal; previous versions of Python would quietly accept this, but 2.4 will raise a RuntimeError exception.
-
Two new functions were added to the socket module: socketpair() returns a pair of connected sockets, and getservbyport(port) looks up the service name for a given port number. (Contributed by Dave Cole and Barry Warsaw.)
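A minimal Python 3 sketch of socketpair(). getservbyport() is left out because its answer depends on the local services database:

```python
import socket

left, right = socket.socketpair()   # two sockets already connected to each other
try:
    left.sendall(b"ping")
    received = right.recv(4)
    right.sendall(b"pong")
    reply = left.recv(4)
finally:
    left.close()
    right.close()
```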
-
The sys.exitfunc() function has been deprecated. Code should be using the existing atexit module, which correctly handles calling multiple exit functions. Eventually sys.exitfunc() will become a purely internal interface, accessed only by atexit.
-
The tarfile module now generates GNU-format tar files by default. (Contributed by Lars Gustäbel.)
-
The threading module now has an elegantly simple way to support thread-local data. The module contains a local class whose attribute values are local to different threads:
import threading
data = threading.local()
data.number = 42
data.url = ('www.python.org', 80)
Other threads can assign and retrieve their own values for the number and url attributes. You can subclass local to initialize attributes or to add methods. (Contributed by Jim Fulton.)
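A sketch of per-thread values and a local subclass in Python 3; ThreadState is a hypothetical name. The subclass’s __init__() runs again in each thread the first time that thread touches the object:

```python
import threading


class ThreadState(threading.local):
    """Hypothetical subclass that gives every thread a default value."""

    def __init__(self):
        self.number = 0


data = ThreadState()
data.number = 42   # visible only in the main thread
results = {}


def worker(n):
    before = data.number   # a new thread starts from the __init__ default
    data.number = n        # changes this thread's copy only
    results[n] = (before, data.number)


threads = [threading.Thread(target=worker, args=(n,)) for n in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```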
-
The timeit module now automatically disables periodic garbage collection during the timing loop. This change makes consecutive timings more comparable. (Contributed by Raymond Hettinger.)
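A Python 3 sketch; the statement being timed is arbitrary. Putting gc.enable() in the setup string turns collection back on when you do want it included in the measurement:

```python
import timeit

setup = "data = list(range(1000, 0, -1))"

# Garbage collection is switched off while the statement is being timed...
default_time = timeit.timeit("sorted(data)", setup=setup, number=200)

# ...unless the setup code re-enables it explicitly.
with_gc_time = timeit.timeit("sorted(data)", setup="gc.enable(); " + setup, number=200)
```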
-
The weakref module now supports a wider variety of objects including Python functions, class instances, sets, frozensets, deques, arrays, files, sockets, and regular expression pattern objects. (Contributed by Raymond Hettinger.)
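A sketch using three of these object types under CPython 3, where an object is freed as soon as its last strong reference disappears (the explicit gc.collect() covers other implementations):

```python
import collections
import gc
import weakref


def handler():
    return "called"


tags = {"red", "green"}
queue = collections.deque([1, 2, 3])

refs = [weakref.ref(handler), weakref.ref(tags), weakref.ref(queue)]
alive_before = [r() is not None for r in refs]

del handler, tags, queue   # drop the only strong references
gc.collect()
alive_after = [r() is not None for r in refs]
```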
-
The xmlrpclib module now supports a multi-call extension for transmitting multiple XML-RPC calls in a single HTTP operation. (Contributed by Brian Quinlan.)
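A self-contained Python 3 sketch using xmlrpc.client.MultiCall against a throwaway local server. The add() and multiply() functions are hypothetical, and the server has to opt in with register_multicall_functions():

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    return a + b


def multiply(a, b):
    return a * b


# Port 0 asks the OS for any free port on the loopback interface.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(add)
server.register_function(multiply)
server.register_multicall_functions()   # exposes system.multicall
threading.Thread(target=server.serve_forever, daemon=True).start()

host, port = server.server_address
proxy = xmlrpc.client.ServerProxy("http://%s:%d/" % (host, port))

multicall = xmlrpc.client.MultiCall(proxy)
multicall.add(2, 3)           # queued, not sent yet
multicall.multiply(4, 5)      # queued, not sent yet
results = list(multicall())   # one HTTP request carries both calls

server.shutdown()
server.server_close()
```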
-
The mpz, rotor, and xreadlines modules have been removed.
doctest
The doctest module underwent considerable refactoring thanks to Edward Loper and Tim Peters. Testing can still be as simple as running doctest.testmod(), but the refactorings allow customizing the module’s operation in various ways.
The new DocTestFinder class extracts the tests from a given object’s docstrings:
import doctest

def f(x, y):
    """>>> f(2,2)
    4
    >>> f(3,2)
    6
    """
    return x*y

finder = doctest.DocTestFinder()
# Get list of DocTest instances
tests = finder.find(f)
The new DocTestRunner class then runs individual tests and can produce a summary of the results:
runner = doctest.DocTestRunner()
for t in tests:
    tried, failed = runner.run(t)

runner.summarize(verbose=1)
The above example produces the following output:
1 items passed all tests:
   2 tests in f
2 tests in 1 items.
2 passed and 0 failed.
Test passed.
DocTestRunner uses an instance of the OutputChecker class to compare the expected output with the actual output. This class takes a number of different flags that customize its behaviour; ambitious users can also write a completely new subclass of OutputChecker.
The default output checker provides a number of handy features. For example, with the doctest.ELLIPSIS option flag, an ellipsis (...) in the expected output matches any substring, making it easier to accommodate outputs that vary in minor ways:
def o(n):
    """>>> o(1)
    <__main__.C instance at 0x...>
    >>>
    """
Another special string, <BLANKLINE>, matches a blank line:
def p(n):
    """>>> p(1)
    <BLANKLINE>
    >>>
    """
Another new capability is producing a diff-style display of the output by specifying the doctest.REPORT_UDIFF (unified diffs), doctest.REPORT_CDIFF (context diffs), or doctest.REPORT_NDIFF (delta-style) option flags. For example:
def g(n):
    """>>> g(4)
    here
    is
    a
    lengthy
    >>>"""
    L = 'here is a rather lengthy list of words'.split()
    for word in L[:n]:
        print word
Running the above function’s tests with doctest.REPORT_UDIFF specified, you get the following output:
**********************************************************************
File "t.py", line 15, in g
Failed example:
    g(4)
Differences (unified diff with -expected +actual):
    @@ -2,3 +2,3 @@
     is
     a
    -lengthy
    +rather
**********************************************************************
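The ELLIPSIS, <BLANKLINE>, and REPORT_UDIFF features can be tried together in one Python 3 sketch. describe(), first_words(), and run_doctests() are hypothetical helpers, and the failure report is captured in a string instead of being printed:

```python
import doctest
import io


def describe(obj):
    """Hypothetical function whose output varies from run to run.

    >>> describe(object())
    <object object at 0x...>
    <BLANKLINE>
    done
    """
    print(repr(obj))
    print()
    print("done")


def first_words(n):
    """Hypothetical function whose docstring expects the wrong fourth word.

    >>> first_words(4)
    here
    is
    a
    lengthy
    """
    for word in "here is a rather lengthy list of words".split()[:n]:
        print(word)


def run_doctests(func, flags):
    """Run func's docstring examples with the given option flags.

    Returns (number_of_failures, report_text).
    """
    report = io.StringIO()
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    failed = 0
    for test in doctest.DocTestFinder().find(func, globs={func.__name__: func}):
        failed += runner.run(test, out=report.write).failed
    return failed, report.getvalue()


# ELLIPSIS lets "0x..." absorb the varying address; <BLANKLINE> stands for print().
ellipsis_failures, _ = run_doctests(describe, doctest.ELLIPSIS)

# REPORT_UDIFF formats the mismatch as a unified diff.
udiff_failures, udiff_report = run_doctests(first_words, doctest.REPORT_UDIFF)
```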