`re` — Regular expression operations

Source code: Lib/re/

This module provides regular expression matching operations similar to those found in Perl.

Both patterns and strings to be searched can be Unicode strings (str) as well as 8-bit strings (bytes). However, Unicode strings and 8-bit strings cannot be mixed: that is, you cannot match a Unicode string with a bytes pattern or vice-versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string.
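The type rule can be illustrated with a minimal sketch. The sample strings and the variable names (`text`, `data`, `mixed_ok`) are made up for illustration:

```python
import re

text = "spam and eggs"     # a Unicode string (str)
data = b"spam and eggs"    # an 8-bit string (bytes)

# Matching str against str and bytes against bytes both work.
assert re.search(r"eggs", text) is not None
assert re.search(rb"eggs", data) is not None

# Mixing the two types is rejected with a TypeError.
try:
    re.search(rb"eggs", text)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)  # False
```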

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
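As a small sketch of the difference, the following compares the raw and regular spellings of the same pattern. The `path` string is an invented example value:

```python
import re

# A raw string keeps the backslash; a regular literal turns "\n" into a newline.
assert len(r"\n") == 2
assert len("\n") == 1

# Both spellings below build the same two-character pattern, \\ ,
# which matches a single literal backslash.
raw_pattern = r"\\"
plain_pattern = "\\\\"
assert raw_pattern == plain_pattern

path = "C:\\temp"                      # the text C:\temp (one backslash)
found = re.findall(raw_pattern, path)
print(found)                           # ['\\'], a list holding one backslash
```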

It is important to note that most regular expression operations are available as module-level functions and methods on compiled regular expressions. The functions are shortcuts that don’t require you to compile a regex object first, but miss some fine-tuning parameters.
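A brief sketch of the two styles follows. The sample text is invented, and the starting index passed to the compiled pattern's search() is just one example of a fine-tuning parameter that the module-level shortcut lacks:

```python
import re

text = "one 1, two 22, three 333"

# Module-level shortcut: no explicit compile step is needed.
first = re.search(r"\d+", text)

# Compiled pattern: the same search, plus an optional starting position
# that the module-level function does not accept.
number = re.compile(r"\d+")
same = number.search(text)
later = number.search(text, 6)   # start looking at index 6

print(first.group(), same.group(), later.group())  # 1 1 22
```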

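See also

The third-party regex module, which has an API compatible with the standard library re module, but offers additional functionality and more thorough Unicode support.

Regular Expression Syntax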

A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing).

Regular expressions can be concatenated to form new regular expressions; if A and B are both regular expressions, then AB is also a regular expression. In general, if a string p matches A and another string q matches B, the string pq will match AB. This holds unless A or B contain low precedence operations, boundary conditions between A and B, or numbered group references. Thus, complex expressions can easily be constructed from simpler primitive expressions like the ones described here. For details of the theory and implementation of regular expressions, consult the Friedl book [Frie09], or almost any textbook about compiler construction.

A brief explanation of the format of regular expressions follows. For further information and a gentler presentation, consult the Regular Expression HOWTO.

Regular expressions can contain both special and ordinary characters. Most ordinary characters, like 'A', 'a', or '0', are the simplest regular expressions; they simply match themselves. You can concatenate ordinary characters, so last matches the string 'last'. (In the rest of this section, we’ll write RE’s in this special style, usually without quotes, and strings to be matched 'in single quotes'.)

Some characters, like '|' or '(', are special. Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted.

Repetition operators or quantifiers (*, +, ?, {m,n}, etc.) cannot be directly nested. This avoids ambiguity with the non-greedy modifier suffix ?, and with other modifiers in other implementations. To apply a second repetition to an inner repetition, parentheses may be used. For example, the expression (?:a{6})* matches any multiple of six 'a' characters.
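The nesting rule can be checked with a short sketch. The names `six_as` and `nested_ok` are illustrative only:

```python
import re

# Grouping the inner repetition lets a second quantifier apply to it.
six_as = re.compile(r"(?:a{6})*")
assert six_as.fullmatch("a" * 12) is not None   # two groups of six
assert six_as.fullmatch("a" * 7) is None        # not a multiple of six

# Stacking the quantifiers directly is rejected when the pattern is compiled.
try:
    re.compile(r"a{6}*")
    nested_ok = True
except re.error:
    nested_ok = False

print(nested_ok)  # False
```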

The special characters are:

.

^

$

*

+

?

*? , +? , ??

*+ , ++ , ?+

{m}
{m,n}
{m,n}?
{m,n}+

\

[]

|

(...)

(?...)
(?aiLmsux)

(?:...)
(?aiLmsux-imsx:...)
(?>...)

(?P<name>...)

Context of reference to group “quote”	Ways to reference it
in the same pattern itself	`(?P=quote)` (as shown) `\1`
when processing match object m	`m.group('quote')` `m.end('quote')` (etc.)
in a string passed to the repl argument of `re.sub()`	`\g<quote>` `\g<1>` `\1`
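The three contexts in the table can be sketched as follows. The pattern is an assumed example of a group named quote, since the pattern the table refers to is not reproduced here, and the sample strings are invented:

```python
import re

# (?P<quote>...) names the group; (?P=quote) must match the same text again.
quoted = re.compile(r"(?P<quote>['\"]).*?(?P=quote)")

# Context: processing a match object m.
m = quoted.search("say 'hello' now")
print(m.group())          # 'hello'  (quotes included)
print(m.group("quote"))   # '
print(m.end("quote"))     # 5

# Context: a replacement string passed to sub(); name and number agree.
sample = 'a "b" c'
by_name = quoted.sub(r"[\g<quote>]", sample)
by_number = quoted.sub(r"[\1]", sample)
print(by_name, by_name == by_number)   # a ["] c True
```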

(?P=name)

(?#...)

(?=...)

(?!...)

(?<=...)

(?<!...)
(?(id/name)yes-pattern|no-pattern)

\number

\A

\b

\B

\d
re.findall(pattern, string, flags=0)

re.finditer(pattern, string, flags=0)

re.sub(pattern, repl, string, count=0, flags=0)

The pattern may be a string or a pattern object.

The optional argument count is the maximum number of pattern occurrences to be replaced; count must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous empty match, so sub('x*', '-', 'abxd') returns '-a-b--d-'.
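A minimal sketch of both points, using invented sample strings; the empty-match result assumes Python 3.7 or later:

```python
import re

# count limits how many matches are replaced; 0 (the default) means all.
all_replaced = re.sub(r"\d", "#", "a1b2c3")
two_replaced = re.sub(r"\d", "#", "a1b2c3", count=2)
print(all_replaced)   # a#b#c#
print(two_replaced)   # a#b#c3

# x* can match the empty string, so a replacement also appears at each
# position where nothing was consumed.
with_empty = re.sub("x*", "-", "abxd")
print(with_empty)     # -a-b--d-
```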

In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.
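The replacement forms can be compared side by side in a small sketch. The date-like pattern and its group names (`year`, `month`) are assumptions chosen for illustration:

```python
import re

date_pattern = r"(?P<year>\d{4})-(?P<month>\d{2})"

# \g<name> and \g<number> both refer back to captured groups.
by_name = re.sub(date_pattern, r"\g<month>/\g<year>", "2024-05")
by_number = re.sub(date_pattern, r"\g<2>/\g<1>", "2024-05")
print(by_name, by_number)      # 05/2024 05/2024

# \g<2>0 is group 2 followed by a literal '0'; \20 would mean group 20.
group_then_zero = re.sub(date_pattern, r"\g<2>0", "2024-05")
print(group_then_zero)         # 050

# \g<0> stands for the whole match.
whole = re.sub(date_pattern, r"[\g<0>]", "2024-05")
print(whole)                   # [2024-05]
```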

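Changed in version 3.1: Added the optional flags argument.

Changed in version 3.5: Unmatched groups are replaced with an empty string.

Changed in version 3.6: Unknown escapes in pattern consisting of '\' and an ASCII letter now are errors.

Changed in version 3.7: Unknown escapes in repl consisting of '\' and an ASCII letter now are errors.

Changed in version 3.7: Empty matches for the pattern are replaced when adjacent to a previous non-empty match.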

Deprecated since version 3.11: Group id containing anything except ASCII digits. Group names containing non-ASCII characters in bytes replacement strings.

re.subn(pattern, repl, string, count=0, flags=0)

re.escape(pattern)

This function must not be used for the replacement string in sub() and subn(); only backslashes should be escaped. For example:

>>> digits_re = r'\d+'
>>> sample = '/usr/sbin/sendmail - 0 errors, 12 warnings'
>>> print(re.sub(digits_re, digits_re.replace('\\', r'\\'), sample))
/usr/sbin/sendmail - \d+ errors, \d+ warnings

Changed in version 3.3: The '_' character is no longer escaped.

Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped. As a result, '!', '"', '%', "'", ',', '/', ':', ';', '<', '=', '>', '@', and "`" are no longer escaped.
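For the pattern side, a short sketch of escape() applied to literal text that happens to contain regex metacharacters. The sample text is invented, and the note about '=' assumes Python 3.7 or later:

```python
import re

literal = "1+1=2 (approx.)"

# Without escaping, '+', '(', ')' and '.' would be read as regex syntax.
pattern = re.escape(literal)
assert re.fullmatch(pattern, literal) is not None
assert re.search(pattern, "so 1+1=2 (approx.) holds") is not None
assert re.search(pattern, "1111=2 (approxx)") is None

# A metacharacter gains a backslash; on Python 3.7 and later a
# character such as '=' is left alone.
dotted = re.escape("a.b")
print(dotted)   # a\.b
```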

re.purge()

Exceptions

exception re.error(msg, pattern=None, pos=None)

msg

pattern

pos

lineno

colno
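The attributes listed above can be inspected on a caught exception, as in this sketch. The invalid pattern is a hypothetical input, and the exact wording of msg may differ between Python versions:

```python
import re

bad_pattern = "(abc"   # hypothetical input with an unclosed group
caught = None

try:
    re.compile(bad_pattern)
except re.error as err:
    caught = err
    print(err.msg)                          # description of the problem
    print(err.pattern)                      # (abc
    print(err.pos, err.lineno, err.colno)   # index, line and column of the error
```

Because the pattern is a single line, lineno is 1 and colno is the one-based counterpart of pos.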

Regular Expression Objects

Compiled regular expression objects support the following methods and attributes:

Pattern.search(string[, pos[, endpos]])

Pattern.match(string[, pos[, endpos]])

Pattern.fullmatch(string[, pos[, endpos]])

New in version 3.4.

Pattern.split(string, maxsplit=0)

Pattern.findall(string[, pos[, endpos]])

Pattern.finditer(string[, pos[, endpos]])

Pattern.sub(repl, string, count=0)

Pattern.subn(repl, string, count=0)

Pattern.flags

Pattern.groups

Pattern.groupindex

Pattern.pattern
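A small sketch of these attributes on a compiled pattern, together with the optional pos argument of search(). The pattern, its group names and the sample text are invented for illustration:

```python
import re

name = re.compile(r"(?P<first>\w+) (?P<last>\w+)", re.IGNORECASE)

print(name.pattern)                        # (?P<first>\w+) (?P<last>\w+)
print(name.groups)                         # 2
print(dict(name.groupindex))               # {'first': 1, 'last': 2}
print(bool(name.flags & re.IGNORECASE))    # True

# pos restricts the search to the part of the string from that index on.
people = "Ada Lovelace, Alan Turing"
from_start = name.search(people)
from_13 = name.search(people, 13)
print(from_start.group())                  # Ada Lovelace
print(from_13.group())                     # Alan Turing
```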

Match.expand(template)

Match.group([group1, ...])

If the regular expression uses the (?P<name>...) syntax, the groupN arguments may also be strings identifying groups by their group name. If a string argument is not used as a group name in the pattern, an IndexError exception is raised.

A moderately complex example:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
>>> m.group('first_name')
'Malcolm'
>>> m.group('last_name')
'Reynolds'

Named groups can also be referred to by their index:

>>> m.group(1)
'Malcolm'
>>> m.group(2)
'Reynolds'

If a group matches multiple times, only the last match is accessible:

>>> m = re.match(r"(..)+", "a1b2c3")  # Matches 3 times.
>>> m.group(1)                        # Returns only the last match.
'c3'

Match.__getitem__(g)

Named groups are supported as well:

>>> m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Isaac Newton")
>>> m['first_name']
'Isaac'
>>> m['last_name']
'Newton'

New in version 3.6.

Match.groups(default=None)

If we make the decimal place and everything after it optional, not all groups might participate in the match. These groups will default to None unless the default argument is given:

>>> m = re.match(r"(\d+)\.?(\d+)?", "24")
>>> m.groups()      # Second group defaults to None.
('24', None)
>>> m.groups('0')   # Now, the second group defaults to '0'.
('24', '0')

Match.groupdict(default=None)

Match.start([group])
Match.end([group])

Note that m.start(group) will equal m.end(group) if group matched a null string. For example, after m = re.search('b(c?)', 'cba'), m.start(0) is 1, m.end(0) is 2, m.start(1) and m.end(1) are both 2, and m.start(2) raises an IndexError exception.
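The values quoted in that note can be reproduced directly. The flag name `has_group_2` is illustrative only:

```python
import re

m = re.search("b(c?)", "cba")

print(m.start(0), m.end(0))   # 1 2
print(m.start(1), m.end(1))   # 2 2  (group 1 matched the empty string)

try:
    m.start(2)                # the pattern has no group 2
    has_group_2 = True
except IndexError:
    has_group_2 = False

print(has_group_2)            # False
```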

An example that will remove remove_this from email addresses:

>>> email = "tony@tiremove_thisger.net"
>>> m = re.search("remove_this", email)
>>> email[:m.start()] + email[m.end():]
'tony@tiger.net'

Match.span([group])

`scanf()` Token	Regular Expression
`%c`	`.`
`%5c`	`.{5}`
`%d`	`[-+]?\d+`
`%e`, `%E`, `%f`, `%g`	`[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?`
`%i`	`[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)`
`%o`	`[-+]?[0-7]+`
`%s`	`\S+`
`%u`	`\d+`
`%x`, `%X`	`[-+]?(0[xX])?[\dA-Fa-f]+`
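As a sketch of how the table can be applied, the following builds a pattern for a scanf()-style format from the `%s` and `%d` rows. The input line and the format string are assumed examples:

```python
import re

# scanf() format being emulated:  %s - %d errors, %d warnings
line = "/usr/sbin/sendmail - 0 errors, 4 warnings"
pattern = r"(\S+) - ([-+]?\d+) errors, ([-+]?\d+) warnings"

m = re.search(pattern, line)

# Unlike scanf(), the groups are returned as strings, so numeric
# fields have to be converted explicitly.
program = m.group(1)
errors = int(m.group(2))
warnings = int(m.group(3))
print(program, errors, warnings)   # /usr/sbin/sendmail 0 4
```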
