The syntax of identifiers in Python is based on the Unicode standard annex UAX-31, with elaboration and changes as defined below; see also PEP 3131 for further details.
Within the ASCII range (U+0001..U+007F), the valid characters for identifiers include the uppercase and lowercase letters A through Z, the underscore _ and, except for the first character, the digits 0 through 9. Python 3.0 introduced additional characters from outside the ASCII range (see PEP 3131). For these characters, the classification uses the version of the Unicode Character Database as included in the unicodedata module.
Identifiers are unlimited in length. Case is significant.
All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC.
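The normalization rule can be observed with a small sketch. The names `ligature_name` and `namespace` are invented for this illustration, and the ligature U+FB01 is just one convenient character whose NFKC form differs from its original spelling:

```python
import unicodedata

# "\ufb01" is LATIN SMALL LIGATURE FI; NFKC folds it into the two letters "fi".
ligature_name = "\ufb01le"
normalized = unicodedata.normalize("NFKC", ligature_name)

# Bind a value to the ligature spelling inside a scratch namespace.
namespace = {}
exec(ligature_name + " = 1", namespace)

# The binding is stored under the normalized spelling.
stored_names = [key for key in namespace if key != "__builtins__"]
print(normalized, stored_names)
```

`str.isidentifier()` offers a quick check of whether a given string is lexically valid as an identifier.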
2.4. Literals
Literals are notations for constant values of some built-in types.
2.4.1. String and Bytes literals
String literals are described by the following lexical definitions:
stringliteral ::= [stringprefix](shortstring | longstring)
stringprefix ::= "r" | "u" | "R" | "U" | "f" | "F"
| "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
shortstring ::= "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring ::= "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem ::= shortstringchar | stringescapeseq
longstringitem ::= longstringchar | stringescapeseq
shortstringchar ::= <any source character except "\" or newline or the quote>
longstringchar ::= <any source character except "\">
stringescapeseq ::= "\" <any source character>
bytesliteral ::= bytesprefix(shortbytes | longbytes)
bytesprefix ::= "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
shortbytes ::= "'" shortbytesitem* "'" | '"' shortbytesitem* '"'
longbytes ::= "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""'
shortbytesitem ::= shortbyteschar | bytesescapeseq
longbytesitem ::= longbyteschar | bytesescapeseq
shortbyteschar ::= <any ASCII character except "\" or newline or the quote>
longbyteschar ::= <any ASCII character except "\">
bytesescapeseq ::= "\" <any ASCII character>
One syntactic restriction not indicated by these productions is that whitespace is not allowed between the stringprefix or bytesprefix and the rest of the literal. The source character set is defined by the encoding declaration; it is UTF-8 if no encoding declaration is given in the source file; see section Encoding declarations.
In plain English: Both types of literals can be enclosed in matching single quotes (') or double quotes ("). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as triple-quoted strings). The backslash (\) character is used to give special meaning to otherwise ordinary characters like n, which means ‘newline’ when escaped (\n). It can also be used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. See escape sequences below for examples.
Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such constructs are called raw string literals and raw bytes literals respectively and treat backslashes as literal characters. As a result, in raw string literals, '\U' and '\u' escapes are not treated specially.
Added in version 3.3: The 'rb' prefix of raw bytes literals has been added as a synonym of 'br'. Support for the unicode legacy literal (u'value') was reintroduced to simplify the maintenance of dual Python 2.x and 3.x codebases. See PEP 414 for more information.
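A string literal with 'f' or 'F' in its prefix is a formatted string literal; see f-strings. The 'f' may be combined with 'r', but not with 'b' or 'u', therefore raw formatted strings are possible, but formatted bytes literals are not.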
In triple-quoted literals, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the literal. (A “quote” is the character used to open the literal, i.e. either ' or ".)
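A short sketch of the prefixes and quoting styles described above; the variable names are illustrative only:

```python
plain = "tab\there"          # \t is decoded to a single TAB character
raw = r"tab\there"           # raw: the backslash and the "t" are both kept
data = b"caf\xc3\xa9"        # bytes >= 128 written as escapes (UTF-8 for "é")
block = """first line
second "quoted" line"""      # the newline and the quotes are retained

print(len(plain), len(raw))
print(type(data).__name__, data.decode("utf-8"))
print(block.splitlines())
```

The raw literal is one character longer than the plain one because its backslash survives as a character of its own.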
2.4.1.1. Escape sequences
Unless an 'r' or 'R' prefix is present, escape sequences in string and bytes literals are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are:
| Escape Sequence | Meaning | Notes |
| \<newline> | Backslash and newline ignored | (1) |
| \\ | Backslash (\) | |
| \' | Single quote (') | |
| \" | Double quote (") | |
| \a | ASCII Bell (BEL) | |
| \b | ASCII Backspace (BS) | |
| \f | ASCII Formfeed (FF) | |
| \n | ASCII Linefeed (LF) | |
| \r | ASCII Carriage Return (CR) | |
| \t | ASCII Horizontal Tab (TAB) | |
| \v | ASCII Vertical Tab (VT) | |
| \ooo | Character with octal value ooo | (2,4) |
| \xhh | Character with hex value hh | (3,4) |
Escape sequences only recognized in string literals are:
| Escape Sequence | Meaning | Notes |
| \N{name} | Character named name in the Unicode database | (5) |
| \uxxxx | Character with 16-bit hex value xxxx | (6) |
| \Uxxxxxxxx | Character with 32-bit hex value xxxxxxxx | (7) |
Notes:
(1) A backslash can be added at the end of a line to ignore the newline:
>>> 'This string will not include \
... backslashes or newline characters.'
'This string will not include backslashes or newline characters.'
The same result can be achieved using triple-quoted strings, or parentheses and string literal concatenation.
(2) As in Standard C, up to three octal digits are accepted.
Changed in version 3.11: Octal escapes with value larger than 0o377 produce a DeprecationWarning.
Changed in version 3.12: Octal escapes with value larger than 0o377 produce a SyntaxWarning. In a future Python version they will eventually be a SyntaxError.
(3) Unlike in Standard C, exactly two hex digits are required.
(4) In a bytes literal, hexadecimal and octal escapes denote the byte with the given value. In a string literal, these escapes denote a Unicode character with the given value.
(5) Changed in version 3.3: Support for name aliases has been added.
(6) Exactly four hex digits are required.
(7) Any Unicode character can be encoded this way. Exactly eight hex digits are required.
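To make the escape tables and their notes concrete, here is a minimal sketch; the variable names are arbitrary and the choice of the letter "A" is only for convenience:

```python
# Five spellings of the same one-character string "A" (U+0041).
spellings = ["\101", "\x41", "\u0041", "\U00000041", "\N{LATIN CAPITAL LETTER A}"]

# The same \xhh escape denotes a byte in bytes and a code point in str.
as_bytes = b"\xe9"
as_text = "\xe9"

# Backslash-newline joins the two physical lines without inserting anything.
joined = "no \
break"

print(spellings, as_bytes[0], as_text, joined)
```

The `\u`, `\U` and `\N{...}` forms appear only in the `str` examples because they are not recognized in bytes literals.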
Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., the backslash is left in the result. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals.
Changed in version 3.6: Unrecognized escape sequences produce a DeprecationWarning.
Changed in version 3.12: Unrecognized escape sequences produce a SyntaxWarning. In a future Python version they will eventually be a SyntaxError.
Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, r"\"" is a valid string literal consisting of two characters: a backslash and a double quote; r"\" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, a raw literal cannot end in a single backslash (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the literal, not as a line continuation.
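These raw-literal corner cases can be checked with a brief sketch. The names are illustrative, and appending a non-raw `"\\"` is just one possible workaround for a trailing backslash:

```python
quoted = r"\""                     # two characters: backslash, double quote

# A raw literal cannot end in a single backslash, so add it as a separate piece.
windows_dir = r"C:\temp" "\\"

# In a raw literal, backslash-newline stays in the value as two characters.
two_lines = r"""a\
b"""

# r"\" on its own does not even tokenize; compile() reports a SyntaxError.
try:
    compile('r"\\"', "<example>", "eval")
except SyntaxError:
    lone_backslash_is_valid = False
else:
    lone_backslash_is_valid = True

print(len(quoted), windows_dir, len(two_lines), lone_backslash_is_valid)
```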
2.4.2. String literal concatenation
Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings, for example:
re.compile("[A-Za-z_]"       # letter or underscore
           "[A-Za-z0-9_]*"   # letter, digit or underscore
          )
Note that this feature is defined at the syntactical level, but implemented at compile time. The ‘+’ operator must be used to concatenate string expressions at run time. Also note that literal concatenation can use different quoting styles for each component (even mixing raw strings and triple quoted strings), and formatted string literals may be concatenated with plain string literals.
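A small sketch of literal concatenation follows; `user`, `greeting` and the other names are made up for the example:

```python
import re

# Adjacent literals form one string, so each piece can carry its own comment.
identifier = re.compile(
    r"[A-Za-z_]"        # first character: letter or underscore
    r"[A-Za-z0-9_]*"    # remaining characters: letter, digit or underscore
)

# Different quoting styles and prefixes can be mixed, including f-strings.
user = "Fred"
greeting = f"Hello, {user}" '; path=' r"C:\new"

# Values that exist only at run time need the + operator instead.
suffix = "!"
shout = "hello" "world" + suffix

print(greeting, shout)
```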
2.4.3. f-strings
Added in version 3.6.
A formatted string literal or f-string is a string literal that is prefixed with 'f' or 'F'. These strings may contain replacement fields, which are expressions delimited by curly braces {}. While other string literals always have a constant value, formatted strings are really expressions evaluated at run time.
Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is:
f_string ::= (literal_char | "{{" | "}}" | replacement_field)*
replacement_field ::= "{" f_expression ["="] ["!" conversion] [":" format_spec] "}"
f_expression ::= (conditional_expression | "*" or_expr)
("," conditional_expression | "," "*" or_expr)* [","]
| yield_expression
conversion ::= "s" | "r" | "a"
format_spec ::= (literal_char | replacement_field)*
literal_char ::= <any code point except "{", "}" or NULL>
The parts of the string outside curly braces are treated literally, except that any doubled curly braces '{{' or '}}' are replaced with the corresponding single curly brace. A single opening curly bracket '{' marks a replacement field, which starts with a Python expression. To display both the expression text and its value after evaluation (useful in debugging), an equal sign '=' may be added after the expression. A conversion field, introduced by an exclamation point '!', may follow. A format specifier may also be appended, introduced by a colon ':'. A replacement field ends with a closing curly bracket '}'.
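The parts of a replacement field can be tried out one at a time. This is only a sketch, with `price` and `name` chosen arbitrarily:

```python
price = 1234.5
name = "Fred"

doubled = f"{{price}}"         # doubled braces give literal braces, no field
formatted = f"{price:,.2f}"    # format specifier after ':'
converted = f"{name!r}"        # conversion after '!'
combined = f"{name!r:>8}"      # conversion first, then the format specifier
debug = f"{price=}"            # '=' shows the expression text and its value

print(doubled, formatted, converted, combined, debug)
```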
Expressions in formatted string literals are treated like regular Python expressions surrounded by parentheses, with a few exceptions. An empty expression is not allowed, and both lambda and assignment expressions := must be surrounded by explicit parentheses. Each expression is evaluated in the context where the formatted string literal appears, in order from left to right. Replacement expressions can contain newlines in both single-quoted and triple-quoted f-strings and they can contain comments. Everything that comes after a # inside a replacement field is a comment (even closing braces and quotes). In that case, replacement fields must be closed on a different line.
>>> a = 2
>>> f"abc{a # This is a comment }"
... + 3}"
'abc5'
Changed in version 3.7: Prior to Python 3.7, an await expression and comprehensions containing an async for clause were illegal in the expressions in formatted string literals due to a problem with the implementation.
Changed in version 3.12: Prior to Python 3.12, comments were not allowed inside f-string replacement fields.
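The evaluation order and the parenthesization rule for lambda and := can be sketched as follows. The helper `record` and the list `calls` are hypothetical names used only to log the order of evaluation:

```python
calls = []

def record(label):
    """Remember the order in which replacement fields are evaluated."""
    calls.append(label)
    return label

# Fields are evaluated from left to right.
ordered = f"{record('a')}-{record('b')}-{record('c')}"

# lambda and := need their own parentheses inside a replacement field.
doubled = f"{(lambda n: n * 2)(21)}"
assigned = f"{(total := 2 + 3)} then {total * 2}"

print(ordered, calls, doubled, assigned)
```

The second field of `assigned` can use `total` because the first field has already been evaluated by then.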
When the equal sign '=' is provided, the output will have the expression text, the '=' and the evaluated value. Spaces after the opening brace '{', within the expression and after the '=' are all retained in the output. By default, the '=' causes the repr() of the expression to be provided, unless there is a format specified. When a format is specified it defaults to the str() of the expression unless a conversion '!r' is declared.
Added in version 3.8: The equal sign '='.
If a conversion is specified, the result of evaluating the expression is converted before formatting. Conversion '!s' calls str() on the result, '!r' calls repr(), and '!a' calls ascii().
The result is then formatted using the format() protocol. The format specifier is passed to the __format__() method of the expression or conversion result. An empty string is passed when the format specifier is omitted. The formatted result is then included in the final value of the whole string.
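The format() protocol can be illustrated with a user-defined class. `Temperature` and its "F" specifier are hypothetical, invented for this sketch and not part of any library:

```python
class Temperature:
    """Hypothetical example type that implements its own format specifier."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # spec is the text after ':'; it is "" when no specifier is given.
        if spec == "F":
            return f"{self.celsius * 9 / 5 + 32:.1f} F"
        return f"{self.celsius:.1f} C"

    def __str__(self):
        return f"Temperature({self.celsius})"

reading = Temperature(21.5)
default = f"{reading}"          # calls reading.__format__("")
fahrenheit = f"{reading:F}"     # calls reading.__format__("F")
as_text = f"{reading!s:>20}"    # str() first, then str.__format__(">20")

print(default, fahrenheit, repr(as_text))
```

With the `!s` conversion, the specifier is handed to the resulting `str` object rather than to `Temperature.__format__()`, which is why `>20` works there.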
Top-level format specifiers may include nested replacement fields. These nested fields may include their own conversion fields and format specifiers, but may not include more deeply nested replacement fields. The format specifier mini-language is the same as that used by the str.format() method.
Formatted string literals may be concatenated, but replacement fields cannot be split across literals.
Some examples of formatted string literals:
>>> import decimal
>>> from datetime import datetime
>>> name = "Fred"
>>> f"He said his name is {name!r}."
"He said his name is 'Fred'."
>>> f"He said his name is {repr(name)}." # repr() is equivalent to !r
"He said his name is 'Fred'."
>>> width = 10
>>> precision = 4
>>> value = decimal.Decimal("12.34567")
>>> f"result: {value:{width}.{precision}}" # nested fields
'result:      12.35'
>>> today = datetime(year=2017, month=1, day=27)
>>> f"{today:%B %d, %Y}" # using date format specifier
'January 27, 2017'
>>> f"{today=:%B %d, %Y}" # using date format specifier and debugging
'today=January 27, 2017'
>>> number = 1024
>>> f"{number:#0x}" # using integer format specifier
'0x400'
>>> foo = "bar"
>>> f"{ foo = }" # preserves whitespace
" foo = 'bar'"
>>> line = "The mill's closed"
>>> f"{line = }"
'line = "The mill\'s closed"'
>>> f"{line = :20}"
"line = The mill's closed   "
>>> f"{line = !r:20}"
'line = "The mill\'s closed" '
Reusing the outer f-string quoting type inside a replacement field is permitted:
>>> a = dict(x=2)
>>> f"abc {a["x"]} def"
'abc 2 def'
Changed in version 3.12: Prior to Python 3.12, reuse of the same quoting type of the outer f-string inside a replacement field was not possible.
Backslashes are also allowed in replacement fields and are evaluated the same way as in any other context:
>>> a = ["a", "b", "c"]
>>> print(f"List a contains:\n{"\n".join(a)}")
List a contains:
a
b
c
Changed in version 3.12: Prior to Python 3.12, backslashes were not permitted inside an f-string replacement field.
Formatted string literals cannot be used as docstrings, even if they do not include expressions.
>>> def foo():
... f"Not a docstring"
...
>>> foo.__doc__ is None
True
See also PEP 498 for the proposal that added formatted string literals, and str.format(), which uses a related format string mechanism.
2.4.4. Numeric literals
There are three types of numeric literals: integers, floating-point numbers, and imaginary numbers. There are no complex literals (complex numbers can be formed by adding a real number and an imaginary number).
Note that numeric literals do not include a sign; a phrase like -1 is actually an expression composed of the unary operator ‘-’ and the literal 1.
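One way to see that the sign is not part of the literal is to inspect the parsed expression with the standard `ast` module. This is only a sketch, and the variable names are illustrative:

```python
import ast

# Parse "-1" as an expression and look at the top-level node.
tree = ast.parse("-1", mode="eval").body
node_type = type(tree).__name__           # the whole expression
operator_type = type(tree.op).__name__    # the unary minus operator
operand_value = tree.operand.value        # the literal 1

# Likewise, 3+4j is an addition rather than a single complex literal.
complex_node_type = type(ast.parse("3+4j", mode="eval").body).__name__

# Because the sign is an operator, ** is applied before it.
power = -2 ** 2

print(node_type, operator_type, operand_value, complex_node_type, power)
```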
2.4.5. Integer literals
Integer literals are described by the following lexical definitions:
integer ::= decinteger | bininteger | octinteger | hexinteger
decinteger ::= nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger ::= "0" ("b" | "B") (["_"] bindigit)+
octinteger ::= "0" ("o" | "O") (["_"] octdigit)+
hexinteger ::= "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::= "1"..."9"
digit ::= "0"..."9"
bindigit ::= "0" | "1"
octdigit ::= "0"..."7"
hexdigit ::= digit | "a"..."f" | "A"..."F"
There is no limit for the length of integer literals apart from what can be stored in available memory.
Underscores are ignored for determining the numeric value of the literal. They can be used to group digits for enhanced readability. One underscore can occur between digits, and after base specifiers like 0x.
Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.
Some examples of integer literals:
7 2147483647 0o177 0b100110111
3 79228162514264337593543950336 0o377 0xdeadbeef
100_000_000_000 0b_1110_0101
Changed in version 3.6: Underscores are now allowed for grouping purposes in literals.
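The integer grammar can be exercised with a short sketch. The helper `is_valid_literal` is a hypothetical convenience written for this example, not a standard function:

```python
# The same value written in base 10, 2, 8 and 16, with optional underscores.
same_value = [229, 0b1110_0101, 0b_1110_0101, 0o345, 0xE5, 0x_e5]

big = 79228162514264337593543950336    # no fixed upper bound on size
grouped = 100_000_000_000              # underscores do not change the value

def is_valid_literal(text):
    """Return True when text compiles as a Python expression."""
    try:
        compile(text, "<example>", "eval")
    except SyntaxError:
        return False
    return True

print(same_value, big == 2 ** 96, grouped == 10 ** 11)
print(is_valid_literal("012"), is_valid_literal("0o12"), is_valid_literal("1__0"))
```

Leading zeros are rejected in "012", while "000" and "0_0" still compile because they match the all-zeros alternative of decinteger.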
2.4.6. Floating-point literals
Floating-point literals are described by the following lexical definitions:
floatnumber ::= pointfloat | exponentfloat
pointfloat ::= [digitpart] fraction | digitpart "."
exponentfloat ::= (digitpart | pointfloat) exponent
digitpart ::= digit (["_"] digit)*
fraction ::= "." digitpart
exponent ::= ("e" | "E") ["+" | "-"] digitpart
Note that the integer and exponent parts are always interpreted using radix 10. For example, 077e010 is legal, and denotes the same number as 77e10. The allowed range of floating-point literals is implementation-dependent. As in integer literals, underscores are supported for digit grouping.
Some examples of floating-point literals:
3.14 10. .001 1e100 3.14e-10 0e0 3.14_15_93
Changed in version 3.6: Underscores are now allowed for grouping purposes in literals.
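A quick sketch of the floating-point forms, reusing the example literals from this section:

```python
examples = [3.14, 10., .001, 1e100, 3.14e-10, 0e0, 3.14_15_93]

# Leading zeros are allowed here: both parts are read in radix 10.
radix_ten = 077e010 == 77e10

# Underscores only group digits; the value is unchanged.
grouped = 3.14_15_93 == 3.141593

print(all(isinstance(x, float) for x in examples), radix_ten, grouped)
```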
2.4.7. Imaginary literals
Imaginary literals are described by the following lexical definitions:
imagnumber ::= (floatnumber | digitpart) ("j" | "J")
An imaginary literal yields a complex number with a real part of 0.0. Complex numbers are represented as a pair of floating-point numbers and have the same restrictions on their range. To create a complex number with a nonzero real part, add a floating-point number to it, e.g., (3+4j).
Some examples of imaginary literals:
3.14j 10.j 10j .001j 1e100j 3.14e-10j 3.14_15_93j
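As a brief sketch, imaginary literals and the complex numbers built from them behave like this (`unit` and `z` are arbitrary names):

```python
unit = 1j            # imaginary literal: real part is 0.0
z = 3 + 4j           # an addition of an integer and an imaginary literal

print(type(unit).__name__, unit.real, unit.imag)
print(z.real, z.imag, abs(z))
print(unit * unit, 10.j == 10j)
```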
2.1.3. Comments
A comment starts with a hash character (#) that is not part of a string literal, and ends at the end of the physical line. A comment signifies the end of the logical line unless the implicit line joining rules are invoked. Comments are ignored by the syntax.
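The distinction between a hash character inside a string literal and a real comment can be observed with the standard `tokenize` module. This is a sketch; the sample `source` line is made up:

```python
import io
import tokenize

# One line of source: the first "#" is inside a string, the second starts a comment.
source = 'label = "# not a comment"  # a real comment\n'

tokens = tokenize.generate_tokens(io.StringIO(source).readline)
comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]

print(comments)
```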