2. Lexical analysis

A Python program is read by a parser. Input to the parser is a stream of tokens, generated by the lexical analyzer. This chapter describes how the lexical analyzer breaks a file into tokens.

Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8 (see PEP 3120 for details). If the source file cannot be decoded, a SyntaxError is raised.
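As a rough illustration of the decoding rule, the sketch below passes raw bytes to the built-in compile(), which treats source without a declaration as UTF-8. The helper name compiles_cleanly and the sample byte strings are invented for this example and are not part of the reference.

```python
def compiles_cleanly(source_bytes):
    """Return True if the bytes decode and compile as Python source."""
    try:
        compile(source_bytes, "<example>", "exec")
    except SyntaxError:
        # Undecodable source is reported as a SyntaxError.
        return False
    return True


# The same statement, once encoded as UTF-8 and once as Latin-1.
valid_utf8 = "s = 'caf\u00e9'\n".encode("utf-8")
invalid_utf8 = b"s = 'caf\xe9'\n"  # 0xE9 followed by "'" is not valid UTF-8

ok_result = compiles_cleanly(valid_utf8)
bad_result = compiles_cleanly(invalid_utf8)
print(ok_result, bad_result)
```

The second byte string would compile only if it carried an encoding declaration naming Latin-1.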

2.1. Line structure

A Python program is divided into a number of logical lines.

2.1.1. Logical lines

The end of a logical line is represented by the token NEWLINE. Statements cannot cross logical line boundaries except where NEWLINE is allowed by the syntax (e.g., between statements in compound statements). A logical line is constructed from one or more physical lines by following the explicit or implicit line joining rules.
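One way to observe logical lines is the standard tokenize module. This is a minimal sketch: the sample source is made up, and the NL token type is a convention of the tokenize module for line breaks that do not end a logical line, not a token of the grammar itself.

```python
import io
import tokenize

# Two logical lines; the second spans two physical lines inside parentheses.
source = "x = 1\ny = (2 +\n     3)\n"

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# NEWLINE marks the end of a logical line.
logical_line_ends = [t for t in tokens if t.type == tokenize.NEWLINE]
# NL marks a physical line break that does not end a logical line.
joined_breaks = [t for t in tokens if t.type == tokenize.NL]

print(len(logical_line_ends))  # 2
print(len(joined_breaks))      # 1
```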

2.1.2. Physical lines

A physical line is a sequence of characters terminated by an end-of-line sequence. In source files and strings, any of the standard platform line termination sequences can be used: the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform. The end of input also serves as an implicit terminator for the final physical line.

When embedding Python, source code strings should be passed to Python APIs using the standard C conventions for newline characters (the \n character, representing ASCII LF, is the line terminator).
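The following sketch checks that the three line terminators are treated alike when source text is handed to the built-in compile(). The two-statement program and the variable names are chosen only for illustration.

```python
results = {}
for eol in ("\n", "\r\n", "\r"):
    namespace = {}
    # The same two statements, separated by a different terminator each time.
    source = f"x = 1{eol}y = x + 1{eol}"
    exec(compile(source, "<example>", "exec"), namespace)
    results[eol] = namespace["y"]

print(results)
```

Each terminator should yield the same value for y, since all three forms end a physical line.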

2.1.3. Comments

A comment starts with a hash character (#) that is not part of a string literal, and ends at the end of the physical line. A comment signifies the end of the logical line unless the implicit line joining rules are invoked. Comments are ignored by the syntax.
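To see the distinction between a hash character inside a string literal and one that starts a comment, the standard tokenize module can be used. The sample line below is invented for this sketch.

```python
import io
import tokenize

# The first "#" is inside a string literal; only the second starts a comment.
source = 's = "# not a comment"  # a real comment\n'

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
comments = [t.string for t in tokens if t.type == tokenize.COMMENT]
strings = [t.string for t in tokens if t.type == tokenize.STRING]

print(comments)  # ['# a real comment']
print(strings)   # ['"# not a comment"']
```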

2.1.4. Encoding declarations

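If a comment in the first or second line of the Python script matches the regular expression coding[=:]\s*([-\w.]+), this comment is processed as an encoding declaration; the first group of this expression names the encoding of the source code file. The encoding declaration must appear on a line of its own. If it is on the second line, the first line must also be a comment-only line. The recommended form of an encoding expression is: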

# -*- coding: <encoding-name> -*-
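The regular expression above can be tried directly with the re module, and the standard tokenize.detect_encoding() function applies the same rule to the first lines of a byte stream. In this sketch the helper declared_encoding and the sample comment lines are hypothetical; the helper only runs the regular expression and does not enforce the first-or-second-line rule.

```python
import io
import re
import tokenize

# The pattern quoted in the text above.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(comment_line):
    """Return the encoding named in a comment line, or None if none is found."""
    match = CODING_RE.search(comment_line)
    return match.group(1) if match else None


recommended_style = declared_encoding("# -*- coding: latin-1 -*-")
equals_style = declared_encoding("# vim: set fileencoding=utf-8 :")
no_declaration = declared_encoding("# just a comment")

# detect_encoding() reads at most two lines and returns a normalized name.
source = b"# -*- coding: latin-1 -*-\nname = 'caf\xe9'\n"
encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)

print(recommended_style, equals_style, no_declaration, encoding)
```

Note that detect_encoding() reports latin-1 under its normalized name iso-8859-1, so the two results differ in spelling but name the same encoding.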

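Escape sequence	Meaning	Notes
`\` <newline>	Backslash and newline ignored	(1)
`\\`	Backslash ( `\` )
`\'`	Single quote ( `'` )
`\"`	Double quote ( `"` )
`\a`	ASCII Bell (BEL)
`\b`	ASCII Backspace (BS)
`\f`	ASCII Formfeed (FF)
`\n`	ASCII Linefeed (LF)
`\r`	ASCII Carriage Return (CR)
`\t`	ASCII Horizontal Tab (TAB)
`\v`	ASCII Vertical Tab (VT)
`\ooo`	Character with octal value ooo	(2,4)
`\xhh`	Character with hex value hh	(3,4)

Escape sequence	Meaning	Notes
`\N{name}`	Character named name in the Unicode database	(5)
`\uxxxx`	Character with 16-bit hex value xxxx	(6)
`\Uxxxxxxxx`	Character with 32-bit hex value xxxxxxxx	(7)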
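A few of the escape sequences in the tables can be checked with ordinary string comparisons. This is a small sketch; the dictionary keys and the choice of the letter A as the target character are arbitrary.

```python
# Each entry compares an escape sequence against the character it denotes.
checks = {
    "octal": "\101" == "A",
    "hex": "\x41" == "A",
    "named": "\N{LATIN CAPITAL LETTER A}" == "A",
    "16-bit": "\u0041" == "A",
    "32-bit": "\U00000041" == "A",
    "linefeed is one character": len("\n") == 1,
    "backslash is one character": len("\\") == 1,
}

# A backslash followed by a newline is ignored inside a string literal.
continued = "ab\
cd"

print(all(checks.values()), continued)
```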
