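`urllib.parse` — Parse URLs into components ¶

This module defines a standard interface to break Uniform Resource Locator (URL) strings up into components (addressing scheme, network location, path, etc.), to combine the components back into a URL string, and to convert a relative URL to an absolute URL given a base URL.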
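As a minimal sketch of the relative-to-absolute conversion mentioned above, the following uses `urljoin()` from this module. The base URL and the relative references are illustrative values only.

```python
from urllib.parse import urljoin

# Illustrative base URL; any absolute URL would do.
base = "http://www.cwi.nl/%7Eguido/Python.html"

# A plain relative reference replaces the last path segment of the base.
sibling = urljoin(base, "FAQ.html")

# A reference starting with "/" replaces the whole path of the base.
rooted = urljoin(base, "/about")

print(sibling)
print(rooted)
```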

The module has been designed to match the internet RFC on Relative Uniform Resource Locators. It supports the following URL schemes: file, ftp, gopher, hdl, http, https, imap, itms-services, mailto, mms, news, nntp, prospero, rsync, rtsp, rtsps, rtspu, sftp, shttp, sip, sips, snews, svn, svn+ssh, telnet, wais, ws, wss.

CPython implementation detail: The inclusion of the itms-services URL scheme can prevent an app from passing Apple's App Store review process for the macOS and iOS App Stores. Handling for the itms-services scheme is always removed on iOS; on macOS, it may be removed if CPython has been built with the --with-app-store-compliance option.

The urllib.parse module defines functions that fall into two broad categories: URL parsing and URL quoting. These are covered in detail in the following sections.

This module's functions use the deprecated term netloc (or net_loc), which was introduced in RFC 1808. However, this term has been obsoleted by RFC 3986, which introduced the term authority as its replacement. The use of netloc is continued for backward compatibility.

URL Parsing ¶

The URL parsing functions focus on splitting a URL string into its components, or on combining URL components into a URL string.
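The two directions described above can be sketched as a round trip: `urlparse()` splits a URL and `urlunparse()` (also part of this module) puts the pieces back together. The URL is an illustrative value.

```python
from urllib.parse import urlparse, urlunparse

url = ("http://docs.python.org:80/3/library/urllib.parse.html"
       "?highlight=params#url-parsing")

# Split the string into a 6-item named tuple of components.
parts = urlparse(url)

# Combine the components back into a single URL string.
rebuilt = urlunparse(parts)

print(rebuilt == url)
```

For a URL like this one, with no redundant delimiters, the rebuilt string is identical to the input.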

urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True) ¶

Parse a URL into six components, returning a 6-item named tuple. This corresponds to the general structure of a URL: scheme://netloc/path;parameters?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:

>>> from urllib.parse import urlparse
>>> urlparse("scheme://netloc/path;parameters?query#fragment")
ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='',
            query='query', fragment='fragment')
>>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?"
...              "highlight=params#url-parsing")
>>> o
ParseResult(scheme='http', netloc='docs.python.org:80',
            path='/3/library/urllib.parse.html', params='',
            query='highlight=params', fragment='url-parsing')
>>> o.scheme
'http'
>>> o.netloc
'docs.python.org:80'
>>> o.hostname
'docs.python.org'
>>> o.port
80
>>> o._replace(fragment="").geturl()
'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params'

Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by '//'. Otherwise the input is presumed to be a relative URL and thus to start with a path component.

>>> from urllib.parse import urlparse
>>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> urlparse('www.cwi.nl/%7Eguido/Python.html')
ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
            params='', query='', fragment='')
>>> urlparse('help/Python.html')
ParseResult(scheme='', netloc='', path='help/Python.html', params='',
            query='', fragment='')
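When input is known to be a bare host name followed by a path, one possible workaround is to prepend '//' before parsing so that the host lands in netloc. The helper below is a hypothetical sketch, not part of the module, and it assumes the caller already knows the text is host-first (a genuinely relative path such as help/Python.html would be misread as a host). Inputs containing a colon, such as host:port, are deliberately not handled here.

```python
from urllib.parse import urlparse

def parse_host_first(text, default_scheme="https"):
    """Hypothetical helper: parse scheme-less, host-first text.

    Assumes `text` starts with a host name when it has no scheme
    and no leading '//'. Not part of urllib.parse.
    """
    parsed = urlparse(text, scheme=default_scheme)
    if not parsed.netloc and not text.startswith("/"):
        # Introduce the netloc explicitly so urlparse recognizes it.
        parsed = urlparse("//" + text, scheme=default_scheme)
    return parsed

result = parse_host_first("www.cwi.nl/%7Eguido/Python.html")
print(result.netloc, result.path)
```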

The scheme argument gives the default addressing scheme, to be used only if the URL does not specify one. It should be the same type (text or bytes) as urlstring, except that the default value '' is always allowed, and is automatically converted to b'' if appropriate.
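A short sketch of how the scheme argument behaves, using example.com as a placeholder host: the default applies only when the URL carries no scheme of its own, and the default '' works for bytes input too.

```python
from urllib.parse import urlparse

# No scheme in the URL, so the default is used.
with_default = urlparse("//example.com/index.html", scheme="https")

# The URL names its own scheme, so the default is ignored.
explicit = urlparse("http://example.com/index.html", scheme="https")

# Bytes input with the default scheme argument '' is accepted,
# and every component of the result is bytes.
as_bytes = urlparse(b"//example.com/index.html")

print(with_default.scheme, explicit.scheme, as_bytes.scheme)
```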

If the allow_fragments argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and fragment is set to the empty string in the return value.
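The effect of allow_fragments can be sketched by parsing the same placeholder URL both ways. With the argument set to false, the '#' and everything after it stay attached to whichever component precedes them.

```python
from urllib.parse import urlparse

url = "http://example.com/path?q=1#frag"  # illustrative URL

default = urlparse(url)
no_frag = urlparse(url, allow_fragments=False)

# Without a query, the fragment text ends up in the path instead.
no_frag_path = urlparse("http://example.com/path#frag",
                        allow_fragments=False)

print(default.query, default.fragment)
print(no_frag.query, repr(no_frag.fragment))
print(no_frag_path.path)
```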

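The return value is a named tuple, which means that its items can be accessed by index or as named attributes, which are:

Attribute	Index	Value	Value if not present
`scheme`	0	URL scheme specifier	scheme parameter
`netloc`	1	Network location part	empty string
`path`	2	Hierarchical path	empty string
`params`	3	Parameters for last path element	empty string
`query`	4	Query component	empty string
`fragment`	5	Fragment identifier	empty string
`username`		User name	`None`
`password`		Password	`None`
`hostname`		Host name (lower case)	`None`
`port`		Port number as integer, if present	`None`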
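The table above can be exercised with a small sketch. The URL, user name, and password are made-up values; the point is that the six tuple items are reachable by index or by name, while the remaining four attributes are derived from netloc and are available by name only.

```python
from urllib.parse import urlparse

r = urlparse("https://user:secret@Example.COM:8443/a;b?x=1#top")

# Tuple items: index access and attribute access are equivalent.
print(r[0], r.scheme)
print(r[1], r.netloc)
print(r.path, r.params, r.query, r.fragment)

# Derived attributes: parsed out of netloc, hostname is lowercased.
print(r.username, r.password, r.hostname, r.port)

# Missing parts of the netloc give None rather than an empty string.
bare = urlparse("https://example.com/")
print(bare.username, bare.port)
```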

Reading the port attribute will raise a ValueError if an invalid port is specified in the URL. See section Structured Parse Results for more information on the result object.
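Because the error surfaces when port is read rather than when the URL is parsed, callers that handle untrusted input may want to guard the attribute access. The helper below is a hypothetical sketch, not part of the module; note that it deliberately treats an invalid port the same as a missing one, which may not suit every caller.

```python
from urllib.parse import urlparse

def safe_port(url):
    """Hypothetical helper: return the port as an int, or None if the
    port is absent or invalid."""
    try:
        return urlparse(url).port
    except ValueError:
        # Raised for a non-numeric or out-of-range port.
        return None

print(safe_port("http://example.com:8080/"))
print(safe_port("http://example.com:notaport/"))
print(safe_port("http://example.com/"))
```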

Characters in the netloc attribute that decompose under NFKC normalization (as used by the IDNA encoding) into any of /, ?, #, @, or : will raise a ValueError. If the URL is decomposed before parsing, no error will be raised.
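A sketch of this check, assuming U+FF03 (FULLWIDTH NUMBER SIGN) as the offending character since it normalizes to '#' under NFKC. The host names are placeholders. Normalizing first avoids the error, but it also changes what the URL means, as the second half shows.

```python
import unicodedata
from urllib.parse import urlparse

# U+FF03 looks like '#' and becomes '#' under NFKC normalization.
tricky = "http://example.com\uff03@evil.test/"

try:
    urlparse(tricky)
    rejected = False
except ValueError:
    rejected = True
print(rejected)

# Decomposing before parsing raises no error: the '#' now starts
# an ordinary fragment and the netloc is just the first host.
normalized = unicodedata.normalize("NFKC", tricky)
parsed = urlparse(normalized)
print(parsed.netloc, parsed.fragment)
```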

As is the case with all named tuples, the subclass has a few additional methods and attributes that are particularly useful. One such method is _replace(). The _replace() method will return a new ParseResult object replacing specified fields with new values.
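As a brief sketch of _replace() in use, the following swaps the scheme and drops the explicit port by replacing two fields at once. The replacement values are illustrative, and the original result is left unchanged.

```python
from urllib.parse import urlparse

o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html"
             "?highlight=params#url-parsing")

# Build a new ParseResult with two fields replaced.
secure = o._replace(scheme="https", netloc="docs.python.org")

print(secure.geturl())
print(o.scheme)  # the original object still has its old values
```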

Attribute	Index	Value	Value if not present
`url`	0	URL with no fragment	empty string
`fragment`	1	Fragment identifier	empty string
