html.parser — Simple HTML and XHTML parser

Source code: Lib/html/parser.py


This module defines a class HTMLParser which serves as the basis for parsing text files formatted in HTML (HyperText Markup Language) and XHTML.

class html.parser.HTMLParser(*, convert_charrefs=True)

Create a parser instance able to parse invalid markup.

If convert_charrefs is True (the default), all character references (except the ones in script/style elements) are automatically converted to the corresponding Unicode characters.
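
A minimal sketch of what convert_charrefs changes, assuming a hypothetical subclass named CharrefProbe (not part of html.parser) that records what each handler receives. With the default True, references reach handle_data() already decoded; with False, they are reported separately through handle_entityref() and handle_charref(), which receive the reference name without the surrounding "&" and ";".

```python
from html.parser import HTMLParser


class CharrefProbe(HTMLParser):
    """Hypothetical subclass that records what each handler receives."""

    def __init__(self, *, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.data = []        # text chunks passed to handle_data()
        self.entityrefs = []  # names passed to handle_entityref(), e.g. "eacute"
        self.charrefs = []    # names passed to handle_charref(), e.g. "38"

    def handle_data(self, data):
        self.data.append(data)

    def handle_entityref(self, name):
        self.entityrefs.append(name)

    def handle_charref(self, name):
        self.charrefs.append(name)


def probe(markup, convert_charrefs):
    # Feed the whole string, then close() to flush any buffered text.
    parser = CharrefProbe(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()
    return parser


markup = "<p>caf&eacute; &#38; bar</p>"

on = probe(markup, convert_charrefs=True)
print("".join(on.data))              # café & bar
print(on.entityrefs, on.charrefs)    # [] [] (these handlers are never called)

off = probe(markup, convert_charrefs=False)
print(off.entityrefs, off.charrefs)  # ['eacute'] ['38']
```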

An HTMLParser instance is fed HTML data and calls handler methods when start tags, end tags, text, comments, and other markup elements are encountered. The user should subclass HTMLParser and override its methods to implement the desired behavior.
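
The subclass-and-override pattern can be sketched as follows. EventLogger is a hypothetical name used only for illustration; the overridden methods (handle_starttag(), handle_endtag(), handle_data(), handle_comment()) are the HTMLParser hooks called for start tags, end tags, text and comments. Tag names arrive lowercased, and attributes arrive as a list of (name, value) pairs.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Hypothetical subclass that records every callback as a tuple."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        # data is the comment body between "<!--" and "-->"
        self.events.append(("comment", data))


parser = EventLogger()
parser.feed('<A HREF="/docs">Docs</A><!-- note -->')
parser.close()
for event in parser.events:
    print(event)
```

feed() may also be called repeatedly with partial chunks of a document; incomplete constructs are buffered until more data arrives or close() is called.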

This parser does not check that end tags match start tags or call the end-tag handler for elements which are closed implicitly by closing an outer element.
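
A short sketch of that behavior, using a hypothetical TagTracer subclass that only logs tags. The <li> elements below are never closed explicitly, so no end-tag event is produced for them, and the stray </b> that "closes" an <i> is reported as-is, with no error raised.

```python
from html.parser import HTMLParser


class TagTracer(HTMLParser):
    """Hypothetical subclass that logs start and end tags in order."""

    def __init__(self):
        super().__init__()
        self.trace = []

    def handle_starttag(self, tag, attrs):
        self.trace.append("<" + tag + ">")

    def handle_endtag(self, tag):
        # Called only for end tags literally present in the input.
        self.trace.append("</" + tag + ">")


tracer = TagTracer()
tracer.feed("<ul><li>one<li>two</ul><i>x</b>")
tracer.close()
print(tracer.trace)  # ['<ul>', '<li>', '<li>', '</ul>', '<i>', '</b>']
```

Code that needs a balanced tree has to track open elements itself, for example with a stack maintained in these two handlers.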

Changed in version 3.4: convert_charrefs keyword argument added.

Changed in version 3.5: The default value for argument convert_charrefs is now True.