SequenceMatcher 范例
¶
此范例比较 2 字符串,并认为 Blank 是 junk:
>>> s = SequenceMatcher(lambda x: x == " ",
... "private Thread currentThread;",
... "private volatile Thread currentThread;")
ratio()
返回 [0, 1] 浮点数,度量序列的相似性。根据经验,
ratio()
值超过 0.6 意味着序列接近匹配:
>>> print(round(s.ratio(), 3))
0.866
若仅对序列匹配位置感兴趣,
get_matching_blocks()
很顺手:
>>> for block in s.get_matching_blocks():
... print("a[%d] and b[%d] match for %d elements" % block)
a[0] and b[0] match for 8 elements
a[8] and b[17] match for 21 elements
a[29] and b[38] match for 0 elements
注意:最后元组的返回通过
get_matching_blocks()
始终是虚设的,
(len(a), len(b), 0)
,且这是唯一情况,最后元组元素 (匹配的元素数) 为
0
.
若想要知道如何将第 1 序列更改成第 2 序列,使用
get_opcodes()
:
>>> for opcode in s.get_opcodes():
... print("%6s a[%d:%d] b[%d:%d]" % opcode)
equal a[0:8] b[0:8]
insert a[8:8] b[8:17]
equal a[8:29] b[17:38]
Differ 对象
¶
注意,
Differ
生成增量不声称是
minimal
diff。相反,最小 diff 经常反直觉,因为它们尽可能同步,有时却意外相距 100 页。将同步点限定到连续匹配可预留一些局域性概念,以偶尔产生更长 diff 为代价。
The
Differ
类拥有此构造函数:
-
class
difflib.
Differ
(
linejunk
=
None
,
charjunk
=
None
)
可选关键词参数
linejunk
and
charjunk
是过滤函数 (或
None
):
linejunk
:函数接受单字符串自变量并返回 True 若字符串是 junk。默认为
None
,意味着没有行被认为是 junk。
charjunk
:函数接受单字符自变量 (1 长字符串) 并返回 True 若字符是 junk。默认为
None
,意味着没有字符被认为是 junk。
这些 junk 过滤函数加速匹配以查找差异,且不会导致任何不同行或字符被忽略。阅读描述从
find_longest_match()
方法的
isjunk
参数为解释。
Differ
对象的使用 (生成增量) 是凭借一方法:
-
compare
(
a
,
b
)
¶
-
比较 2 行序列,并生成增量 (行序列)。
每序列必须包含以换行符结尾的单个单行字符串。这种序列可以获取自
readlines()
方法在像文件对象。生成增量也由以换行终止的字符串组成,准备按原样打印凭借
writelines()
方法在像文件对象。
Differ 范例
¶
此范例比较 2 文本。首先,设置文本,以换行符结尾的单个单行字符串序列 (这种序列还可以获取自
readlines()
方法在像文件对象):
>>> text1 = ''' 1. Beautiful is better than ugly.
... 2. Explicit is better than implicit.
... 3. Simple is better than complex.
... 4. Complex is better than complicated.
... '''.splitlines(keepends=True)
>>> len(text1)
4
>>> text1[0][-1]
'\n'
>>> text2 = ''' 1. Beautiful is better than ugly.
... 3. Simple is better than complex.
... 4. Complicated is better than complex.
... 5. Flat is better than nested.
... '''.splitlines(keepends=True)
接着,实例化 Differ 对象:
注意:当实例化
Differ
对象,可以传递函数以过滤掉行和 junk 字符。见
Differ()
构造函数了解细节。
最后,比较两者:
>>> result = list(d.compare(text1, text2))
result
是字符串列表,因此,让我们美化打印它:
>>> from pprint import pprint
>>> pprint(result)
[' 1. Beautiful is better than ugly.\n',
'- 2. Explicit is better than implicit.\n',
'- 3. Simple is better than complex.\n',
'+ 3. Simple is better than complex.\n',
'? ++\n',
'- 4. Complex is better than complicated.\n',
'? ^ ---- ^\n',
'+ 4. Complicated is better than complex.\n',
'? ++++ ^ ^\n',
'+ 5. Flat is better than nested.\n']
作为单个多行字符串,它看起来像这样:
>>> import sys
>>> sys.stdout.writelines(result)
1. Beautiful is better than ugly.
- 2. Explicit is better than implicit.
- 3. Simple is better than complex.
+ 3. Simple is better than complex.
? ++
- 4. Complex is better than complicated.
? ^ ---- ^
+ 4. Complicated is better than complex.
? ++++ ^ ^
+ 5. Flat is better than nested.
difflib 命令行接口
¶
此范例展示如何使用 difflib 来创建
diff
-like utility.
""" Command line interface to difflib.py providing diffs in four formats:
* ndiff: lists every line and highlights interline changes.
* context: highlights clusters of changes in a before/after format.
* unified: highlights clusters of changes in an inline format.
* html: generates side by side comparison with change highlights.
"""
import sys, os, difflib, argparse
from datetime import datetime, timezone
def file_mtime(path):
t = datetime.fromtimestamp(os.stat(path).st_mtime,
timezone.utc)
return t.astimezone().isoformat()
def main():
parser = argparse.ArgumentParser()
parser.add_argument('-c', action='store_true', default=False,
help='Produce a context format diff (default)')
parser.add_argument('-u', action='store_true', default=False,
help='Produce a unified format diff')
parser.add_argument('-m', action='store_true', default=False,
help='Produce HTML side by side diff '
'(can use -c and -l in conjunction)')
parser.add_argument('-n', action='store_true', default=False,
help='Produce a ndiff format diff')
parser.add_argument('-l', '--lines', type=int, default=3,
help='Set number of context lines (default 3)')
parser.add_argument('fromfile')
parser.add_argument('tofile')
options = parser.parse_args()
n = options.lines
fromfile = options.fromfile
tofile = options.tofile
fromdate = file_mtime(fromfile)
todate = file_mtime(tofile)
with open(fromfile) as ff:
fromlines = ff.readlines()
with open(tofile) as tf:
tolines = tf.readlines()
if options.u:
diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile, fromdate, todate, n=n)
elif options.n:
diff = difflib.ndiff(fromlines, tolines)
elif options.m:
diff = difflib.HtmlDiff().make_file(fromlines,tolines,fromfile,tofile,context=options.c,numlines=n)
else:
diff = difflib.context_diff(fromlines, tolines, fromfile, tofile, fromdate, todate, n=n)
sys.stdout.writelines(diff)
if __name__ == '__main__':
main()
ndiff example
¶
此范例展示如何使用
difflib.ndiff()
.
"""ndiff [-q] file1 file2
or
ndiff (-r1 | -r2) < ndiff_output > file1_or_file2
Print a human-friendly file difference report to stdout. Both inter-
and intra-line differences are noted. In the second form, recreate file1
(-r1) or file2 (-r2) on stdout, from an ndiff report on stdin.
In the first form, if -q ("quiet") is not specified, the first two lines
of output are
-: file1
+: file2
Each remaining line begins with a two-letter code:
"- " line unique to file1
"+ " line unique to file2
" " line common to both files
"? " line not present in either input file
Lines beginning with "? " attempt to guide the eye to intraline
differences, and were not present in either input file. These lines can be
confusing if the source files contain tab characters.
The first file can be recovered by retaining only lines that begin with
" " or "- ", and deleting those 2-character prefixes; use ndiff with -r1.
The second file can be recovered similarly, but by retaining only " " and
"+ " lines; use ndiff with -r2; or, on Unix, the second file can be
recovered by piping the output through
sed -n '/^[+ ] /s/^..//p'
"""
__version__ = 1, 7, 0
import difflib, sys
def fail(msg):
out = sys.stderr.write
out(msg + "\n\n")
out(__doc__)
return 0
# open a file & return the file object; gripe and return 0 if it
# couldn't be opened
def fopen(fname):
try:
return open(fname)
except IOError as detail:
return fail("couldn't open " + fname + ": " + str(detail))
# open two files & spray the diff to stdout; return false iff a problem
def fcompare(f1name, f2name):
f1 = fopen(f1name)
f2 = fopen(f2name)
if not f1 or not f2:
return 0
a = f1.readlines(); f1.close()
b = f2.readlines(); f2.close()
for line in difflib.ndiff(a, b):
print(line, end=' ')
return 1
# crack args (sys.argv[1:] is normal) & compare;
# return false iff a problem
def main(args):
import getopt
try:
opts, args = getopt.getopt(args, "qr:")
except getopt.error as detail:
return fail(str(detail))
noisy = 1
qseen = rseen = 0
for opt, val in opts:
if opt == "-q":
qseen = 1
noisy = 0
elif opt == "-r":
rseen = 1
whichfile = val
if qseen and rseen:
return fail("can't specify both -q and -r")
if rseen:
if args:
return fail("no args allowed with -r option")
if whichfile in ("1", "2"):
restore(whichfile)
return 1
return fail("-r value must be 1 or 2")
if len(args) != 2:
return fail("need 2 filename args")
f1name, f2name = args
if noisy:
print('-:', f1name)
print('+:', f2name)
return fcompare(f1name, f2name)
# read ndiff output from stdin, and print file1 (which=='1') or
# file2 (which=='2') to stdout
def restore(which):
restored = difflib.restore(sys.stdin.readlines(), which)
sys.stdout.writelines(restored)
if __name__ == '__main__':
main(sys.argv[1:])