statistics — Mathematical statistics functions
Added in version 3.4.
Source code: Lib/statistics.py
This module provides functions for calculating mathematical statistics of numeric (Real-valued) data.
The module is not intended to be a competitor to third-party libraries such as NumPy, SciPy, or proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS and Matlab. It is aimed at the level of graphing and scientific calculators.
Unless explicitly noted, these functions support int, float, Decimal and Fraction. Behaviour with other types (whether in the numeric tower or not) is currently unsupported. Behaviour with collections that mix types is also undefined and implementation-dependent. If your input data consists of mixed types, you may be able to use map() to ensure a consistent result, for example: map(float, input_data).
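As a minimal sketch of the map() suggestion above, the snippet below normalizes a hypothetical list that mixes Decimal, Fraction, int and float values before passing it to mean() and median(). The sample values are made up for illustration. Converting to float trades the exactness of Decimal and Fraction for a single consistent type.

```python
from decimal import Decimal
from fractions import Fraction
from statistics import mean, median

# Hypothetical input mixing four numeric types; the statistics
# functions do not define behaviour for such mixed collections.
input_data = [Decimal("1.5"), Fraction(5, 2), 2, 2.0]

# Convert every value to float first, so each function sees one type.
clean = list(map(float, input_data))

print(clean)          # [1.5, 2.5, 2.0, 2.0]
print(mean(clean))    # 2.0
print(median(clean))  # 2.0
```

If exact arithmetic matters more than speed, normalizing with map(Fraction, input_data) is an alternative, since the Fraction constructor accepts int, float, Decimal and Fraction values.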
Some datasets use NaN (not a number) values to represent missing data. Since NaNs have unusual comparison semantics, they cause surprising or undefined behaviour in the statistics functions that sort data or that count occurrences. The functions affected are median(), median_low(), median_high(), median_grouped(), mode(), multimode(), and quantiles(). NaN values should be stripped before calling these functions:
>>> from statistics import median
>>> from math import isnan
>>> from itertools import filterfalse

>>> data = [20.7, float('NaN'), 19.2, 18.3, float('NaN'), 14.4]
>>> sorted(data)  # This has surprising behavior
[20.7, nan, 14.4, 18.3, 19.2, nan]
>>> median(data)  # This result is unexpected
16.35

>>> sum(map(isnan, data))    # Number of missing values
2
>>> clean = list(filterfalse(isnan, data))  # Strip NaN values
>>> clean
[20.7, 19.2, 18.3, 14.4]

>>> sorted(clean)  # Sorting now works as expected
[14.4, 18.3, 19.2, 20.7]
>>> median(clean)  # This result is now well defined
18.75
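The filtering step in the doctest above can be wrapped in a small reusable helper. The name drop_nans is hypothetical and not part of the statistics module. The sketch also shows the comparison behaviour that makes NaN unsafe to sort: every ordering comparison involving NaN returns False, and NaN is not even equal to itself.

```python
import math
from itertools import filterfalse
from statistics import median, median_high, median_low


def drop_nans(values):
    """Return a list of the values with NaNs removed.

    Hypothetical helper; assumes every value can be converted with
    float(), as math.isnan requires.
    """
    return list(filterfalse(math.isnan, values))


nan = float("nan")
# Ordering comparisons with NaN are always False, and NaN != NaN,
# so a sort cannot place NaN values consistently.
print(nan < 1.0, nan > 1.0, nan == nan)  # False False False

data = [20.7, nan, 19.2, 18.3, nan, 14.4]
clean = drop_nans(data)
print(clean)               # [20.7, 19.2, 18.3, 14.4]
print(median(clean))       # 18.75
print(median_low(clean))   # 18.3
print(median_high(clean))  # 19.2
```

Because the helper returns a list rather than an iterator, the cleaned data can be passed to several statistics functions in turn.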