What is the difference between numpy var() and statistics variance() in python?
Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must license it the same way and attribute it to the original authors (not me): StackOverFlow
Original link: http://stackoverflow.com/questions/41204400/
Asked by Michail Michailidis
I was trying a Dataquest exercise and figured out that the variance I am getting is different between the two packages.
e.g. for [1,2,3,4]
from statistics import variance
import numpy as np
print(np.var([1,2,3,4]))
print(variance([1,2,3,4]))
# 1.25
# 1.6666666666666667
The expected answer of the exercise is calculated with np.var()
Edit: I guess it has to do with the fact that the latter is the sample variance rather than the population variance. Could anyone explain the difference?
Answered by FallAndLearn
Use this:
print(np.var([1,2,3,4], ddof=1))
# 1.66666666667
Delta Degrees of Freedom: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default, ddof is zero.
The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead.
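As a quick sanity check, here is a minimal sketch (the variable names are my own) that reproduces both results directly from the divisor formula:

import numpy as np

x = np.array([1, 2, 3, 4])
sq_dev = ((x - x.mean()) ** 2).sum()  # sum of squared deviations from the mean: 5.0
print(sq_dev / len(x))        # divisor N (ddof=0): 1.25, matches np.var(x)
print(sq_dev / (len(x) - 1))  # divisor N - 1 (ddof=1): 1.666..., matches np.var(x, ddof=1)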
In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
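To illustrate the bias, here is a simulation sketch I am adding (the seed, sample size, and normal population are arbitrary choices). Averaging each estimator over many small samples shows ddof=1 centering on the true variance, while ddof=0 underestimates it by a factor of (N-1)/N:

import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal((100_000, 5))  # many samples of size N=5, true variance 1.0
print(samples.var(axis=1, ddof=1).mean())  # ~1.0  (unbiased)
print(samples.var(axis=1, ddof=0).mean())  # ~0.8  (biased low by (N-1)/N = 4/5)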
Statistical libraries like numpy use the population formula (divisor N) by default for what they call var or variance, and likewise for the standard deviation.
For more information, refer to this documentation: numpy doc
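For completeness, the statistics module also offers pvariance() for the population (divisor N) convention, so the two libraries can be made to agree in either direction:

import numpy as np
from statistics import pvariance, variance

data = [1, 2, 3, 4]
print(pvariance(data), np.var(data))         # 1.25 1.25  (population variance, divisor N)
print(variance(data), np.var(data, ddof=1))  # 1.6666666666666667 1.6666666666666667  (sample variance, divisor N - 1)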
Answered by Andrew Cameron Morris
It is correct that dividing by N-1 gives an unbiased estimate of the variance, which can give the impression that dividing by N-1 is therefore slightly more accurate, albeit a little more complex. What is too often not stated is that dividing by N gives the maximum likelihood estimate of the variance, which is likely to be closer to the true variance than the unbiased estimate (it has lower mean squared error), as well as being somewhat simpler.
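This claim can be checked numerically. Here is a simulation sketch I am adding (normal data, arbitrary seed and sample size) comparing the mean squared error of the two divisors:

import numpy as np

rng = np.random.default_rng(1)
samples = rng.standard_normal((200_000, 5))  # samples of size N=5, true variance 1.0
for ddof in (0, 1):
    est = samples.var(axis=1, ddof=ddof)
    print(ddof, ((est - 1.0) ** 2).mean())  # MSE: ~0.36 for ddof=0, ~0.5 for ddof=1

For small samples, the divisor-N estimator trades a little bias for a larger reduction in variance, so its mean squared error comes out lower.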