
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/41204400/

Date: 2020-08-20 00:34:41  Source: igfitidea

What is the difference between numpy var() and statistics variance() in python?

Tags: python, numpy

Asked by Michail Michailidis

I was trying a Dataquest exercise and noticed that the variance I get differs between the two packages.


e.g. for [1,2,3,4]:


from statistics import variance
import numpy as np
print(np.var([1, 2, 3, 4]))    # 1.25
print(variance([1, 2, 3, 4]))  # 1.6666666666666667

The expected answer of the exercise is calculated with np.var().


Edit: I guess it has to do with the fact that the latter is the sample variance rather than the population variance. Could anyone explain the difference?

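To make the asker's guess concrete, here is a minimal sketch (my own illustration, not part of the original question) computing both divisors by hand using only the standard library; statistics.pvariance uses the same N divisor that np.var() defaults to:

```python
from statistics import pvariance, variance

data = [1, 2, 3, 4]
mean = sum(data) / len(data)
sq_dev = [(x - mean) ** 2 for x in data]

pop_var = sum(sq_dev) / len(data)         # divide by N: population variance, np.var() default
samp_var = sum(sq_dev) / (len(data) - 1)  # divide by N-1: sample variance, statistics.variance()

print(pop_var, pvariance(data))   # 1.25 1.25
print(samp_var, variance(data))   # 1.6666666666666667 1.6666666666666667
```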

Answered by FallAndLearn

Use this:


print(np.var([1, 2, 3, 4], ddof=1))

1.66666666667

Delta Degrees of Freedom: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default, ddof is zero.


The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead.


In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.

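The N - ddof divisor convention is easy to reimplement; this pure-Python sketch (my own illustration, not from the original answer) mirrors how the ddof parameter changes the result:

```python
def var(data, ddof=0):
    # Mirrors numpy's convention: divide the sum of squared
    # deviations by N - ddof rather than always by N.
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - ddof)

print(var([1, 2, 3, 4]))          # 1.25, matches np.var() with the default ddof=0
print(var([1, 2, 3, 4], ddof=1))  # 1.6666666666666667, matches statistics.variance()
```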

Statistical libraries like numpy use the population variance (divisor n) for what they call var or variance, and likewise for the standard deviation.


For more information, refer to this documentation: numpy doc


Answered by Andrew Cameron Morris

It is correct that dividing by N-1 gives an unbiased estimate of the variance, which can give the impression that dividing by N-1 is therefore slightly more accurate, albeit a little more complex. What is too often not stated is that dividing by N gives a lower mean-squared-error estimate of the variance, which is likely to be closer to the true variance than the unbiased estimate, as well as being somewhat simpler.

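The bias/MSE trade-off described above can be checked by simulation. This sketch (my own illustration; the sample size, trial count, and seed are arbitrary choices) repeatedly draws small samples from a standard normal population, whose true variance is 1, and compares the two estimators:

```python
import random

random.seed(0)
true_var = 1.0   # variance of the standard normal population
n, trials = 5, 20000

est0, est1 = [], []
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    est0.append(ss / n)        # divide by N: biased low, but lower MSE
    est1.append(ss / (n - 1))  # divide by N-1: unbiased, but higher MSE

bias0 = sum(est0) / trials - true_var
bias1 = sum(est1) / trials - true_var
mse0 = sum((e - true_var) ** 2 for e in est0) / trials
mse1 = sum((e - true_var) ** 2 for e in est1) / trials

print(f"divide by N:   bias {bias0:+.3f}, MSE {mse0:.3f}")
print(f"divide by N-1: bias {bias1:+.3f}, MSE {mse1:.3f}")
```

With n=5 the N divisor underestimates the variance on average (its expectation is (n-1)/n of the true value), while the N-1 divisor averages out to roughly the true variance yet has the larger mean squared error, matching the point made in this answer.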