Python: Why does numpy std() give a different result to matlab std()?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/27600207/

Time: 2020-08-19 02:00:16  Source: igfitidea

Tags: python, matlab, numpy, standard-deviation

Asked by gustavgans

I am trying to convert MATLAB code to NumPy and found that NumPy's std function gives a different result.

In MATLAB:

std([1,3,4,6])
ans =  2.0817

In NumPy:

np.std([1,3,4,6])
1.8027756377319946

Is this normal? And how should I handle this?

Accepted answer by Alex Riley

The NumPy function np.std takes an optional parameter ddof: "Delta Degrees of Freedom". By default, this is 0. Set it to 1 to get the MATLAB result:

>>> np.std([1,3,4,6], ddof=1)
2.0816659994661326

To add a little more context, in the calculation of the variance (of which the standard deviation is the square root) we typically divide by the number of values we have.

But if we select a random sample of N elements from a larger distribution and calculate the variance, division by N can lead to an underestimate of the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us to change the divisor by the amount we specify.

Unless told otherwise, NumPy will calculate the biased estimator for the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values which have been randomly picked from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
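
To make the N - ddof divisor concrete, here is a minimal sketch that computes the standard deviation by hand and compares it with np.std. The helper name std_with_ddof is made up for this illustration and is not part of NumPy.

```python
import numpy as np


def std_with_ddof(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    x = np.asarray(values, dtype=float)
    n = x.size
    # Sum of squared deviations from the sample mean.
    squared_deviations = (x - x.mean()) ** 2
    return np.sqrt(squared_deviations.sum() / (n - ddof))


data = [1, 3, 4, 6]
population_std = std_with_ddof(data, ddof=0)  # divides by N
sample_std = std_with_ddof(data, ddof=1)      # divides by N - 1

print(population_std, np.std(data))
print(sample_std, np.std(data, ddof=1))
```

Each printed pair should agree up to floating-point rounding: the first pair corresponds to the NumPy default from the question, the second to the MATLAB result.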

The default behaviour of MATLAB's std is to correct the bias for sample variance by dividing by N-1. This gets rid of some of (but probably not all of) the bias in the standard deviation. This is likely to be what you want if you're using the function on a random sample of a larger distribution.
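
The effect of the two divisors can be checked with a small simulation. This is only a sketch under assumed settings (normally distributed data, samples of size 5, a fixed seed); none of these choices come from the original answer.

```python
import numpy as np

# Assumed setup for illustration: many small samples from a normal
# distribution whose true standard deviation (and variance) is 1.
rng = np.random.default_rng(0)
true_sigma = 1.0
n = 5
trials = 100_000
samples = rng.normal(loc=0.0, scale=true_sigma, size=(trials, n))

# Average each estimator over all trials to approximate its expected value.
mean_var_ddof0 = samples.var(axis=1, ddof=0).mean()  # divide by N
mean_var_ddof1 = samples.var(axis=1, ddof=1).mean()  # divide by N - 1
mean_std_ddof1 = samples.std(axis=1, ddof=1).mean()  # square root of the above

print("mean variance, ddof=0:", mean_var_ddof0)
print("mean variance, ddof=1:", mean_var_ddof1)
print("mean std, ddof=1:", mean_std_ddof1)
```

With these settings the averaged ddof=0 variance should land near (N-1)/N = 0.8, the ddof=1 variance near 1.0, and the ddof=1 standard deviation somewhat below 1.0, which is the residual bias in the standard deviation mentioned above.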

The nice answer by @hbaderts gives further mathematical details.

Answer by hbaderts

The standard deviation is the square root of the variance. The variance of a random variable X with mean $\mu$ is defined as

$$\sigma^2 = \operatorname{E}\left[(X - \mu)^2\right]$$

An estimator for the variance would therefore be

$$s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $\bar{x}$ denotes the sample mean. For randomly selected $x_i$, it can be shown that the expected value of this estimator is not the real variance, but

$$\operatorname{E}\left[s_n^2\right] = \frac{n-1}{n} \sigma^2$$

If you randomly select samples and estimate the sample mean and variance, you will have to use a corrected (unbiased) estimator

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

whose expected value is $\sigma^2$. Using n-1 in place of n is also called Bessel's correction.
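
The "it can be shown" step can be sketched as follows, assuming the $x_i$ are independent draws with mean $\mu$ and finite variance $\sigma^2$ (an assumption the answer leaves implicit) and writing $\bar{x}$ for the sample mean. The key fact used is that the variance of $\bar{x}$ is $\sigma^2 / n$.

```latex
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} (x_i - \mu)^2 - n(\bar{x} - \mu)^2 \\
\operatorname{E}\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right]
  &= n\sigma^2 - n \cdot \frac{\sigma^2}{n} = (n-1)\sigma^2
\end{aligned}
```

Dividing the sum by n therefore gives an expected value of $\frac{n-1}{n}\sigma^2$, while dividing by n-1 gives exactly $\sigma^2$.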

Now by default, MATLAB's std calculates the unbiased estimator with the correction term n-1. NumPy however (as @ajcr explained) calculates the biased estimator with no correction term by default. The parameter ddof allows you to set any correction term n-ddof. By setting it to 1 you get the same result as in MATLAB.

Similarly, MATLAB allows you to add a second parameter w, which specifies the "weighting scheme". The default, w=0, results in the correction term n-1 (unbiased estimator), while for w=1, only n is used as correction term (biased estimator).
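
When porting code, the correspondence between MATLAB's w and NumPy's ddof can be wrapped in a small function. The sketch below is an illustration only: matlab_style_std is a made-up name, it handles just 1-D input with w equal to 0 or 1, and it relies on the mapping ddof = 1 - w implied by the description above.

```python
import numpy as np


def matlab_style_std(values, w=0):
    """Hypothetical helper mimicking MATLAB's std(x, w) for a 1-D vector.

    Only w=0 (divide by n-1) and w=1 (divide by n) are handled here.
    """
    if w not in (0, 1):
        raise ValueError("this sketch only handles w=0 and w=1")
    x = np.asarray(values, dtype=float)
    # w=0 -> ddof=1 (MATLAB default), w=1 -> ddof=0 (NumPy default).
    return np.std(x, ddof=1 - w)


default_result = matlab_style_std([1, 3, 4, 6])        # like std(x) in MATLAB
biased_result = matlab_style_std([1, 3, 4, 6], w=1)    # like std(x, 1) in MATLAB
print(default_result, biased_result)
```

Keeping MATLAB's default (w=0) as the helper's default means translated calls keep their original meaning without every call site having to pass ddof explicitly.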

Answer by MJM

For people who aren't great with statistics, a simplistic guide is:

  • Include ddof=1 if you're calculating np.std() for a sample taken from your full dataset.

  • Ensure ddof=0 if you're calculating np.std() for the full population.

ddof=1 is used for samples in order to counterbalance the bias that would otherwise occur: dividing by N tends to underestimate the spread of the full population.
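
The same population-versus-sample distinction also appears in Python's standard library, which may help as a cross-check. This sketch uses the statistics module instead of NumPy (a substitution made here for illustration, not something the answers mention): pstdev treats the data as the full population, and stdev treats it as a sample.

```python
import statistics

data = [1, 3, 4, 6]

# Full population: divide by N (matches np.std with the default ddof=0).
population_std = statistics.pstdev(data)

# Sample from a larger population: divide by N - 1 (matches ddof=1 and MATLAB).
sample_std = statistics.stdev(data)

print(population_std, sample_std)
```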