Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24067996/

Time: 2020-08-19 03:54:25  Source: igfitidea

Python: Numpy standard deviation error

python numpy

Asked by MacSanhe

This is a simple test

import numpy as np
data = np.array([-1, 0, 1])
print(data.std())  # ddof=0 by default, so the divisor is N

>> 0.816496580928

I don't understand how this result was generated. Obviously:

( ((-1)^2 + 0^2 + 1^2) / (3-1) )^0.5 = 1

and in Matlab, std([-1,0,1]) gives me 1. Could you help me understand how numpy.std() works?

Answered by BlackVegetable

The crux of this problem is that numpy divides by N (3), not N-1 (2). As Iarsmans pointed out, numpy uses the population variance by default, not the sample variance.

So the real answer is sqrt(2/3), which is exactly what you got: 0.8164965...

If you deliberately want a different value (than the default of 0) for the delta degrees of freedom, pass the keyword argument ddof with a positive value:

np.std(data, ddof=1)

... but doing so here would reintroduce your original result, since numpy divides by N - ddof.
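
As a minimal sketch of the point above, the snippet below runs the question's data through np.std with both divisors. The variable names pop_std and sample_std are only illustrative and do not come from the original answer.

```python
import numpy as np

data = np.array([-1, 0, 1])

# Default (ddof=0): divide by N, the population standard deviation.
pop_std = np.std(data)

# ddof=1: divide by N - 1, the sample standard deviation.
sample_std = np.std(data, ddof=1)

print(pop_std)     # approximately 0.8165, i.e. sqrt(2/3)
print(sample_std)  # 1.0
```

The second value matches the hand calculation in the question, which divided by 3 - 1.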

Answered by Oleg Sklyar

It is worth reading the help page for the function/method before suggesting it is incorrect. The method does exactly what the docstring says it should: it divides by 3, because by default ddof is zero:

In [3]: numpy.std?

String form: <function std at 0x104222398>
File:        /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/fromnumeric.py
Definition:  numpy.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)
Docstring:
Compute the standard deviation along the specified axis.

...

ddof : int, optional
    Means Delta Degrees of Freedom.  The divisor used in calculations
    is ``N - ddof``, where ``N`` represents the number of elements.
    By default `ddof` is zero.
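
To make the docstring's "divisor is N - ddof" concrete, here is a rough hand-rolled version checked against np.std. The helper manual_std is hypothetical, written only for illustration, and is not part of NumPy.

```python
import numpy as np

def manual_std(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    x = np.asarray(values, dtype=float)
    n = x.size
    # Squared deviations from the mean, summed and divided by N - ddof.
    squared_deviations = (x - x.mean()) ** 2
    return np.sqrt(squared_deviations.sum() / (n - ddof))

data = [-1, 0, 1]
for ddof in (0, 1):
    # The manual result should agree with NumPy for the same ddof.
    assert np.isclose(manual_std(data, ddof), np.std(data, ddof=ddof))
    print(ddof, manual_std(data, ddof))
```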

Answered by schodge

When getting into NumPy from Matlab, you'll probably want to keep the docs for both handy. They're similar but often differ in small but important details. Basically, they calculate the standard deviation differently. I would strongly recommend checking the documentation for anything you use that calculates standard deviation, whether a pocket calculator or a programming language, since the default is not (sorry!) standardized.

Numpy STD: http://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html

Matlab STD: http://www.mathworks.com/help/matlab/ref/std.html

The NumPy docs for std are a bit opaque, IMHO, especially considering that NumPy docs are generally fairly clear. If you read far enough: "The average squared deviation is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of the infinite population." (In plain English: the default is the population standard deviation; set ddof=1 for the sample standard deviation.)

OTOH, the Matlab docs make clear the difference that's tripping you up:

There are two common textbook definitions for the standard deviation s of a data vector X. [equations omitted] n is the number of elements in the sample. The two forms of the equation differ only in n – 1 versus n in the divisor.

So, by default, Matlab calculates the sample standard deviation (N-1 in the divisor, so the result is larger, to compensate for the fact that this is a sample) and NumPy calculates the population standard deviation (N in the divisor). You use the ddof parameter to switch to the sample standard deviation, or to any other denominator you want (which goes beyond my statistics knowledge).
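
For anyone porting Matlab code, a small wrapper along these lines may help. The function sample_std is a hypothetical name chosen here, and it only aims to mirror Matlab's default for a plain 1-D vector; it is a sketch, not a drop-in replacement for Matlab's std. The loop also shows that the two conventions differ by the factor sqrt(N / (N - 1)), so the gap shrinks as N grows.

```python
import numpy as np

def sample_std(values):
    """Hypothetical helper: N - 1 divisor, like Matlab's default for a vector."""
    return np.std(np.asarray(values, dtype=float), ddof=1)

for n in (3, 10, 1000):
    x = np.linspace(-1.0, 1.0, n)
    pop = np.std(x)        # divisor N
    samp = sample_std(x)   # divisor N - 1
    # The sample value is larger by exactly sqrt(N / (N - 1)).
    assert np.isclose(samp, pop * np.sqrt(n / (n - 1)))
    print(n, pop, samp)
```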

Lastly, this doesn't help with this particular problem, but you'll probably find it useful at some point: http://wiki.scipy.org/NumPy_for_Matlab_Users