Different standard deviation in pandas vs. numpy

Question

Asked by Mannaggia

The standard deviation differs between pandas and numpy. Why, and which one is correct? (The relative difference is 3.5%, which in my opinion is too large to come from rounding.)

Example

import numpy as np
import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was "from StringIO import StringIO"

a='''0.057411
0.024367
0.021247
-0.001809
-0.010874
-0.035845
0.001663
0.043282
0.004433
-0.007242
0.029294
0.023699
0.049654
0.034422
-0.005380'''


df = pd.read_csv(StringIO(a.strip()), sep=r'\s+', header=None)  # delim_whitespace is deprecated in recent pandas

df.std()==np.std(df) # False
df.std() # 0.025801
np.std(df) # 0.024926

(0.024926 - 0.025801) / 0.024926 # 3.5% relative difference

I use these versions:

pandas: '0.14.0' numpy: '1.8.1'

Answer 1

Answered by NPE

In a nutshell, neither is "incorrect". Pandas uses the unbiased estimator (N-1 in the denominator), whereas NumPy by default does not (it divides by N).
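
As a quick check on where the 3.5% comes from: the two estimators differ only in the denominator, so their ratio is sqrt(N/(N-1)). A minimal sketch for the N=15 values in the question (the variable names here are illustrative, not from the original post):

```python
import math

n = 15  # number of observations in the question's example

# The N-1 and N estimators differ only by this constant factor.
ratio = math.sqrt(n / (n - 1))

print(round(ratio, 4))                # 1.0351
# Ratio of the two values reported in the question (pandas / numpy).
print(round(0.025801 / 0.024926, 4))  # 1.0351
```

So the gap is exactly what the change of denominator predicts, and it shrinks as N grows.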

To make them behave the same, pass ddof=1 to numpy.std().
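
A small sketch of that suggestion, using the 15 values from the question. It builds a Series and an array directly rather than going through read_csv, which is a simplification of the original example:

```python
import numpy as np
import pandas as pd

# The 15 values from the question.
values = [0.057411, 0.024367, 0.021247, -0.001809, -0.010874,
          -0.035845, 0.001663, 0.043282, 0.004433, -0.007242,
          0.029294, 0.023699, 0.049654, 0.034422, -0.005380]

s = pd.Series(values)
arr = np.asarray(values)

pandas_std = s.std()                # pandas default: ddof=1 (divide by N-1)
numpy_default = np.std(arr)         # numpy default: ddof=0 (divide by N)
numpy_sample = np.std(arr, ddof=1)  # numpy asked to divide by N-1

print(np.isclose(pandas_std, numpy_sample))   # True
print(np.isclose(pandas_std, numpy_default))  # False
```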

For further discussion, see

Answer 2

Answered by Xuan

To make pandas behave the same as numpy, you can pass in the ddof=0 parameter: df.std(ddof=0).
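
For illustration, here is that call on a tiny made-up frame (the data and the column name "x" are assumptions chosen so the population standard deviation comes out to exactly 2):

```python
import numpy as np
import pandas as pd

# Illustrative data (not from the question): mean 5, population std exactly 2.
df = pd.DataFrame({"x": [2, 4, 4, 4, 5, 5, 7, 9]})

pop_std = df.std(ddof=0)   # Series with one entry per column, divides by N
sample_std = df.std()      # pandas default, ddof=1, divides by N-1

print(pop_std["x"])                  # 2.0
print(np.std(df["x"].to_numpy()))    # 2.0
```

With ddof=0 the pandas result lines up with numpy's default, while the default df.std() is slightly larger.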

This short video explains quite well why n-1 might be preferred for samples: https://www.youtube.com/watch?v=Cn0skMJ2F3c
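
The same point can be seen with a rough simulation. This is only a sketch under assumed settings (normal data with true variance 1, samples of size 5, a fixed seed), not something taken from the video. Strictly speaking, n-1 removes the bias in the variance; the standard deviation obtained by taking its square root is still slightly biased.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

n, trials = 5, 20000
# Each row is one small sample drawn from a population with variance 1.
samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))

mean_var_n = samples.var(axis=1, ddof=0).mean()          # divides by n
mean_var_n_minus_1 = samples.var(axis=1, ddof=1).mean()  # divides by n-1

print(mean_var_n)          # close to (n-1)/n = 0.8, i.e. biased low
print(mean_var_n_minus_1)  # close to the true variance 1.0
```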
