Pandas dataframe groupby to calculate population standard deviation
Original question: http://stackoverflow.com/questions/25915225/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
Stack Overflow
Question by neelshiv
I am trying to use groupby and np.std to calculate a standard deviation, but it seems to be calculating a sample standard deviation (with delta degrees of freedom, ddof, equal to 1).
Here is an example.
>>> import numpy as np
>>> import pandas as pd
# create the DataFrame
>>> df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 1, 2], 'values': np.arange(10, 30, 5)})
>>> df
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25
# calculate the standard deviation using groupby
>>> df.groupby('A').agg(np.std)
          B    values
A
1  0.707107  3.535534
2  0.707107  3.535534
# calculate directly with NumPy (np.std)
>>> np.std([10,15],ddof=0)
2.5
>>> np.std([10,15],ddof=1)
3.5355339059327378
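The two NumPy results differ only in the divisor: ddof=0 divides the sum of squared deviations by N (population), while ddof=1 divides it by N - 1 (sample). A minimal sketch that checks this by hand for the group [10, 15]; the helper name `std_by_hand` is hypothetical and introduced only for illustration.

```python
import math

import numpy as np


def std_by_hand(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - ddof))


group = [10, 15]
population = std_by_hand(group, ddof=0)  # divisor N = 2
sample = std_by_hand(group, ddof=1)      # divisor N - 1 = 1

# Compare the hand-rolled values with np.std for both ddof settings.
print(population, np.std(group, ddof=0))
print(sample, np.std(group, ddof=1))
```

Here the mean is 12.5 and the squared deviations sum to 12.5, so the population variance is 12.5 / 2 = 6.25 (std 2.5) and the sample variance is 12.5 / 1 = 12.5 (std about 3.5355).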
Is there a way to use the population standard deviation calculation (ddof=0) with the groupby statement? The records I am working with (not the example table above) are not samples, so I am only interested in population standard deviations.
Answer by EdChum
You can pass additional arguments to np.std in the agg function:
In [202]:
df.groupby('A').agg(np.std, ddof=0)
Out[202]:
     B  values
A
1  0.5     2.5
2  0.5     2.5
In [203]:
df.groupby('A').agg(np.std, ddof=1)
Out[203]:
          B    values
A
1  0.707107  3.535534
2  0.707107  3.535534
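The groupby object also has its own std method that accepts ddof directly, which avoids routing through np.std at all. Depending on the pandas version, passing np.std to agg may also emit a FutureWarning, because pandas currently substitutes its own std implementation for it. A minimal sketch, rebuilding the question's example DataFrame:

```python
import numpy as np
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# GroupBy.std takes ddof directly: 0 = population, 1 = sample (the default).
population = df.groupby('A').std(ddof=0)
sample = df.groupby('A').std()

print(population)
print(sample)
```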
Answer by Giorgos Myrianthous
For delta degrees of freedom = 0 (ddof=0):
(This means that groups containing a single value end up with std=0 instead of NaN.)
import numpy as np

def std(x):
    # np.std uses ddof=0 by default, i.e. the population standard deviation
    return np.std(x)

df.groupby('A').agg(['mean', 'max', std])
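To see the single-value case from the note above, here is a minimal sketch on hypothetical data (the `key` and `val` column names are made up for illustration). It compares the answer's `std` helper with pandas' default sample standard deviation:

```python
import numpy as np
import pandas as pd

# Hypothetical data: group 'x' has two values, group 'y' has only one.
df = pd.DataFrame({'key': ['x', 'x', 'y'], 'val': [10.0, 15.0, 20.0]})


def std(x):
    # np.std defaults to ddof=0 (population standard deviation).
    return np.std(x)


# Columns are named after each aggregation: 'mean', 'max', 'std'.
population = df.groupby('key')['val'].agg(['mean', 'max', std])
# pandas' own GroupBy.std defaults to ddof=1 (sample standard deviation).
sample = df.groupby('key')['val'].std()

print(population)
print(sample)
```

With ddof=0 the one-element group 'y' gets a standard deviation of 0.0, whereas the ddof=1 version divides by N - 1 = 0 and reports NaN.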

