Note: this page reproduces a popular Stack Overflow question and its answer under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/26599347/
Groupby Pandas DataFrame and calculate mean and stdev of one column and add the std as a new column with reset_index
Asked by kkhatri99
I have a Pandas DataFrame as below:
        a  b  c  d
0   Apple  3  5  7
1  Banana  4  4  8
2  Cherry  7  1  3
3   Apple  3  4  7
I would like to group the rows by column 'a', replace the values in column 'c' with the mean of each group, and add another column holding the standard deviation of the 'c' values that went into that mean. The values in columns 'b' and 'd' are constant within each group. The desired output is:
        a  b    c  d         e
0   Apple  3  4.5  7  0.707107
1  Banana  4    4  8         0
2  Cherry  7    1  3         0
What is the best way to achieve this?
Accepted answer by unutbu
You could use a groupby-agg operation:
In [38]: result = df.groupby(['a'], as_index=False).agg(
{'c':['mean','std'],'b':'first', 'd':'first'})
and then rename and reorder the columns:
In [39]: result.columns = ['a','c','e','b','d']
In [40]: result.reindex(columns=sorted(result.columns))
Out[40]:
        a  b    c  d         e
0   Apple  3  4.5  7  0.707107
1  Banana  4  4.0  8       NaN
2  Cherry  7  1.0  3       NaN
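As a self-contained variation on the steps above, here is a minimal sketch that rebuilds the example frame and uses named aggregation. It assumes pandas 0.25 or later and is not the answer's exact method: named aggregation sets the output column names directly, so the separate rename and reorder steps are not needed.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'a': ['Apple', 'Banana', 'Cherry', 'Apple'],
    'b': [3, 4, 7, 3],
    'c': [5, 4, 1, 4],
    'd': [7, 8, 3, 7],
})

# Named aggregation: each keyword is an output column name,
# each value is a pair (input column, aggregation function).
result = df.groupby('a', as_index=False).agg(
    b=('b', 'first'),
    c=('c', 'mean'),
    d=('d', 'first'),
    e=('c', 'std'),  # sample std (ddof=1); NaN for single-row groups
)
print(result)
```

If zeros are preferred over NaN for single-row groups, as in the question's desired output, one option is `result['e'] = result['e'].fillna(0)`.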
Pandas computes the sample standard deviation (ddof=1) by default, which is why the single-row groups show NaN above. To compute the population standard deviation instead:
def pop_std(x):
    # ddof=0 divides by N (population) instead of N - 1 (sample)
    return x.std(ddof=0)

result = df.groupby(['a'], as_index=False).agg(
    {'c': ['mean', pop_std], 'b': 'first', 'd': 'first'})
result.columns = ['a', 'c', 'e', 'b', 'd']
result.reindex(columns=sorted(result.columns))
yields
        a  b    c  d    e
0   Apple  3  4.5  7  0.5
1  Banana  4  4.0  8  0.0
2  Cherry  7  1.0  3  0.0
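To make the difference between the two outputs concrete, the following sketch compares the sample and population standard deviation for the 'Apple' group and then repeats the population version with named aggregation. It assumes pandas 0.25 or later; the variable names `apple_c`, `sample_std`, `population_std` and `pop_result` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['Apple', 'Banana', 'Cherry', 'Apple'],
    'b': [3, 4, 7, 3],
    'c': [5, 4, 1, 4],
    'd': [7, 8, 3, 7],
})

# The 'c' values of the Apple group are 5 and 4 (mean 4.5).
apple_c = df.loc[df['a'] == 'Apple', 'c']
sample_std = apple_c.std()            # ddof=1: sqrt(0.5 / 1)
population_std = apple_c.std(ddof=0)  # ddof=0: sqrt(0.5 / 2)

def pop_std(x):
    # Population standard deviation: divide by N, not N - 1.
    return x.std(ddof=0)

# Same aggregation as above, with the output names set directly.
pop_result = df.groupby('a', as_index=False).agg(
    b=('b', 'first'),
    c=('c', 'mean'),
    d=('d', 'first'),
    e=('c', pop_std),
)
print(sample_std, population_std)
print(pop_result)
```

With ddof=0 a single-row group has a well-defined deviation of zero, so no NaN values appear in column 'e'.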

