Python: Groupby a Pandas DataFrame, calculate the mean and std of one column, and add the std as a new column with reset_index

Question

Asked by kkhatri99

I have a Pandas DataFrame as below:

   a      b      c      d
0  Apple  3      5      7
1  Banana 4      4      8
2  Cherry 7      1      3
3  Apple  3      4      7

I would like to group the rows by column 'a', replace the values in column 'c' with the mean of each group, and add another column holding the standard deviation of the column 'c' values that went into that mean. The values in columns 'b' and 'd' are constant within each group. The desired output is:

   a      b      c      d      e
0  Apple  3      4.5    7      0.707107
1  Banana 4      4      8      0
2  Cherry 7      1      3      0

What is the best way to achieve this?

Answer 1

Accepted answer by unutbu

You could use a groupby-agg operation:

In [38]: result = df.groupby(['a'], as_index=False).agg(
                      {'c':['mean','std'],'b':'first', 'd':'first'})

and then rename and reorder the columns:

In [39]: result.columns = ['a','c','e','b','d']

In [40]: result.reindex(columns=sorted(result.columns))
Out[40]: 
        a  b    c  d         e
0   Apple  3  4.5  7  0.707107
1  Banana  4  4.0  8       NaN
2  Cherry  7  1.0  3       NaN
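
For readers who want to run this end to end, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question and uses named aggregation instead of the dict form above, which assumes pandas 0.25 or later; the output column names are set directly, so no renaming or reordering step is needed.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'a': ['Apple', 'Banana', 'Cherry', 'Apple'],
    'b': [3, 4, 7, 3],
    'c': [5, 4, 1, 4],
    'd': [7, 8, 3, 7],
})

# Named aggregation (assumes pandas >= 0.25): each keyword is an output
# column, each value is an (input column, aggregation) pair.
result = df.groupby('a', as_index=False).agg(
    b=('b', 'first'),
    c=('c', 'mean'),
    d=('d', 'first'),
    e=('c', 'std'),  # sample std (ddof=1); NaN for single-row groups
)
print(result)
```

The columns come out in the order the keywords are written, so this should match the a, b, c, d, e layout shown above.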

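Pandas computes the sample std (ddof=1) by default, which is undefined for a group with a single row; that is why Banana and Cherry show NaN above. To compute the population std (ddof=0) instead: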

def pop_std(x):
    return x.std(ddof=0)

result = df.groupby(['a'], as_index=False).agg({'c':['mean',pop_std],'b':'first', 'd':'first'})

result.columns = ['a','c','e','b','d']
result.reindex(columns=sorted(result.columns))

yields

        a  b    c  d    e
0   Apple  3  4.5  7  0.5
1  Banana  4  4.0  8  0.0
2  Cherry  7  1.0  3  0.0
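
Note that the desired output in the question shows 0.707107 for Apple (the sample std) together with 0 for the single-row groups. One possible way to get exactly that, sketched here under the assumption that treating a single-observation std as 0 is acceptable for your use case, is to keep the default sample std and fill the resulting NaN values. The named-aggregation syntax again assumes pandas 0.25 or later.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'a': ['Apple', 'Banana', 'Cherry', 'Apple'],
    'b': [3, 4, 7, 3],
    'c': [5, 4, 1, 4],
    'd': [7, 8, 3, 7],
})

result = df.groupby('a', as_index=False).agg(
    b=('b', 'first'),
    c=('c', 'mean'),
    d=('d', 'first'),
    e=('c', 'std'),  # sample std (ddof=1)
)

# Assumption: a group with one row is reported as std 0 rather than NaN.
# This is a display convention, not a statistical result.
result['e'] = result['e'].fillna(0)
print(result)
```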
