pandas: how to group by one column but keep another column in the resulting DataFrame

Question

Asked by Ger

My question is about the groupby operation in pandas. I have the following DataFrame:

In [4]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est", "Est", "West", "West"]})

In [5]: df
Out[5]: 
   A   B     C
0  0  PO   Est
1  1  PO   Est
2  2  PA  West
3  3  PA  West

This is what I would like to do: group by column B and sum column A, but still have column C in the resulting DataFrame. If I do:

In [8]: df.groupby(by="B").aggregate(pd.np.sum)
Out[8]: 
    A
B    
PA  5
PO  1

It does the job, but column C is missing. I can also do this:

In [9]: df.groupby(by=["B", "C"]).aggregate(pd.np.sum)
Out[9]: 
         A
B  C      
PA West  5
PO Est   1

or

In [11]: df.groupby(by=["B", "C"], as_index=False).aggregate(pd.np.sum)
Out[11]: 
    B     C  A
0  PA  West  5
1  PO   Est  1

But in both cases it groups by B and C, rather than grouping by B alone and keeping the C value. Is what I want to do irrelevant, or is there a way to do it?

Answer 1

Answered by MaxU

Try the DataFrameGroupBy.agg() method with a dict of {column -> function}:

In [6]: df.groupby('B').agg({'A':'sum', 'C':'first'})
Out[6]:
       C  A
B
PA  West  5
PO   Est  1

From the docs:

Function to use for aggregating groups. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If passed a dict, the keys must be DataFrame column names.
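
As a minimal sketch of the dict form described above, the same aggregation can be written as a standalone script. The `as_index=False` argument is an addition here, not part of the answer's snippet; it returns B as an ordinary column rather than as the index. This assumes C takes a single value within each B group, so that taking the first one loses nothing.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est", "Est", "West", "West"],
})

# Sum A within each B group and keep the first C value seen in that group.
# as_index=False (an assumption of this sketch) keeps B as a regular column.
result = df.groupby("B", as_index=False).agg({"A": "sum", "C": "first"})
print(result)
```

The column order of the result may differ between pandas versions, so it is safer to select columns by name than by position.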

Or something like this, depending on your goals:

In [8]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est1", "Est2", "West1", "West2"]})

In [9]: df.groupby('B').agg({'A':'sum', 'C':'first'})
Out[9]:
        C  A
B
PA  West1  5
PO   Est1  1

In [10]: df['sum_A'] = df.groupby('B')['A'].transform('sum')

In [11]: df
Out[11]:
   A   B      C  sum_A
0  0  PO   Est1      1
1  1  PO   Est2      1
2  2  PA  West1      5
3  3  PA  West2      5
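
Which of the two approaches fits may depend on whether C is constant within each B group. The following is only a sketch of one way to decide, not part of the original answer: the `nunique` check and the variable names `c_per_group` and `out` are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est1", "Est2", "West1", "West2"],
})

# Number of distinct C values in each B group.
c_per_group = df.groupby("B")["C"].nunique()

if (c_per_group == 1).all():
    # C is determined by B, so one row per group loses no information.
    out = df.groupby("B", as_index=False).agg({"A": "sum", "C": "first"})
else:
    # C varies inside a group: keep every row and attach the group total.
    out = df.assign(sum_A=df.groupby("B")["A"].transform("sum"))

print(out)
```

With this data C varies inside each group, so the sketch takes the `transform` branch and keeps all four rows.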
