How to do a pandas groupby operation on one column but keep the other in the resulting DataFrame
Source: http://stackoverflow.com/questions/40397067/ (Stack Overflow). Content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Ger
My question is about the groupby operation in pandas. I have the following DataFrame:
In [4]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est", "Est", "West", "West"]})
In [5]: df
Out[5]:
   A   B     C
0  0  PO   Est
1  1  PO   Est
2  2  PA  West
3  3  PA  West
This is what I would like to do: group by column B and sum column A, but keep column C in the resulting DataFrame. If I do:
In [8]: df.groupby(by="B").sum(numeric_only=True)
Out[8]:
    A
B
PA  5
PO  1
It does the job, but column C is missing. I can also do this:
In [9]: df.groupby(by=["B", "C"]).sum()
Out[9]:
         A
B  C
PA West  5
PO Est   1
or
In [11]: df.groupby(by=["B", "C"], as_index=False).sum()
Out[11]:
    B     C  A
0  PA  West  5
1  PO   Est  1
But in both cases it groups by B and C together, rather than grouping by B alone and keeping the C value. Does what I want to do not make sense, or is there a way to do it?
Answered by MaxU
Try using the DataFrameGroupBy.agg() method with a dict of {column -> function}:
In [6]: df.groupby('B').agg({'A':'sum', 'C':'first'})
Out[6]:
       C  A
B
PA  West  5
PO   Est  1
From the docs:
Function to use for aggregating groups. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If passed a dict, the keys must be DataFrame column names.
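As a self-contained sketch of the dict form above, the following sums A within each B group and keeps the first C value seen in that group. The variable names `result` and `flat` are illustrative only and are not part of the original answer. The `reset_index()` call is an optional extra step that turns the group key B back into an ordinary column.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est", "Est", "West", "West"],
})

# Sum A within each B group and keep the first C value of the group.
result = df.groupby("B").agg({"A": "sum", "C": "first"})

# reset_index() moves the group key B from the index back into a column.
flat = result.reset_index()
print(flat)
```

Note that 'first' only gives a meaningful C when C is constant within each B group, as it is in the question's data.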
Or something like this, depending on your goals:
In [8]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est1", "Est2", "West1", "West2"]})
In [9]: df.groupby('B').agg({'A':'sum', 'C':'first'})
Out[9]:
        C  A
B
PA  West1  5
PO   Est1  1
In [10]: df['sum_A'] = df.groupby('B')['A'].transform('sum')
In [11]: df
Out[11]:
   A   B      C  sum_A
0  0  PO   Est1      1
1  1  PO   Est2      1
2  2  PA  West1      5
3  3  PA  West2      5
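The difference between the two approaches can be checked with a small sketch. The names `per_group` and `per_row` are illustrative and do not come from the original answer. `agg` returns one row per group, so only one C value per group survives. `transform` returns a result aligned with the original rows, so every original column, including C, is left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est1", "Est2", "West1", "West2"],
})

# agg collapses each group to a single row; 'first' keeps only one C per group.
per_group = df.groupby("B").agg({"A": "sum", "C": "first"})

# transform broadcasts the group sum back onto every original row.
# assign() returns a new DataFrame and leaves df itself unchanged.
per_row = df.assign(sum_A=df.groupby("B")["A"].transform("sum"))

print(len(per_group))  # one row per distinct value of B
print(len(per_row))    # same number of rows as df
```

Which one fits depends on whether the result should have one row per group or keep the original row count.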