How to group by and aggregate on multiple columns in pandas
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Source: http://stackoverflow.com/questions/51653170/
Asked by Neil
I have the following DataFrame in pandas:
ID Balance ATM_drawings Value
1 100 50 345
1 150 33 233
2 100 100 333
2 100 100 234
I want the data in the following format:
ID Balance_mean Balance_sum ATM_drawings_mean ATM_drawings_sum
1 125 250 41.5 83
2 100 200 100 200
I am using the following command to do it in pandas:
df1= df[['Balance','ATM_drawings']].groupby('ID', as_index = False).agg(['mean', 'sum']).reset_index()
But it does not give the result I intended.
Answered by jpp
You can use a dictionary to specify aggregation functions for each series:
d = {'Balance': ['mean', 'sum'], 'ATM_drawings': ['mean', 'sum']}
res = df.groupby('ID').agg(d)
# flatten MultiIndex columns
res.columns = ['_'.join(col) for col in res.columns.values]
print(res)
Balance_mean Balance_sum ATM_drawings_mean ATM_drawings_sum
ID
1 125 250 41.5 83
2 100 200 100.0 200
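As a possible alternative on newer pandas versions (0.25 and later, as far as I know), named aggregation produces flat column names directly, so the flattening step is not needed. A minimal sketch that rebuilds the question's sample data so it runs on its own; the name res_named is only illustrative:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Balance": [100, 150, 100, 100],
    "ATM_drawings": [50, 33, 100, 100],
    "Value": [345, 233, 333, 234],
})

# Each keyword is an output column name;
# each value is a (input column, aggregation function) pair
res_named = df.groupby("ID").agg(
    Balance_mean=("Balance", "mean"),
    Balance_sum=("Balance", "sum"),
    ATM_drawings_mean=("ATM_drawings", "mean"),
    ATM_drawings_sum=("ATM_drawings", "sum"),
)
print(res_named)
```

This is more verbose than the dictionary form when many columns share the same functions, but the output names are spelled out explicitly.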
Or you can define d via dict.fromkeys:
d = dict.fromkeys(('Balance', 'ATM_drawings'), ['mean', 'sum'])
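A quick way to convince yourself that both definitions are equivalent is to compare them. One caveat worth knowing: dict.fromkeys stores the same list object under every key. That is harmless here because agg only reads the list, but mutating it through one key would change it for all keys. The names d_literal and d_fromkeys below are only illustrative:

```python
# Dictionary written out by hand
d_literal = {'Balance': ['mean', 'sum'], 'ATM_drawings': ['mean', 'sum']}

# Same mapping built with dict.fromkeys
d_fromkeys = dict.fromkeys(('Balance', 'ATM_drawings'), ['mean', 'sum'])

# The two dictionaries compare equal
print(d_literal == d_fromkeys)  # True

# Both keys point at one shared list object
print(d_fromkeys['Balance'] is d_fromkeys['ATM_drawings'])  # True
```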
Answered by Plasma
I'm not sure how to achieve this using agg, but you could reuse the groupby object to avoid repeating the grouping operation, and then use transformations:
import pandas as pd
df = pd.DataFrame({
"ID": [1, 1, 2, 2],
"Balance": [100, 150, 100, 100],
"ATM_drawings": [50, 33, 100, 100],
"Value": [345, 233, 333, 234]
})
gb = df.groupby("ID")
df["Balance_mean"] = gb["Balance"].transform("mean")
df["Balance_sum"] = gb["Balance"].transform("sum")
df["ATM_drawings_mean"] = gb["ATM_drawings"].transform("mean")
df["ATM_drawings_sum"] = gb["ATM_drawings"].transform("sum")
print(df)
Which yields:
ID Balance Balance_mean Balance_sum ATM_drawings ATM_drawings_mean ATM_drawings_sum Value
0 1 100 125 250 50 41.5 83 345
1 1 150 125 250 33 41.5 83 233
2 2 100 100 200 100 100.0 200 333
3 2 100 100 200 100 100.0 200 234
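The four transform assignments follow one pattern, so they could also be written as a loop. The sketch below uses the same sample data. The final drop_duplicates step is my own addition, not part of the answer: it shows one way to collapse the broadcast result to one row per ID, which is closer to the format asked for in the question. The names stat_cols and per_id are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Balance": [100, 150, 100, 100],
    "ATM_drawings": [50, 33, 100, 100],
    "Value": [345, 233, 333, 234],
})

gb = df.groupby("ID")

# Add one broadcast column per (column, function) pair
for col in ["Balance", "ATM_drawings"]:
    for func in ["mean", "sum"]:
        df[f"{col}_{func}"] = gb[col].transform(func)

# Keep ID plus the new statistic columns, then drop the repeated rows
stat_cols = [c for c in df.columns if c.endswith(("_mean", "_sum"))]
per_id = df[["ID"] + stat_cols].drop_duplicates().reset_index(drop=True)
print(per_id)
```

If only the per-ID summary is needed, aggregating with agg is more direct; transform is most useful when the group statistics should sit next to the original rows.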