Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original address, and attribute it to the original authors (not me). Original Stack Overflow address: http://stackoverflow.com/questions/51653170/

How to group by and aggregate on multiple columns in pandas

Tags: python, pandas, dataframe, indexing, pandas-groupby

Asked by Neil

I have the following DataFrame in pandas:

 ID     Balance     ATM_drawings    Value
 1      100         50              345 
 1      150         33              233
 2      100         100             333 
 2      100         100             234

I want the data in the following format:

 ID     Balance_mean    Balance_sum     ATM_drawings_mean    ATM_drawings_sum
 1      125             250             41.5                 83
 2      100             200             100                  200

I am using the following command to do this in pandas:

 df1= df[['Balance','ATM_drawings']].groupby('ID', as_index = False).agg(['mean', 'sum']).reset_index()

But it does not give me the result I intended.

Answered by jpp

You can use a dictionary to specify aggregation functions for each series:

d = {'Balance': ['mean', 'sum'], 'ATM_drawings': ['mean', 'sum']}
res = df.groupby('ID').agg(d)

# flatten MultiIndex columns
res.columns = ['_'.join(col) for col in res.columns.values]

print(res)

    Balance_mean  Balance_sum  ATM_drawings_mean  ATM_drawings_sum
ID                                                                
1            125          250               41.5                83
2            100          200              100.0               200
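The snippet above assumes df already exists and leaves ID as the index. As a self-contained sketch (not part of the original answer), the version below rebuilds the sample data from the question, selects the columns after grouping rather than before, and calls reset_index() so that ID becomes a regular column as in the desired format. Selecting the columns first, as the question does, drops ID from the frame, so groupby('ID') then fails with a KeyError.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Balance": [100, 150, 100, 100],
    "ATM_drawings": [50, 33, 100, 100],
    "Value": [345, 233, 333, 234],
})

# Group first, then pick the columns to aggregate, so ID is still available.
res = df.groupby("ID")[["Balance", "ATM_drawings"]].agg(["mean", "sum"])

# Flatten the two-level column index into names such as "Balance_mean".
res.columns = ["_".join(col) for col in res.columns]

# Turn the ID index back into an ordinary column.
res = res.reset_index()

print(res)
```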

Or you can define d via dict.fromkeys:

d = dict.fromkeys(('Balance', 'ATM_drawings'), ['mean', 'sum'])
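As an alternative that is not part of the original answer, newer pandas versions (0.25 and later, which is an assumption about your environment) support named aggregation, where each output column is given as a keyword mapped to a (column, function) pair. This produces flat column names directly, so the flattening step is not needed. A minimal sketch using the question's data:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Balance": [100, 150, 100, 100],
    "ATM_drawings": [50, 33, 100, 100],
    "Value": [345, 233, 333, 234],
})

# Each keyword names an output column; the tuple is (source column, function).
res = (
    df.groupby("ID")
    .agg(
        Balance_mean=("Balance", "mean"),
        Balance_sum=("Balance", "sum"),
        ATM_drawings_mean=("ATM_drawings", "mean"),
        ATM_drawings_sum=("ATM_drawings", "sum"),
    )
    .reset_index()  # make ID a regular column again
)

print(res)
```

The dictionary form remains handy when the same list of functions applies to many columns; the named form is more explicit when the output names matter.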

Answered by Plasma

I'm not sure how to achieve this using agg, but you could reuse the `groupby` object to avoid repeating the grouping operation, and then use transform:

import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Balance": [100, 150, 100, 100],
    "ATM_drawings": [50, 33, 100, 100],
    "Value": [345, 233, 333, 234]
})

gb = df.groupby("ID")
df["Balance_mean"] = gb["Balance"].transform("mean")
df["Balance_sum"] = gb["Balance"].transform("sum")
df["ATM_drawings_mean"] = gb["ATM_drawings"].transform("mean")
df["ATM_drawings_sum"] = gb["ATM_drawings"].transform("sum")
print(df)

Which yields:

   ID  Balance  Balance_mean  Balance_sum  ATM_drawings  ATM_drawings_mean  ATM_drawings_sum  Value
0   1      100           125          250            50               41.5                83    345
1   1      150           125          250            33               41.5                83    233
2   2      100           100          200           100              100.0               200    333
3   2      100           100          200           100              100.0               200    234
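Note that transform broadcasts each group's result back onto every original row, which is why the frame above still has four rows rather than one per ID. If the one-row-per-ID layout from the question is wanted, one possible follow-up (a sketch, not part of the original answer; the name summary is only illustrative) is to drop the per-row columns and de-duplicate on ID:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Balance": [100, 150, 100, 100],
    "ATM_drawings": [50, 33, 100, 100],
    "Value": [345, 233, 333, 234],
})

# Reuse one groupby object and add the group statistics as new columns.
gb = df.groupby("ID")
for col in ("Balance", "ATM_drawings"):
    for func in ("mean", "sum"):
        df[f"{col}_{func}"] = gb[col].transform(func)

# Keep only ID and the statistics, then collapse to one row per ID.
summary = (
    df.drop(columns=["Balance", "ATM_drawings", "Value"])
    .drop_duplicates(subset="ID")
    .reset_index(drop=True)
)

print(summary)
```

If only the summary is needed, aggregating directly with agg avoids computing values for every row; transform is most useful when the group statistics have to sit alongside the original rows.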