Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39568965/

Time: 2020-09-14 02:02:20  Source: igfitidea

How to reset indexes when aggregating multiple columns in pandas

pandas group-by aggregate-functions

Asked by sourav

I have a dataframe that I am trying to group, which looks like this:

Cust_ID Store_ID month lst_buy_dt1  purchase_amt    
 1       20       10     2015-10-07  100
 1       20       10     2015-10-09  200
 1       20       10     2015-10-20  100

I need the maximum of lst_buy_dt1 and the total purchase amount for each Cust_ID, Store_ID combination for each month, in a separate dataframe. Sample output:

Cust_ID Store_ID month max_lst_buy_dt tot_purchase_amt
 1       20        10      2015-10-20     400

My code is below.

aggregations = {
    'lst_buy_dt1': {
        # Max purchase date across all purchases in a month
        'max_lst_buy_dt': 'max',
    },
    'purchase_amt': {
        # Sum of the purchases
        'tot_purchase': 'sum',
    }
}

grouped_at_Cust=metro_sales.groupby(['cust_id','store_id','month']).agg(aggregations).reset_index()

I am able to get the right aggregations. However, the dataframe's columns contain an additional index level that I am not able to get rid of. I cannot show the dataframe itself, but here is the result of

list(grouped_at_Cust.columns.values)

[('cust_id', ''),
('store_id', ''),
('month', ''),
('lst_buy_dt1', 'max_lst_buy_dt'),
('purchase_amt', 'tot_purchase')]

Notice the hierarchy in the last 2 columns. How do I get rid of it? I just need the columns max_lst_buy_dt and tot_purchase.

Answered by IanS

Edit: based on your comment, you can simply drop the first level of the column index. For instance, with a more complicated aggregation:

aggregations = {
    'lst_buy_dt1': {
        'max_lst_buy_dt': 'max',       
        'min_lst_buy_dt': 'min',       
    },
    'purchase_amt': {
        'tot_purchase': 'sum',
    }
}
grouped_at_Cust = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(aggregations).reset_index()
grouped_at_Cust.columns = grouped_at_Cust.columns.droplevel(0)

Output:

             tot_purchase min_lst_buy_dt max_lst_buy_dt
0   cust_id           100     2015-10-07     2015-10-07
1     month           100     2015-10-20     2015-10-20
2  store_id           200     2015-10-09     2015-10-09
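A caveat with the snippet above: after reset_index(), the key columns carry an empty second level, so droplevel(0) leaves them without names (the unnamed first column in the output). A minimal sketch of one way around this is to drop the level before resetting the index. It assumes the sample rows from the question, and it uses a dict of lists in place of the nested dictionaries, which pandas 1.0 and later reject:

```python
import pandas as pd

# Sample rows taken from the table in the question.
metro_sales = pd.DataFrame({
    "cust_id": [1, 1, 1],
    "store_id": [20, 20, 20],
    "month": [10, 10, 10],
    "lst_buy_dt1": pd.to_datetime(["2015-10-07", "2015-10-09", "2015-10-20"]),
    "purchase_amt": [100, 200, 100],
})

# A dict of lists yields two-level columns such as ('lst_buy_dt1', 'max').
grouped = metro_sales.groupby(["cust_id", "store_id", "month"]).agg(
    {"lst_buy_dt1": ["max", "min"], "purchase_amt": ["sum"]}
)

# Drop the outer level while the keys are still in the row index,
# then move the keys back into ordinary, named columns.
grouped.columns = grouped.columns.droplevel(0)
grouped = grouped.reset_index()

print(list(grouped.columns))
```

With this ordering the remaining column labels are the function names (max, min, sum), so a rename is still needed if more descriptive names are wanted.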


Original answer

I think your aggregations dictionary is too complicated. If you follow the documentation:

agg = {
    'lst_buy_dt1': 'max',       
    'purchase_amt': 'sum',
}
metro_sales.groupby(['cust_id','store_id','month']).agg(agg).reset_index()
Out[19]: 
      index  purchase_amt lst_buy_dt1
0   cust_id           100  2015-10-07
1     month           100  2015-10-20
2  store_id           200  2015-10-09
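To check that the flat dictionary avoids the extra column level, here is a small self-contained sketch. It assumes the three sample rows from the question rather than the data behind the output above:

```python
import pandas as pd

# Sample rows taken from the table in the question.
metro_sales = pd.DataFrame({
    "cust_id": [1, 1, 1],
    "store_id": [20, 20, 20],
    "month": [10, 10, 10],
    "lst_buy_dt1": pd.to_datetime(["2015-10-07", "2015-10-09", "2015-10-20"]),
    "purchase_amt": [100, 200, 100],
})

# One function per column: the result keeps a single level of column labels.
agg = {"lst_buy_dt1": "max", "purchase_amt": "sum"}
result = metro_sales.groupby(["cust_id", "store_id", "month"]).agg(agg).reset_index()

print(result.columns.nlevels)
print(list(result.columns))
```

The aggregated columns keep their original names here, which is why a rename step follows.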

All you need now is to rename the columns of the result:

# rename returns a new DataFrame, so assign the result
grouped_at_Cust = grouped_at_Cust.rename(columns={
    'lst_buy_dt1': 'max_lst_buy_dt',
    'purchase_amt': 'tot_purchase',
})
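As an alternative that is not part of the original answer, pandas 0.25 and later support named aggregation, which aggregates and names the output columns in one step. A minimal sketch, again assuming the sample rows from the question:

```python
import pandas as pd

# Sample rows taken from the table in the question.
metro_sales = pd.DataFrame({
    "cust_id": [1, 1, 1],
    "store_id": [20, 20, 20],
    "month": [10, 10, 10],
    "lst_buy_dt1": pd.to_datetime(["2015-10-07", "2015-10-09", "2015-10-20"]),
    "purchase_amt": [100, 200, 100],
})

# Each keyword is an output column name; the tuple is (input column, function).
grouped_at_cust = (
    metro_sales.groupby(["cust_id", "store_id", "month"])
    .agg(
        max_lst_buy_dt=("lst_buy_dt1", "max"),
        tot_purchase=("purchase_amt", "sum"),
    )
    .reset_index()
)

print(grouped_at_cust)
```

This produces flat column labels directly, so neither droplevel nor rename is needed afterwards.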