How to reset indexes when aggregating multiple columns in pandas
Note: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/39568965/
Asked by sourav
I have a dataframe that I am trying to group, which looks like this:
Cust_ID Store_ID month lst_buy_dt1 purchase_amt
1 20 10 2015-10-07 100
1 20 10 2015-10-09 200
1 20 10 2015-10-20 100
I need the maximum of lst_buy_dt1 and the total purchase amount for each Cust_ID, Store_ID combination for each month, in a separate dataframe. Sample output:
Cust_ID Store_ID month max_lst_buy_dt tot_purchase_amt
1 20 10 2015-10-20 400
My code is below.
aggregations = {
    'lst_buy_dt1': {  # Get the max purchase date across all purchases in a month
        'max_lst_buy_dt': 'max',
    },
    'purchase_amt': {  # Sum the purchases, call the result "tot_purchase"
        'tot_purchase': 'sum',
    }
}
grouped_at_Cust = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(aggregations).reset_index()
I am able to get the right aggregations. However, the resulting data frame has an additional level in its column index that I am not able to get rid of. I am unable to show the frame itself, but here is the result of:
list(grouped_at_Cust.columns.values)
[('cust_id', ''),
('store_id', ''),
('month', ''),
('lst_buy_dt1', 'max_lst_buy_dt'),
('purchase_amt', 'tot_purchase')]
Notice the hierarchy in the last 2 columns. How do I get rid of it? I just need the columns max_lst_buy_dt and tot_purchase.
Answered by IanS
Edit: based on your comment, you can simply drop the first level of the column index. For instance, with a more complicated aggregation:
aggregations = {
    'lst_buy_dt1': {
        'max_lst_buy_dt': 'max',
        'min_lst_buy_dt': 'min',
    },
    'purchase_amt': {
        'tot_purchase': 'sum',
    }
}
grouped_at_Cust = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(aggregations).reset_index()
grouped_at_Cust.columns = grouped_at_Cust.columns.droplevel(0)
Output:
             tot_purchase min_lst_buy_dt max_lst_buy_dt
0   cust_id           100     2015-10-07     2015-10-07
1     month           100     2015-10-20     2015-10-20
2  store_id           200     2015-10-09     2015-10-09
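A minimal sketch of the same level-dropping idea, assuming a recent pandas release and a hypothetical sample frame built from the three rows in the question. Two details differ from the code above and are assumptions rather than part of the original answer: the nested-dictionary form of agg was deprecated in pandas 0.20 and raises an error from pandas 1.0 onward, so the sketch obtains two-level columns from a list of functions per column instead; and it drops the level before calling reset_index, because dropping level 0 afterwards would leave the group-key columns with empty names.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table
# (column names lower-cased to match the question's code).
metro_sales = pd.DataFrame({
    'cust_id': [1, 1, 1],
    'store_id': [20, 20, 20],
    'month': [10, 10, 10],
    'lst_buy_dt1': pd.to_datetime(['2015-10-07', '2015-10-09', '2015-10-20']),
    'purchase_amt': [100, 200, 100],
})

# A list of functions per column yields two-level column labels,
# e.g. ('lst_buy_dt1', 'max').
grouped = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(
    {'lst_buy_dt1': ['max', 'min'], 'purchase_amt': ['sum']}
)

# Drop the outer level while the group keys are still in the row index,
# then move the keys back into ordinary columns.
grouped.columns = grouped.columns.droplevel(0)
grouped = grouped.reset_index()

# The remaining labels are the function names; rename them to the desired ones.
grouped = grouped.rename(columns={
    'max': 'max_lst_buy_dt',
    'min': 'min_lst_buy_dt',
    'sum': 'tot_purchase',
})

print(list(grouped.columns))
# ['cust_id', 'store_id', 'month', 'max_lst_buy_dt', 'min_lst_buy_dt', 'tot_purchase']
```

Note that droplevel(0) only gives unambiguous labels when the inner-level names are unique across columns; if two columns both used 'max', the flattened frame would contain duplicate column names.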
Original answer
I think your aggregations dictionary is too complicated. If you follow the documentation:
agg = {
    'lst_buy_dt1': 'max',
    'purchase_amt': 'sum',
}
metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(agg).reset_index()
Out[19]:
      index  purchase_amt lst_buy_dt1
0   cust_id           100  2015-10-07
1     month           100  2015-10-20
2  store_id           200  2015-10-09
All you need now is to rename the columns of the result:
# rename() returns a new DataFrame, so assign the result back
grouped_at_Cust = grouped_at_Cust.rename(columns={
    'lst_buy_dt1': 'max_lst_buy_dt',
    'purchase_amt': 'tot_purchase'
})
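The flat-dictionary approach can be sketched end to end as follows, assuming a recent pandas release and a hypothetical sample frame built from the three rows in the question. The second half shows named aggregation, which is not part of the original answer: it is available from pandas 0.25 and, as far as I know, produces the renamed flat columns in a single step.

```python
import pandas as pd

# Hypothetical sample data mirroring the question's table
# (column names lower-cased to match the question's code).
metro_sales = pd.DataFrame({
    'cust_id': [1, 1, 1],
    'store_id': [20, 20, 20],
    'month': [10, 10, 10],
    'lst_buy_dt1': pd.to_datetime(['2015-10-07', '2015-10-09', '2015-10-20']),
    'purchase_amt': [100, 200, 100],
})

keys = ['cust_id', 'store_id', 'month']

# Approach 1: one function per column keeps the columns flat; rename afterwards.
flat = (
    metro_sales.groupby(keys)
    .agg({'lst_buy_dt1': 'max', 'purchase_amt': 'sum'})
    .reset_index()
    .rename(columns={'lst_buy_dt1': 'max_lst_buy_dt',
                     'purchase_amt': 'tot_purchase'})
)

# Approach 2 (assumes pandas 0.25+): named aggregation,
# written as output_name=(input_column, function).
named = (
    metro_sales.groupby(keys)
    .agg(max_lst_buy_dt=('lst_buy_dt1', 'max'),
         tot_purchase=('purchase_amt', 'sum'))
    .reset_index()
)

print(list(named.columns))
# ['cust_id', 'store_id', 'month', 'max_lst_buy_dt', 'tot_purchase']
```

Either way the result has ordinary single-level columns, so no droplevel call is needed.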