How to reset the index when aggregating multiple columns in Pandas

Question

Asked by sourav

I have a DataFrame that I am trying to group, which looks like this:

Cust_ID Store_ID month lst_buy_dt1  purchase_amt    
 1       20       10     2015-10-07  100
 1       20       10     2015-10-09  200
 1       20       10     2015-10-20  100

I need, in a separate DataFrame, the maximum of lst_buy_dt1 and the total purchase_amt for each Cust_ID, Store_ID combination in each month. Sample output:

Cust_ID Store_ID month max_lst_buy_dt tot_purchase_amt
 1       20       10    2015-10-20     400

My code is below:

aggregations = {
    'lst_buy_dt1': {          # Get the max purchase date across all purchases in a month
        'max_lst_buy_dt': 'max',
    },
    'purchase_amt': {         # Sum the purchases, call the result "tot_purchase"
        'tot_purchase': 'sum',
    }
}

grouped_at_Cust=metro_sales.groupby(['cust_id','store_id','month']).agg(aggregations).reset_index()

I am able to get the right aggregations. However, the resulting DataFrame has an additional level in its column index that I cannot get rid of. I can't show the frame itself, but here is the result of:

list(grouped_at_Cust.columns.values)

[('cust_id', ''),
('store_id', ''),
('month', ''),
('lst_buy_dt1', 'max_lst_buy_dt'),
('purchase_amt', 'tot_purchase')]

Notice the hierarchy in the last two columns. How do I get rid of it? I just need the columns max_lst_buy_dt and tot_purchase.
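The extra level comes from how agg() labels its output: when a column is mapped to a collection of aggregations (a list of function names, or in older pandas a nested dictionary as in the question), each result column is labelled with a (source column, aggregation) pair, which gives a two-level MultiIndex. reset_index() then adds the group keys as ('cust_id', ''), ('store_id', '') and ('month', ''). One common way to flatten such columns, not taken from this thread, is to join the non-empty parts of each tuple. The sketch below assumes the lowercase column names used in the question's code and uses lists instead of the nested dictionary, since current pandas no longer accepts the latter.

```python
import pandas as pd

# The three sample rows from the question, using the lowercase column
# names that the question's groupby code refers to (an assumption).
metro_sales = pd.DataFrame({
    "cust_id": [1, 1, 1],
    "store_id": [20, 20, 20],
    "month": [10, 10, 10],
    "lst_buy_dt1": pd.to_datetime(["2015-10-07", "2015-10-09", "2015-10-20"]),
    "purchase_amt": [100, 200, 100],
})

# Mapping each column to a *list* of functions makes agg() label every
# result column with a (source column, function) pair: a 2-level MultiIndex.
grouped = (
    metro_sales
    .groupby(["cust_id", "store_id", "month"])
    .agg({"lst_buy_dt1": ["max"], "purchase_amt": ["sum"]})
    .reset_index()  # keys come back as ('cust_id', ''), ('store_id', ''), ...
)

# Flatten: keep the non-empty parts of each column tuple, joined with "_".
grouped.columns = ["_".join(filter(None, col)) for col in grouped.columns]
print(grouped.columns.tolist())
# ['cust_id', 'store_id', 'month', 'lst_buy_dt1_max', 'purchase_amt_sum']
```

The joined names can then be renamed to whatever the report needs.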

Answer 1

Answered by IanS

Edit: based on your comment, you can simply drop the first level of the column index. For instance, with a more complicated aggregation:

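aggregations = {
    'lst_buy_dt1': {
        'max_lst_buy_dt': 'max',
        'min_lst_buy_dt': 'min',
    },
    'purchase_amt': {
        'tot_purchase': 'sum',
    }
}
grouped_at_Cust = metro_sales.groupby(['cust_id', 'store_id', 'month']).agg(aggregations)
# Drop the outer level while the group keys are still the row index;
# dropping it after reset_index() would blank out the key column names
grouped_at_Cust.columns = grouped_at_Cust.columns.droplevel(0)
grouped_at_Cust = grouped_at_Cust.reset_index()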

Output:

             tot_purchase min_lst_buy_dt max_lst_buy_dt
0   cust_id           100     2015-10-07     2015-10-07
1     month           100     2015-10-20     2015-10-20
2  store_id           200     2015-10-09     2015-10-09
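Because pandas 1.0 and later reject the nested-dictionary form, the same drop-the-level idea can be sketched on a current version with lists of function names instead. The sketch assumes the question's lowercase column names and builds a frame from its three sample rows. The order matters: droplevel(0) is applied while the group keys are still the row index, because after reset_index() the key columns are labelled ('cust_id', '') and so on, and dropping level 0 at that point would leave them with empty names.

```python
import pandas as pd

# The question's sample rows (lowercase column names assumed).
metro_sales = pd.DataFrame({
    "cust_id": [1, 1, 1],
    "store_id": [20, 20, 20],
    "month": [10, 10, 10],
    "lst_buy_dt1": pd.to_datetime(["2015-10-07", "2015-10-09", "2015-10-20"]),
    "purchase_amt": [100, 200, 100],
})

grouped_at_Cust = (
    metro_sales
    .groupby(["cust_id", "store_id", "month"])
    .agg({"lst_buy_dt1": ["max", "min"], "purchase_amt": ["sum"]})
)
# The group keys are still the row index here, so the columns hold only
# ('lst_buy_dt1', 'max'), ('lst_buy_dt1', 'min') and ('purchase_amt', 'sum').
grouped_at_Cust.columns = grouped_at_Cust.columns.droplevel(0)  # -> max, min, sum

# Lists of function names leave only "max", "min" and "sum" after the drop,
# so map them to the target names used in this thread.
grouped_at_Cust = grouped_at_Cust.rename(columns={
    "max": "max_lst_buy_dt",
    "min": "min_lst_buy_dt",
    "sum": "tot_purchase",
})

# Only now move the keys back into ordinary, properly named columns.
grouped_at_Cust = grouped_at_Cust.reset_index()
print(grouped_at_Cust.columns.tolist())
# ['cust_id', 'store_id', 'month', 'max_lst_buy_dt', 'min_lst_buy_dt', 'tot_purchase']
```

One caveat of this design: if two source columns share an aggregation (say, both use 'max'), dropping level 0 produces duplicate labels and rename() would give both the same new name; in that case it is safer to keep the information from both levels, for example by joining them into one string.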

Original answer

I think your aggregations dictionary is too complicated. If you follow the documentation, a flat dictionary that maps each column to a single function is enough:

agg = {
    'lst_buy_dt1': 'max',
    'purchase_amt': 'sum',
}
grouped_at_Cust = metro_sales.groupby(['cust_id','store_id','month']).agg(agg).reset_index()
grouped_at_Cust
Out[19]: 
      index  purchase_amt lst_buy_dt1
0   cust_id           100  2015-10-07
1     month           100  2015-10-20
2  store_id           200  2015-10-09

All you need now is to rename the columns of the result:

grouped_at_Cust = grouped_at_Cust.rename(columns={
    'lst_buy_dt1': 'max_lst_buy_dt',
    'purchase_amt': 'tot_purchase'
})
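The nested-dictionary renaming used in the question was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError. Since pandas 0.25, named aggregation combines the flat aggregation and the rename in one call: each keyword argument is an output column name, and its value is a (source column, aggregation function) tuple. This is a swapped-in alternative rather than part of the original answer; the minimal sketch below assumes the question's lowercase column names and its three sample rows.

```python
import pandas as pd

# The question's sample rows (lowercase column names assumed).
metro_sales = pd.DataFrame({
    "cust_id": [1, 1, 1],
    "store_id": [20, 20, 20],
    "month": [10, 10, 10],
    "lst_buy_dt1": pd.to_datetime(["2015-10-07", "2015-10-09", "2015-10-20"]),
    "purchase_amt": [100, 200, 100],
})

# Named aggregation: keyword = output column name,
# value = (source column, aggregation function). The columns stay flat.
grouped_at_Cust = (
    metro_sales
    .groupby(["cust_id", "store_id", "month"])
    .agg(
        max_lst_buy_dt=("lst_buy_dt1", "max"),
        tot_purchase=("purchase_amt", "sum"),
    )
    .reset_index()
)
print(grouped_at_Cust.columns.tolist())
# ['cust_id', 'store_id', 'month', 'max_lst_buy_dt', 'tot_purchase']
```

Because no MultiIndex is ever created, there is no level to drop and no separate rename step.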
