Group by a Dictionary in Pandas DataFrame

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/41185560/

Time: 2020-09-14 02:38:06  Source: igfitidea

python pandas

Asked by tonyibm

I read data from an Excel file; it looks like this:

[image: screenshot of the Excel data]

I want the sum of Q1 and Q2 for each employee, so I created a dictionary that maps months to quarters and want to group by it:

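import pandas as pd

# Raw string keeps the backslashes in the Windows path from being read as escapes;
# sheet_name is the current keyword (older pandas versions used sheetname)
mb2 = pd.read_excel(r'C:\Users\IBM_ADMIN\Desktop\ml-1m\工作簿1.xlsx', sheet_name='Sheet3')
# Map each month column to its quarter
mapping = {'Jan':'Q1','Feb':'Q1','Mar':'Q1','Apr':'Q2','May':'Q2','Jun':'Q2'}
# Group the columns (axis=1) by quarter and sum them
mb2.groupby(by=mapping,axis=1).sum()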

[image: output of the grouped sum]

How can I also show the Name field? I tried this, but it didn't work:

mb2.groupby(by=(mapping,'Name'),axis=1).sum()

Accepted answer by Blackecho

A quick solution is to save the resulting DataFrame in a new variable (grouped_data in the following code) and then assign the Name column to it from the original DataFrame:

grouped_data = mb2.groupby(by=mapping,axis=1).sum()
grouped_data['Name'] = mb2['Name']
print(grouped_data)

EDIT: Another way to achieve the same result is to add a "Name": "Name" pair to the mapping dictionary and call mb2.groupby(by=mapping,axis=1).sum() as before.
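
For readers without the spreadsheet, the first approach (group, then assign Name) can be sketched on a small made-up DataFrame. The sample values below are hypothetical. Because recent pandas releases deprecate axis=1 in groupby, this sketch swaps in a transpose before and after grouping; that substitution is an assumption about the reader's pandas version, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data standing in for the spreadsheet (values are made up).
mb2 = pd.DataFrame({
    'Name': ['Tony', 'Rafal'],
    'Jan': [100, 200], 'Feb': [110, 210], 'Mar': [120, 220],
    'Apr': [130, 230], 'May': [140, 240], 'Jun': [150, 250],
})
mapping = {'Jan': 'Q1', 'Feb': 'Q1', 'Mar': 'Q1',
           'Apr': 'Q2', 'May': 'Q2', 'Jun': 'Q2'}

# Keep only the month columns, transpose so the months become the index,
# group the index labels through the mapping, sum, and transpose back.
months = list(mapping)
grouped_data = mb2[months].T.groupby(mapping).sum().T

# Re-attach the Name column; the row index is unchanged, so the rows align.
grouped_data['Name'] = mb2['Name']
print(grouped_data)
```

Selecting the month columns explicitly keeps the string column out of the sum, so only numeric data is aggregated.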

Answer by jezrael

You can use set_index first, then groupby, and finally reset_index:

df = mb2.set_index('Name').groupby(by=mapping,axis=1).sum().reset_index()
print(df)
      Name    Q1    Q2
0     Tony  1367  1835
1    Rafal  1405  2574
2     John  1550  1409
3  Katrina  1555  1509
4    Kathy  1583  1491
5      Tom  1344  1874
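
The same set_index / groupby / reset_index chain can be tried without the Excel file. This is a minimal sketch on hypothetical data (the numbers are invented, not the ones printed above), and it assumes a pandas version where grouping columns is written as a transpose rather than with axis=1.

```python
import pandas as pd

# Hypothetical sample data; the real values come from the asker's spreadsheet.
mb2 = pd.DataFrame({
    'Name': ['Tony', 'Rafal'],
    'Jan': [100, 200], 'Feb': [110, 210], 'Mar': [120, 220],
    'Apr': [130, 230], 'May': [140, 240], 'Jun': [150, 250],
})
mapping = {'Jan': 'Q1', 'Feb': 'Q1', 'Mar': 'Q1',
           'Apr': 'Q2', 'May': 'Q2', 'Jun': 'Q2'}

# Move Name into the index so only the month columns take part in grouping,
# group the transposed frame's index by quarter, then restore Name as a column.
df = (
    mb2.set_index('Name')
       .T.groupby(mapping).sum()
       .T.reset_index()
)
print(df)
```

Moving Name into the index first means it survives the grouping untouched and comes back as an ordinary column after reset_index.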