Disclaimer: this page reproduces a popular StackOverflow question and its answers from a Chinese-English translation, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/39635993/

Time: 2020-09-14 02:04:18  Source: igfitidea

How to convert pandas dataframe rows into columns, based on category?

python pandas

Asked by Nandhini Anand

I have a pandas data frame with a category variable and some number variables. Something like this:

import pandas as pd
ls = [{'count': 5, 'module': 'payroll', 'id': 2}, {'count': 53, 'module': 'general', 'id': 2}, {'count': 35, 'module': 'tax', 'id': 5}]
df = pd.DataFrame(ls)

The df looks like this:

In [15]: df
Out[15]: 
   count  id   module
0      5   2  payroll
1     53   2  general
2     35   5      tax

I want to convert (is transpose the right word?) the module values into columns and group by id, so that the result looks something like this:

   general_count  id  payroll_count  tax_count
0           53.0   2            5.0        NaN
1            NaN   5            NaN       35.0

One approach to this would be to use apply:

df['payroll_count'] = df.id.apply(lambda x: df[df.id==x][df.module=='payroll'])

However, this suffers from multiple drawbacks:

  1. Costly, and takes too much time

  2. Creates artifacts and empty dataframes that need to be cleaned up.

I sense there's a better way to achieve this with pandas groupby, but I can't find a way to do the same operation more efficiently. Please help.

Answered by jezrael

You can use groupby on the columns that will become the new index and the new columns (here id and module). The groups then need to be aggregated in some way; I use mean. Next, convert the one-column DataFrame to a Series with DataFrame.squeeze (so there is no need to remove the top level of a MultiIndex in the columns), and reshape with unstack. Finally, use add_suffix to append _count to the column names:

df = df.groupby(['id','module']).mean().squeeze().unstack().add_suffix('_count')
print (df)
module  general_count  payroll_count  tax_count
id                                             
2                53.0            5.0        NaN
5                 NaN            NaN       35.0
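
As a variation on the groupby approach above, here is a minimal self-contained sketch that selects the count column before aggregating, so the result is already a Series and squeeze is not needed. The variable name wide is only illustrative. This may be more robust in practice, since squeeze leaves a DataFrame untouched when it has more than one value column and collapses a one-row, one-column frame to a scalar.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'count': [5, 53, 35],
    'id': [2, 2, 5],
    'module': ['payroll', 'general', 'tax'],
})

# Selecting 'count' before aggregating yields a Series indexed by (id, module),
# so unstack() can move the module level into the columns directly
wide = (
    df.groupby(['id', 'module'])['count']
      .mean()
      .unstack()
      .add_suffix('_count')
)
print(wide)
```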

Another solution uses pivot; the MultiIndex in the columns then needs to be flattened with a list comprehension:

df = df.pivot(index='id', columns='module')
df.columns = ['_'.join((col[1], col[0])) for col in df.columns]
print (df)
    general_count  payroll_count  tax_count
id                                         
2            53.0            5.0        NaN
5             NaN            NaN       35.0
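
If only the count column needs reshaping, the MultiIndex can be avoided altogether by passing values to pivot. The following is a sketch under that assumption, not part of the original answer; the name wide is illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'count': [5, 53, 35],
    'id': [2, 2, 5],
    'module': ['payroll', 'general', 'tax'],
})

# With values='count' the resulting columns are a flat Index of module names,
# so no list comprehension is needed to flatten them
wide = df.pivot(index='id', columns='module', values='count').add_suffix('_count')

# Remove the leftover 'module' label from the columns axis
wide.columns.name = None
print(wide)
```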

Answered by Zero

You could use set_index and unstack:

In [2]: df.set_index(['id','module'])['count'].unstack().add_suffix('_count').reset_index()
Out[2]:
module  id  general_count  payroll_count  tax_count
0        2           53.0            5.0        NaN
1        5            NaN            NaN       35.0
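
Note that both pivot and set_index followed by unstack raise a ValueError when the same (id, module) pair occurs more than once. A minimal sketch of one way to handle that case with pivot_table, assuming duplicates should be summed; the extra tax row and the choice of sum are hypothetical and not part of the original question.

```python
import pandas as pd

# Sample data from the question plus one hypothetical duplicate (id=5, module='tax')
df = pd.DataFrame({
    'count': [5, 53, 35, 7],
    'id': [2, 2, 5, 5],
    'module': ['payroll', 'general', 'tax', 'tax'],
})

# pivot_table aggregates duplicate (id, module) pairs instead of raising;
# aggfunc='sum' is an assumption about how duplicates should be combined
wide = (
    df.pivot_table(index='id', columns='module', values='count', aggfunc='sum')
      .add_suffix('_count')   # suffix is applied before 'id' becomes a column
      .reset_index()
)

# Remove the leftover 'module' label from the columns axis
wide.columns.name = None
print(wide)
```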