How to convert Pandas DataFrame rows to columns based on category?

Question

Asked by Nandhini Anand

I have a pandas data frame with a category variable and some number variables. Something like this:

import pandas as pd
ls = [{'count': 5, 'module': 'payroll', 'id': 2}, {'count': 53, 'module': 'general', 'id': 2}, {'id': 5, 'count': 35, 'module': 'tax'}]
df = pd.DataFrame(ls)

The df looks like this:

 df
Out[15]: 
   count  id   module
0      5   2  payroll
1     53   2  general
2     35   5      tax

I want to convert (is "transpose" the right word?) the module values into columns and group by id, so that the result looks something like this:

   general_count  id  payroll_count  tax_count
0           53.0   2            5.0        NaN
1            NaN   5            NaN       35.0

One approach to this would be to use apply:

df['payroll_count'] = df.id.apply(lambda x: df[df.id==x][df.module=='payroll'])

However, this suffers from multiple drawbacks:

It is costly and takes too much time.
It creates artifacts and empty DataFrames that need to be cleaned up.

I sense there's a better way to achieve this with pandas groupby, but I can't find a way to do this same operation more efficiently. Please help.

Answer 1

Answered by jezrael

You can use groupby on two columns: the first becomes the new index and the second becomes the new columns. The values then need to be aggregated in some way (I use mean). Next, convert the one-column DataFrame to a Series with DataFrame.squeeze (so it is not necessary to remove the top level of a MultiIndex in the columns) and reshape with unstack. Finally, use add_suffix to extend the column names:

df = df.groupby(['id','module']).mean().squeeze().unstack().add_suffix('_count')
print (df)
module  general_count  payroll_count  tax_count
id                                             
2                53.0            5.0        NaN
5                 NaN            NaN       35.0
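
To make the chain above easier to follow, here is a minimal sketch that runs each step separately on the sample data from the question. The intermediate variable names (grouped, series, wide, result) are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame([
    {'count': 5, 'module': 'payroll', 'id': 2},
    {'count': 53, 'module': 'general', 'id': 2},
    {'count': 35, 'module': 'tax', 'id': 5},
])

# Step 1: aggregate; the result is a one-column DataFrame ('count')
# indexed by a MultiIndex of (id, module)
grouped = df.groupby(['id', 'module']).mean()

# Step 2: squeeze the single column into a Series
series = grouped.squeeze()

# Step 3: move the inner index level (module) into the columns
wide = series.unstack()

# Step 4: append '_count' to every column label
result = wide.add_suffix('_count')
print(result)
```

Note that squeeze also collapses a single-row frame to a scalar, so on a one-row input selecting the column explicitly with ['count'] instead of squeeze may be the safer choice.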

Another solution uses pivot; the MultiIndex then needs to be removed from the columns with a list comprehension:

df = df.pivot(index='id', columns='module')
df.columns = ['_'.join((col[1], col[0])) for col in df.columns]
print (df)
    general_count  payroll_count  tax_count
id                                         
2            53.0            5.0        NaN
5             NaN            NaN       35.0
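
One caveat worth knowing: pivot raises a ValueError when the same (id, module) pair occurs more than once, whereas the groupby version aggregates such rows. The sketch below is not part of the original answer; it uses pivot_table with an aggregation function as one possible way to handle duplicates. The fourth row of the data is made up for illustration.

```python
import pandas as pd

# Question's data plus one hypothetical duplicate: (id=2, module='payroll') appears twice
df = pd.DataFrame({
    'count': [5, 53, 35, 7],
    'id': [2, 2, 5, 2],
    'module': ['payroll', 'general', 'tax', 'payroll'],
})

# pivot_table aggregates duplicate (id, module) pairs instead of raising;
# passing values='count' as a scalar keeps the columns single-level
wide = df.pivot_table(index='id', columns='module', values='count', aggfunc='mean')
wide = wide.add_suffix('_count')
print(wide)
```

Here the two payroll rows for id 2 are averaged into a single cell, which matches what the groupby and mean approach would produce.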

Answer 2

Answered by Zero

You could use set_index and unstack:

In [2]: df.set_index(['id','module'])['count'].unstack().add_suffix('_count').reset_index()
Out[2]:
module  id  general_count  payroll_count  tax_count
0        2           53.0            5.0        NaN
1        5            NaN            NaN       35.0
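
The output above still carries "module" as the name of the column axis, which the layout requested in the question does not have. As a small optional sketch (an addition, not part of the original answer), rename_axis can clear that label before the index is reset:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame([
    {'count': 5, 'module': 'payroll', 'id': 2},
    {'count': 53, 'module': 'general', 'id': 2},
    {'count': 35, 'module': 'tax', 'id': 5},
])

wide = (
    df.set_index(['id', 'module'])['count']
      .unstack()                   # module values become columns
      .add_suffix('_count')
      .rename_axis(None, axis=1)   # drop the leftover 'module' columns name
      .reset_index()               # turn id back into an ordinary column
)
print(wide)
```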
