Python: Pandas pivot table on multiple columns at once

Question

Asked by Grr

Let's say I have a DataFrame:

   nj  ptype  wd  wpt
0   2      1   2    1
1   3      2   1    2
2   1      1   3    1
3   2      2   3    3
4   3      1   2    2
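
To follow along, the sample frame above can be rebuilt as below. This is a minimal sketch; the original post only shows the printed table, so the construction code is an assumption about how it was created.

```python
import pandas as pd

# Rebuild the sample DataFrame exactly as printed in the question.
df = pd.DataFrame({
    "nj":    [2, 3, 1, 2, 3],
    "ptype": [1, 2, 1, 2, 1],
    "wd":    [2, 1, 3, 3, 2],
    "wpt":   [1, 2, 1, 3, 2],
})

print(df)
```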

I would like to aggregate this data using ptype as the index, like so:

             nj             wd            wpt
       1.0  2.0  3.0  1.0  2.0  3.0  1.0  2.0  3.0
ptype    
    1    1    1    1    0    2    1    2    1    0
    2    0    1    1    1    0    1    0    1    1

You could build each of the top-level columns of the final result by creating a pivot table with aggfunc='count' and then concatenating them all, like so:

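# Count rows per (ptype, value) for each column; any other column can
# serve as the counted values, so pick one with [] after pivoting.
nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count')['wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count')['wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count')['nj']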
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
print(out)
        nj             wd            wpt
         1    2    3    1    2    3    1    2    3
ptype
1      1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
2      0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0

But I really dislike this and it feels wrong. I would like to know if there is a way to do this in a simpler fashion, preferably with a built-in method. Thanks in advance!

Answer 1

Answered by Psidom

Instead of doing it in one step, you can do the aggregation first and then pivot it using the unstack method:

# Count the values of nj, wd and wpt within each ptype group using
# groupby + value_counts, then move the value level into the columns.
(df.set_index('ptype')
 .groupby(level='ptype')
 .apply(lambda g: g.apply(lambda s: s.value_counts()))
 .unstack(level=1)
 .fillna(0))

#      nj             wd            wpt
#       1    2    3    1    2    3    1    2    3
#ptype                                  
#1    1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
#2    0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0

Another option that avoids the apply method:

(df.set_index('ptype').stack()
 .groupby(level=[0,1])
 .value_counts()
 .unstack(level=[1,2])
 .fillna(0)
 .sort_index(axis=1))
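
To make the second option easier to follow, here is the same chain split into named steps on the sample data. The variable names (stacked, counts, out) are illustrative and not part of the answer, and the frame is rebuilt from the table printed in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "nj":    [2, 3, 1, 2, 3],
    "ptype": [1, 2, 1, 2, 1],
    "wd":    [2, 1, 3, 3, 2],
    "wpt":   [1, 2, 1, 3, 2],
})

# Step 1: long format. One entry per original cell, indexed by
# (ptype, column name).
stacked = df.set_index("ptype").stack()

# Step 2: count how often each value occurs within each
# (ptype, column name) group. The index is now
# (ptype, column name, value).
counts = stacked.groupby(level=[0, 1]).value_counts()

# Step 3: move column name and value into the columns, fill the
# combinations that never occurred with 0, and sort the columns.
out = counts.unstack(level=[1, 2]).fillna(0).sort_index(axis=1)

print(out)
```

Because fillna(0) runs after missing combinations have been introduced as NaN, the counts come out as floats; appending .astype(int) would give integer counts if that matters.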

Naive timing on the sample data:

Original solution:

%%timeit
nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count')['wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count')['wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count')['nj']
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
# 100 loops, best of 3: 12 ms per loop

Option one:

%%timeit
(df.set_index('ptype')
 .groupby(level='ptype')
 .apply(lambda g: g.apply(lambda s: s.value_counts()))
 .unstack(level=1)
 .fillna(0))
# 100 loops, best of 3: 10.1 ms per loop

Option two:

%%timeit 
(df.set_index('ptype').stack()
 .groupby(level=[0,1])
 .value_counts()
 .unstack(level=[1,2])
 .fillna(0)
 .sort_index(axis=1))
# 100 loops, best of 3: 4.3 ms per loop

Answer 2

Answered by Allen

Another solution using groupby and unstack.

df2 = pd.concat(
    [df.groupby(['ptype', e])[e].count().unstack() for e in ['nj', 'wd', 'wpt']],
    axis=1,
).fillna(0).astype(int)
df2.columns = pd.MultiIndex.from_product([['nj', 'wd', 'wpt'], [1.0, 2.0, 3.0]])

df2
Out[207]: 
       nj          wd         wpt        
      1.0 2.0 3.0 1.0 2.0 3.0 1.0 2.0 3.0
ptype                                    
1       1   1   1   0   2   1   2   1   0
2       0   1   1   1   0   1   0   1   1

Answer 3

Answered by Himanshu Aggarwal

An easier solution is:

employee.pivot_table(index='Title', values='Salary', aggfunc=['mean', 'median', 'min', 'max', 'std'], fill_value=0)

In this case, we apply several different aggregate functions to the Salary column.
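
The answer does not show the employee frame, so the sketch below uses made-up data to illustrate what passing a list of aggregate functions produces. The rows and values are hypothetical; only the Title and Salary column names come from the answer.

```python
import pandas as pd

# Hypothetical data: the original answer does not include the frame.
employee = pd.DataFrame({
    "Title":  ["Analyst", "Analyst", "Engineer", "Engineer"],
    "Salary": [50, 70, 80, 100],
})

# One row per Title, one column per aggregate function.
summary = employee.pivot_table(
    index="Title",
    values="Salary",
    aggfunc=["mean", "median", "min", "max", "std"],
    fill_value=0,
)

print(summary)
```

The result has two column levels, the function name on top and Salary below it, so a single statistic can be selected with, for example, summary[("mean", "Salary")]. Note that this aggregates one column in several ways rather than counting values across several columns, which is what the question asks for.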
