Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/30679467/

Time: 2020-09-13 23:26:47  Source: igfitidea

Pivot Tables or Group By for Pandas?

Tags: python, pandas, count, group-by, pivot-table

Asked by SteelyDanish

I have a hopefully straightforward question that has been giving me a lot of difficulty for the last 3 hours. It should be easy.

Here's the challenge.

I have a pandas dataframe:

+--------------------------+
|     Col 'X'    Col 'Y'  |
+--------------------------+
|     class 1      cat 1  |
|     class 2      cat 1  |
|     class 3      cat 2  |
|     class 2      cat 3  |
+--------------------------+

This is what I am looking to transform the dataframe into:

+------------------------------------------+
|                  cat 1    cat 2    cat 3 |
+------------------------------------------+
|     class 1         1        0        0  |
|     class 2         1        0        1  |
|     class 3         0        1        0  |
+------------------------------------------+

Here, each value is the number of rows with that (Col 'X', Col 'Y') pair. Does anybody have any insight? Thanks!
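
To make "value counts" concrete, here is a minimal pandas-free sketch in plain Python: each cell of the target table is the number of input rows with that (Col 'X', Col 'Y') pair, and pairs that never occur get 0. The rows list simply re-types the question's sample data; the variable names are illustrative only.

```python
from collections import Counter

# The question's input rows as (Col 'X', Col 'Y') pairs
rows = [
    ("class 1", "cat 1"),
    ("class 2", "cat 1"),
    ("class 3", "cat 2"),
    ("class 2", "cat 3"),
]

# Count how many times each (class, cat) pair occurs
pair_counts = Counter(rows)

# Sorted row labels (classes) and column labels (cats) of the target table
classes = sorted({x for x, _ in rows})
cats = sorted({y for _, y in rows})

# One list of counts per class; Counter returns 0 for pairs that never occur
table = {x: [pair_counts[(x, y)] for y in cats] for x in classes}

print(cats)
for x in classes:
    print(x, table[x])
```

The pandas answers below compute exactly this table, just with labelled rows and columns.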

Answer by Zero

Here are a couple of ways to reshape your DataFrame df:

In [27]: df
Out[27]:
     Col X  Col Y
0  class 1  cat 1
1  class 2  cat 1
2  class 3  cat 2
3  class 2  cat 3

1) Using pd.crosstab()

In [28]: pd.crosstab(df['Col X'], df['Col Y'])
Out[28]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
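
A self-contained version of the crosstab approach, as a sketch that rebuilds the question's frame with the column names used in this answer. By default pd.crosstab counts how often each (row, column) combination occurs. The margins=True call is an addition not in the original answer; it appends an "All" row and column with the totals.

```python
import pandas as pd

# Rebuild the question's frame (column names as used in this answer)
df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# Frequency table: rows from 'Col X', columns from 'Col Y', cells = counts
counts = pd.crosstab(df["Col X"], df["Col Y"])
print(counts)

# margins=True appends an 'All' row and column holding the totals
with_totals = pd.crosstab(df["Col X"], df["Col Y"], margins=True)
print(with_totals)
```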

2) Or, use groupby on ['Col X', 'Col Y'] with size() to count the rows in each group, then unstack over 'Col Y' and fill the resulting NaNs with zeros (fill_value=0).

In [29]: df.groupby(['Col X','Col Y']).size().unstack('Col Y', fill_value=0)
Out[29]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
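
To show why the zero-filling step is needed, this sketch splits the one-liner into its intermediate steps: size() yields a Series indexed by (Col X, Col Y) pairs, and unstacking it without fill_value leaves NaN wherever a pair never occurs. The intermediate names sizes, raw and table are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# size() counts rows per (Col X, Col Y) group -> Series with a 2-level index
sizes = df.groupby(["Col X", "Col Y"]).size()
print(sizes)

# Unstacking 'Col Y' without fill_value leaves NaN for pairs that never occur,
# and those NaNs force a float dtype
raw = sizes.unstack("Col Y")
print(raw)

# fill_value=0 replaces the gaps with 0 and keeps integer counts
table = sizes.unstack("Col Y", fill_value=0)
print(table)
```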

3) Or, use pd.pivot_table() with index='Col X', columns='Col Y', aggfunc=len and fill_value=0

In [30]: pd.pivot_table(df, index=['Col X'], columns=['Col Y'], aggfunc=len, fill_value=0)
Out[30]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
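
A variant of the pivot_table approach, sketched under the assumption that you would rather give pivot_table an explicit value column than rely on it inferring one: a hypothetical helper column n of ones is added with assign, and summing those ones per cell gives the counts. Passing values as a single name keeps the result's columns as the plain 'Col Y' categories.

```python
import pandas as pd

df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3"],
})

# Hypothetical helper column 'n' of ones; summing it per
# (Col X, Col Y) cell counts the rows in that cell
table = df.assign(n=1).pivot_table(
    index="Col X",
    columns="Col Y",
    values="n",
    aggfunc="sum",
    fill_value=0,  # cells with no rows become 0 instead of NaN
)
print(table)
```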

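4) Or, add a helper column of ones with assign, then use set_index with unstack. Note that this only reshapes and does not aggregate, so it works only when each (Col X, Col Y) pair occurs at most once, as in this data.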

In [492]: df.assign(v=1).set_index(['Col X', 'Col Y'])['v'].unstack(fill_value=0)
Out[492]:
Col Y    cat 1  cat 2  cat 3
Col X
class 1      1      0      0
class 2      1      0      1
class 3      0      1      0
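
A sketch of what happens to the set_index/unstack approach when a pair repeats. The extra ('class 2', 'cat 1') row is hypothetical test data added for illustration. unstack raises a ValueError on duplicate index entries, while grouping on both index levels and summing the helper column first turns the reshape back into a real count.

```python
import pandas as pd

# Question's data plus one hypothetical repeated pair ('class 2', 'cat 1')
df = pd.DataFrame({
    "Col X": ["class 1", "class 2", "class 3", "class 2", "class 2"],
    "Col Y": ["cat 1", "cat 1", "cat 2", "cat 3", "cat 1"],
})

indexed = df.assign(v=1).set_index(["Col X", "Col Y"])["v"]

# unstack only reshapes: two rows cannot be placed in the same cell
try:
    indexed.unstack(fill_value=0)
    reshaped_ok = True
except ValueError as err:
    reshaped_ok = False
    print("unstack failed:", err)

# Summing the ones per (Col X, Col Y) pair first makes the reshape well-defined
table = indexed.groupby(level=["Col X", "Col Y"]).sum().unstack(fill_value=0)
print(table)
```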