Python: How to make a pandas crosstab with percentages?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21247203/

Date: 2020-08-18 22:23:40  Source: igfitidea

How to make a pandas crosstab with percentages?

Tags: python, pandas, crosstab

Asked by Brian Keegan

Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                   'B' : ['A', 'B', 'C'] * 8,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D' : np.random.randn(24),
                   'E' : np.random.randn(24)})


pd.crosstab(df.A,df.B)


B       A    B    C
A               
one     4    4    4
three   2    2    2
two     2    2    2

Using the margins option in crosstab to compute row and column totals gets us close enough to think that it should be possible using an aggfunc or groupby, but my meager brain can't think it through.

B       A     B    C
A               
one     .33  .33  .33
three   .33  .33  .33
two     .33  .33  .33
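
The margins idea mentioned in the question can be sketched as follows. This is only an illustration, assuming the default margin label 'All' and a trimmed copy of the example frame with just columns A and B; the names counts and row_share are made up for this sketch.

```python
import pandas as pd

# Trimmed copy of the question's frame (only the two columns being tabulated).
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

# margins=True appends an 'All' row and an 'All' column holding the totals.
counts = pd.crosstab(df.A, df.B, margins=True)

# Divide every row by its own total, then drop the helper column.
row_share = counts.div(counts['All'], axis=0).drop(columns='All')
print(row_share)
```

The 'All' row that remains in row_share holds the overall distribution of B across the whole frame.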

Accepted answer by BrenBarn

pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)

Basically you just have the function that does row/row.sum(), and you use apply with axis=1 to apply it by row.

(If doing this in Python 2, you should use from __future__ import division to make sure division always returns a float.)
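
As a quick check of the apply approach, the sketch below (assuming Python 3 and a trimmed copy of the example frame) confirms that each row sums to 1, and shows that switching to axis=0 normalises each column instead. The names row_pct and col_pct are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

table = pd.crosstab(df.A, df.B)

# axis=1 hands each row to the lambda, so every row is divided by its own total.
row_pct = table.apply(lambda r: r / r.sum(), axis=1)
assert np.allclose(row_pct.sum(axis=1), 1.0)

# axis=0 hands each column to the lambda, giving column proportions instead.
col_pct = table.apply(lambda c: c / c.sum(), axis=0)
assert np.allclose(col_pct.sum(axis=0), 1.0)
```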

Answer by Andy Hayden

Another option is to use div rather than apply:

In [11]: res = pd.crosstab(df.A, df.B)

Divide by the row sums (one total per index label):

In [12]: res.sum(axis=1)
Out[12]: 
A
one      12
three     6
two       6
dtype: int64

Similar to above, you need to do something about integer division (I use astype('float')):

In [13]: res.astype('float').div(res.sum(axis=1), axis=0)
Out[13]: 
B             A         B         C
A                                  
one    0.333333  0.333333  0.333333
three  0.333333  0.333333  0.333333
two    0.333333  0.333333  0.333333
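
For reference, a minimal sketch of the same div approach on Python 3, where the integer counts divide to floats without an explicit cast. The column-wise variant and the names row_pct and col_pct are additions for illustration, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

res = pd.crosstab(df.A, df.B)

# axis=0 aligns the row totals with the index, so each row is divided by its own sum.
row_pct = res.div(res.sum(axis=1), axis=0)

# axis=1 aligns the column totals with the columns, giving column proportions.
col_pct = res.div(res.sum(axis=0), axis=1)

print(row_pct)
print(col_pct)
```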

Answer by howMuchCheeseIsTooMuchCheese

If you're looking for a percentage of the total, you can divide by the len of the df instead of the row sum:

pd.crosstab(df.A, df.B).apply(lambda r: r/len(df), axis=1)
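
A small sketch of the share-of-total idea, assuming a trimmed copy of the example frame with no missing values. It also divides by the sum of the table itself, which gives the same result here; the two denominators could differ if some rows of df were left out of the table (for example, because of missing values in A or B). The names total_pct and total_pct_alt are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

table = pd.crosstab(df.A, df.B)

# Share of the grand total, using the number of rows in df as the denominator.
total_pct = table / len(df)

# Alternative denominator: the sum of all cells actually counted in the table.
total_pct_alt = table / table.to_numpy().sum()

print(total_pct)
```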

Answer by Harry

From Pandas 0.18.1 onwards, there's a normalize option:

In [1]: pd.crosstab(df.A,df.B, normalize='index')
Out[1]:
B             A         B         C
A
one    0.333333  0.333333  0.333333
three  0.333333  0.333333  0.333333
two    0.333333  0.333333  0.333333

You can normalise over all (the grand total), index (rows), or columns.

More details are available in the documentation.
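
The three normalize settings can be compared side by side. This is a sketch on a trimmed copy of the example frame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

# 'index': each row sums to 1.
by_row = pd.crosstab(df.A, df.B, normalize='index')

# 'columns': each column sums to 1.
by_col = pd.crosstab(df.A, df.B, normalize='columns')

# 'all': the whole table sums to 1.
overall = pd.crosstab(df.A, df.B, normalize='all')

print(by_row, by_col, overall, sep='\n\n')
```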

Answer by gabra

We can show it as percentages by multiplying by 100:

pd.crosstab(df.A,df.B, normalize='index')\
    .round(4)*100

B          A      B      C
A                         
one    33.33  33.33  33.33
three  33.33  33.33  33.33
two    33.33  33.33  33.33

Where I've rounded for convenience.
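
If the goal is a table for display, one possible variation (not part of the answer) is to format each cell as a string with a percent sign. The sketch below assumes a trimmed copy of the example frame and uses Python's '%' format specifier; note that labels then holds strings, so it is suitable for printing but not for further arithmetic.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

pct = pd.crosstab(df.A, df.B, normalize='index')

# Numeric version: scale to percent first, then round as the last step.
pct_numeric = pct.mul(100).round(2)

# Display version: format every cell as text such as '33.33%'.
labels = pct.apply(lambda col: col.map('{:.2%}'.format))

print(labels)
```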

Answer by Shivam Aranya

Normalizing over the index does the job: pass normalize="index" to pd.crosstab().
