Original question: http://stackoverflow.com/questions/21247203/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to make a pandas crosstab with percentages?
Asked by Brian Keegan
Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8,
                   'C': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D': np.random.randn(24),
                   'E': np.random.randn(24)})

# Frequency table of A against B
pd.crosstab(df.A, df.B)

B      A  B  C
A
one    4  4  4
three  2  2  2
two    2  2  2
Using the margins option in crosstab to compute row and column totals gets us close enough to think that it should be possible using an aggfunc or groupby, but my meager brain can't think it through.
B        A    B    C
A
one    .33  .33  .33
three  .33  .33  .33
two    .33  .33  .33
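The margins idea mentioned in the question can be carried through directly. The following is only a sketch, assuming the default margin label "All" and a frame reduced to the A and B columns from the question; the names with_totals and shares are illustrative.

```python
import pandas as pd

# Deterministic frame with the same A and B columns as in the question.
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
})

# margins=True appends an "All" row and an "All" column holding the totals.
with_totals = pd.crosstab(df["A"], df["B"], margins=True)

# Divide every row by its own total, then drop the now-constant "All" column.
shares = with_totals.div(with_totals["All"], axis=0).drop(columns="All")

print(shares)
```

The "All" row that remains shows each B value's share of the whole table.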
Accepted answer by BrenBarn
pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)
Basically you just have the function that does row/row.sum(), and you use apply with axis=1 to apply it by row.
(If doing this in Python 2, you should use from __future__ import division to make sure division always returns a float.)
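A minimal, self-contained sketch of the apply approach, assuming Python 3 (so no __future__ import is needed) and a frame reduced to the A and B columns from the question. The column-wise variant is an addition for comparison, and the names counts, row_share and col_share are illustrative.

```python
import pandas as pd

# Deterministic frame with the same A and B columns as in the question.
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
})

counts = pd.crosstab(df["A"], df["B"])

# axis=1 passes each row to the lambda; dividing by the row total gives row shares.
row_share = counts.apply(lambda r: r / r.sum(), axis=1)

# axis=0 passes each column instead, giving column shares.
col_share = counts.apply(lambda c: c / c.sum(), axis=0)

print(row_share)
print(col_share)
```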
Answer by Andy Hayden
Another option is to use div rather than apply:
In [11]: res = pd.crosstab(df.A, df.B)
Divide by the row sums, i.e. one total per index value:
In [12]: res.sum(axis=1)
Out[12]:
A
one      12
three     6
two       6
dtype: int64
Similar to above, you need to do something about integer division (I use astype('float')):
In [13]: res.astype('float').div(res.sum(axis=1), axis=0)
Out[13]:
B             A         B         C
A
one    0.333333  0.333333  0.333333
three  0.333333  0.333333  0.333333
two    0.333333  0.333333  0.333333
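As a rough sketch of how the axis argument of div lines up the totals, assuming Python 3 with a recent pandas, where div performs true division and the astype('float') step should not be required. The column-share variant and the names by_row and by_col are illustrative additions.

```python
import pandas as pd

# Deterministic frame with the same A and B columns as in the question.
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
})

res = pd.crosstab(df["A"], df["B"])

# Row shares: row totals are matched against the index (axis=0),
# so each row is divided by its own total.
by_row = res.div(res.sum(axis=1), axis=0)

# Column shares: column totals are matched against the columns (axis=1).
by_col = res.div(res.sum(axis=0), axis=1)

print(by_row)
print(by_col)
```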
Answer by howMuchCheeseIsTooMuchCheese
If you're looking for a percentage of the total, you can divide by the len of the df instead of the row sum:
pd.crosstab(df.A, df.B).apply(lambda r: r/len(df), axis=1)
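A small sketch comparing three ways to get shares of the grand total, assuming a frame with no missing values in A or B. If either column contained NaN, len(df) may exceed the number of rows actually counted, so dividing by the table's own total is the more cautious choice. The variable names are illustrative.

```python
import pandas as pd

# Deterministic frame with the same A and B columns as in the question.
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
})

counts = pd.crosstab(df["A"], df["B"])

# The approach from the answer: divide every cell by the number of rows in df.
share_len = counts.apply(lambda r: r / len(df), axis=1)

# Divide by the table's own grand total instead of len(df).
share_total = counts / counts.to_numpy().sum()

# Built-in equivalent (pandas 0.18.1 or later).
share_norm = pd.crosstab(df["A"], df["B"], normalize="all")

print(share_len)
```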
Answer by Harry
From Pandas 0.18.1 onwards, there's a normalize option:
In [1]: pd.crosstab(df.A, df.B, normalize='index')
Out[1]:
B             A         B         C
A
one    0.333333  0.333333  0.333333
three  0.333333  0.333333  0.333333
two    0.333333  0.333333  0.333333
You can normalise across either all, index (rows), or columns.
More details are available in the documentation.
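To make the three normalize settings concrete, here is a short sketch, assuming pandas 0.18.1 or later and a frame reduced to the A and B columns from the question. The names by_row, by_col and overall are illustrative.

```python
import pandas as pd

# Deterministic frame with the same A and B columns as in the question.
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
})

# 'index': each row sums to 1.
by_row = pd.crosstab(df["A"], df["B"], normalize="index")

# 'columns': each column sums to 1.
by_col = pd.crosstab(df["A"], df["B"], normalize="columns")

# 'all': the whole table sums to 1.
overall = pd.crosstab(df["A"], df["B"], normalize="all")

print(by_row)
print(by_col)
print(overall)
```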
Answer by gabra
We can show it as percentages by multiplying by 100:
pd.crosstab(df.A, df.B, normalize='index')\
    .round(4)*100

B          A      B      C
A
one    33.33  33.33  33.33
three  33.33  33.33  33.33
two    33.33  33.33  33.33
I've rounded for convenience.
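A possible variation, offered only as a sketch: scaling to 100 before rounding avoids the small floating-point residue that can appear when an already-rounded fraction is multiplied afterwards, and a string-formatted copy can be useful purely for display. The names pct and labels are illustrative.

```python
import pandas as pd

# Deterministic frame with the same A and B columns as in the question.
df = pd.DataFrame({
    "A": ["one", "one", "two", "three"] * 6,
    "B": ["A", "B", "C"] * 8,
})

# Scale to percent first, then round to two decimals.
pct = pd.crosstab(df["A"], df["B"], normalize="index").mul(100).round(2)

# Display-only copy with a percent sign; the values become strings,
# so keep pct for any further arithmetic.
labels = pct.apply(lambda col: col.map("{:.2f}%".format))

print(pct)
print(labels)
```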
Answer by Shivam Aranya
Normalizing over the index simply works: pass the parameter normalize="index" to pd.crosstab().

