Python: How to make a pandas crosstab with percentages?

Question

Asked by Brian Keegan

Given a dataframe with different categorical variables, how do I return a cross-tabulation with percentages instead of frequencies?

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['one', 'one', 'two', 'three'] * 6,
                   'B' : ['A', 'B', 'C'] * 8,
                   'C' : ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'] * 4,
                   'D' : np.random.randn(24),
                   'E' : np.random.randn(24)})


pd.crosstab(df.A,df.B)


B       A    B    C
A               
one     4    4    4
three   2    2    2
two     2    2    2

Using the margins option in crosstab to compute row and column totals gets close enough that it seems this should be possible with an aggfunc or groupby, but I can't work it through. The result I'm after looks like this:

B        A    B    C
A
one    .33  .33  .33
three  .33  .33  .33
two    .33  .33  .33

Answer 1

Accepted answer by BrenBarn

pd.crosstab(df.A, df.B).apply(lambda r: r/r.sum(), axis=1)

Basically you just need a function that computes row/row.sum(), and you use apply with axis=1 to apply it row by row.

(If doing this in Python 2, you should use from __future__ import division to make sure division always returns a float.)
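A minimal, self-contained sketch of the apply approach, assuming Python 3 and a trimmed-down version of the question's dataframe (only columns A and B are needed). The names counts and row_share are illustrative, not part of the original answer.

```python
import pandas as pd

# Only the two categorical columns from the question are needed here.
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

# Raw frequencies.
counts = pd.crosstab(df.A, df.B)

# Divide every row by its own total; axis=1 passes one row at a time.
row_share = counts.apply(lambda r: r / r.sum(), axis=1)

print(row_share)
# Sanity check: each row of shares should add up to 1.
print(row_share.sum(axis=1))
```

The row sums are a quick way to confirm that the normalization ran along the intended axis.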

Answer 2

Answer by Andy Hayden

Another option is to use div rather than apply:

In [11]: res = pd.crosstab(df.A, df.B)

Divide by the row totals, that is, the sum across columns for each index label:

In [12]: res.sum(axis=1)
Out[12]: 
A
one      12
three     6
two       6
dtype: int64

As above, under Python 2 you need to guard against integer division (I use astype('float')):

In [13]: res.astype('float').div(res.sum(axis=1), axis=0)
Out[13]: 
B             A         B         C
A                                  
one    0.333333  0.333333  0.333333
three  0.333333  0.333333  0.333333
two    0.333333  0.333333  0.333333
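For comparison, here is a sketch of the same div approach under Python 3, where div performs true division and the astype('float') cast should not be needed. It also shows the column-wise variant. The variable names are illustrative, and the dataframe is a trimmed copy of the one in the question.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

res = pd.crosstab(df.A, df.B)

# Row shares: divide each row by its total, aligning the totals on the index.
row_totals = res.sum(axis=1)
row_share = res.div(row_totals, axis=0)

# Column shares: divide each column by its total, aligning on the columns.
col_totals = res.sum(axis=0)
col_share = res.div(col_totals, axis=1)

print(row_share)
print(col_share)
```

The axis argument of div says which labels the divisor is aligned with, so axis=0 pairs with row totals and axis=1 with column totals.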

Answer 3

Answer by howMuchCheeseIsTooMuchCheese

If you're looking for a percentage of the total, you can divide by the len of the df instead of the row sum:

pd.crosstab(df.A, df.B).apply(lambda r: r/len(df), axis=1)
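A short sketch comparing three ways to get shares of the grand total. It assumes there are no missing values in A or B, in which case len(df) matches the sum of all cells in the table; if that assumption might not hold for your data, dividing by the table's own total is the safer choice. Variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

counts = pd.crosstab(df.A, df.B)

# 1) Divide by the number of rows in the dataframe (as in the answer above).
by_len = counts / len(df)

# 2) Divide by the table's own grand total.
by_total = counts / counts.to_numpy().sum()

# 3) Let crosstab do it (available from pandas 0.18.1).
by_norm = pd.crosstab(df.A, df.B, normalize='all')

print(by_len)
```

Plain division of the whole table by a scalar gives the same result as the row-wise apply, since every cell is divided by the same number.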

Answer 4

Answer by Harry

From pandas 0.18.1 onwards, there's a normalize option:

In [1]: pd.crosstab(df.A,df.B, normalize='index')
Out[1]:

B              A           B           C
A           
one     0.333333    0.333333    0.333333
three   0.333333    0.333333    0.333333
two     0.333333    0.333333    0.333333

You can normalise over all (the whole table), index (rows), or columns.

More details are available in the pandas documentation for crosstab.
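The three normalize modes can be compared side by side with a sketch like the following, which reuses a trimmed copy of the question's dataframe. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

# Each row sums to 1.
by_row = pd.crosstab(df.A, df.B, normalize='index')

# Each column sums to 1.
by_col = pd.crosstab(df.A, df.B, normalize='columns')

# The whole table sums to 1.
overall = pd.crosstab(df.A, df.B, normalize='all')

print(by_row.sum(axis=1))
print(by_col.sum(axis=0))
print(overall.to_numpy().sum())
```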

Answer 5

Answer by gabra

We can show it as percentages by multiplying by 100:

pd.crosstab(df.A,df.B, normalize='index')\
    .round(4)*100

B          A      B      C
A                         
one    33.33  33.33  33.33
three  33.33  33.33  33.33
two    33.33  33.33  33.33

I've rounded for convenience.
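As a variation, multiplying before rounding may avoid floating-point noise that can appear when an already rounded value is scaled by 100. The sketch below also builds a display-only table of formatted strings; the names pct and labels are illustrative, and the string table is meant for presentation only since it is no longer numeric.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three'] * 6,
                   'B': ['A', 'B', 'C'] * 8})

shares = pd.crosstab(df.A, df.B, normalize='index')

# Scale first, then round, so rounding is the last numeric step.
pct = shares.mul(100).round(2)

# Display-only version: format each cell as a percentage string.
labels = shares.apply(lambda col: col.map('{:.1%}'.format))

print(pct)
print(labels)
```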

Answer 6

Answer by Shivam Aranya

Normalizing over the index does the job: pass normalize="index" to pd.crosstab().
