pandas convert columns to percentages of the totals
Original question: http://stackoverflow.com/questions/42006346/
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on StackOverflow.
Asked by DTATSO
I have a dataframe with four columns: an ID and three categories that the results fell into.
    <80%  80-90  >90
id
1      2      4    4
2      3      6    1
3      7      0    3
I would like to convert it to percentages, i.e.:
    <80%  80-90  >90
id
1    20%    40%  40%
2    30%    60%  10%
3    70%     0%  30%
This seems like it should be within pandas' capabilities, but I just can't figure it out.
Thanks in advance!
Answered by ASGM
You can do this using the basic pandas methods .div and .sum, using the axis argument to make sure the calculations happen the way you want:
cols = ['<80%', '80-90', '>90']
df[cols] = df[cols].div(df[cols].sum(axis=1), axis=0).multiply(100)
- Calculate the sum of each row (df[cols].sum(axis=1)). axis=1 makes the summation occur across the rows, rather than down the columns.
- Divide the dataframe by the resulting series (df[cols].div(df[cols].sum(axis=1), axis=0)). axis=0 aligns the series with the dataframe's index, so each row is divided by its own total.
- To finish, multiply the results by 100 so they are percentages between 0 and 100 instead of proportions between 0 and 1 (or you can skip this step and store them as proportions).
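For anyone who wants to run the steps above end to end, here is a minimal sketch. It rebuilds the dataframe from the question, assuming id is the index rather than a regular column. The names row_totals, pct and labels are mine, and the last step (turning the numbers into strings such as 20%) is an optional extra for display, not part of the answer itself.

```python
import pandas as pd

# Sample data copied from the question; 'id' is assumed to be the index.
df = pd.DataFrame(
    {"<80%": [2, 3, 7], "80-90": [4, 6, 0], ">90": [4, 1, 3]},
    index=pd.Index([1, 2, 3], name="id"),
)

cols = ["<80%", "80-90", ">90"]

# Step 1: total per row (a Series indexed by id).
row_totals = df[cols].sum(axis=1)

# Steps 2 and 3: divide each row by its own total, then scale to 0-100.
pct = df[cols].div(row_totals, axis=0).multiply(100)

# Optional: format as strings with a percent sign, for display only.
labels = pct.apply(lambda col: col.map("{:.0f}%".format))

print(pct)
print(labels)
```

Keeping pct numeric and building labels separately means the percentages can still be sorted or aggregated later; the string version is only useful for presentation.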
Answered by Tim Tian
df/df.sum()
If you want to divide by the sum of each row instead, transpose the dataframe first.
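A short sketch of what the transposition might look like in practice. It assumes every column is numeric and that id is the index, and it transposes the result back at the end, which the answer does not state explicitly. The names by_column, by_row and by_row_div are illustrative; the last one is included only to show that this route should agree with .div(..., axis=0).

```python
import pandas as pd

# Sample data copied from the question; 'id' is assumed to be the index.
df = pd.DataFrame(
    {"<80%": [2, 3, 7], "80-90": [4, 6, 0], ">90": [4, 1, 3]},
    index=pd.Index([1, 2, 3], name="id"),
)

# Column-wise: each value divided by the total of its column.
by_column = df / df.sum()

# Row-wise via transposition: transpose, divide by what are now
# column totals, then transpose back to the original layout.
by_row = (df.T / df.T.sum()).T

# Equivalent row-wise result without transposing.
by_row_div = df.div(df.sum(axis=1), axis=0)

print(by_column)
print(by_row)
```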
Answered by FDV
Tim Tian's answer pretty much worked for me, but this may help if you have a df with several columns and want to compute the percentages column-wise.
df_pct = df/df[df.columns].sum()*100
I was having trouble because I wanted the result of a pd.pivot_table expressed as percentages, but couldn't get it to work. So I just applied that code to the resulting table itself, and it worked.
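To illustrate the pivot-table case, here is a hedged sketch. The records data, its column names (region, product, sales) and the choice of aggfunc="sum" are all invented for the example and are not from the original answer; only the final line follows the expression given above.

```python
import pandas as pd

# Hypothetical long-format data, invented for illustration only.
records = pd.DataFrame(
    {
        "region": ["north", "north", "south", "south", "south"],
        "product": ["a", "b", "a", "b", "b"],
        "sales": [10, 30, 20, 20, 20],
    }
)

# Build the pivot table first (summing sales per region and product).
table = pd.pivot_table(
    records, index="region", columns="product", values="sales", aggfunc="sum"
)

# Then express each cell as a percentage of its column total.
table_pct = table / table[table.columns].sum() * 100

print(table_pct)
```

Each column of table_pct should add up to 100. If a region/product combination were missing, the pivot table would contain NaN there, and sum() skips NaN by default.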