How to use groupby in pandas to calculate a percentage / proportion of the total based on a criterion in another column
Original source: http://stackoverflow.com/questions/36987829/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by fuzzy_logic_77
I'm trying to work out how to use the groupby function in pandas to calculate the proportions of values per year for a given Yes/No criterion.
For example, I have a dataframe called names:
    Name  Number  Year   Sex Criteria
0  name1     789  1998  Male        N
1  name1     688  1999  Male        N
2  name1     639  2000  Male        N
3  name2     551  1998  Male        Y
4  name2     499  1999  Male        Y
I can use
namesgrouped = names.groupby(["Sex", "Year", "Criteria"]).sum()
to get:
                    Number
Sex  Year Criteria
Male 1998 N          14507
          Y           2308
     1999 N          14119
          Y           2331
and so on. I would like the Number column to show the percentage of the total for each sex and year, so instead of N = 14507 and Y = 2308 for 1998 above, I'd have N = 86.27% and Y = 13.73%.
Can anyone advise how to do this?
Answered by IanS
This question is a direct extension of the suggested duplicate. Borrowing from the accepted answer, this will work:
In [46]: namesgrouped.groupby(level=[0, 1]).apply(lambda g: g / g.sum())
Out[46]:
                      Number
Sex  Year Criteria
Male 1998 N         0.588806
          Y         0.411194
     1999 N         0.579612
          Y         0.420388
     2000 N         1.000000
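For anyone who wants to reproduce the output above, here is a minimal self-contained sketch built only from the five sample rows in the question. Two details are my own assumptions rather than part of the answer: selecting [["Number"]] before summing (so the Name strings are not aggregated too), and passing group_keys=False, since newer pandas releases may otherwise prepend the group keys to the index returned by apply.

```python
import pandas as pd

# Sample rows copied from the question; the asker's real data has more names.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Assumption: keep only Number so the Name strings are not summed as well.
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# Divide each (Sex, Year) block by its own total.
# Assumption: group_keys=False keeps the original three-level index.
proportions = namesgrouped.groupby(level=[0, 1], group_keys=False).apply(
    lambda g: g / g.sum()
)
print(proportions)
```

Within each Sex/Year pair the proportions should add up to 1, which is a quick way to check the result.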
Edit: a transform operation might be faster than apply:
namesgrouped / namesgrouped.groupby(level=[0, 1]).transform('sum')
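The transform version can be sketched end to end in the same way. This is only an illustration on the five sample rows from the question; the names totals and percent, the [["Number"]] selection, and the final scaling to rounded percentages are my own additions to match the format the asker wanted (such as 86.27%).

```python
import pandas as pd

# Sample rows copied from the question; the asker's real data has more names.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Assumption: keep only Number so the Name strings are not summed as well.
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# transform("sum") returns a frame with the same index as namesgrouped,
# where every row holds the total of its (Sex, Year) group.
totals = namesgrouped.groupby(level=["Sex", "Year"]).transform("sum")

# Element-wise division aligns on the shared index; scale to percentages.
percent = (namesgrouped / totals * 100).round(2)
print(percent)
```

Because transform keeps the original index, the division needs no reindexing step, which is one reason it tends to be simpler than apply for this kind of normalization.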