Python: How to use groupby in Pandas to calculate a percentage/proportion of the total based on a condition in another column

Question

Asked by fuzzy_logic_77

I'm trying to work out how to use the groupby function in pandas to calculate the proportion of values per year for a given Yes/No criterion.

For example, I have a dataframe called names:

    Name  Number  Year   Sex  Criteria
0  name1     789  1998  Male         N
1  name1     688  1999  Male         N
2  name1     639  2000  Male         N
3  name2     551  1998  Male         Y
4  name2     499  1999  Male         Y

I can use

namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

to get:

                    Number
Sex  Year Criteria
Male 1998 N          14507
          Y           2308
     1999 N          14119
          Y           2331

and so on. I would like the Number column to show the percentage of the total for each sex and year, so instead of N = 14507 and Y = 2308 for 1998 above I'd have N = 86.27% and Y = 13.73%.
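
As a quick check of the target figures, here is a minimal sketch of the arithmetic for the Male / 1998 group, using only the two totals shown in the grouped output above. The variable names are illustrative and not part of the original code.

```python
# Totals for Male / 1998, copied from the grouped output in the question.
n_total = 14507
y_total = 2308

# The denominator is the total for the (Sex, Year) group across both criteria.
group_total = n_total + y_total

n_share = n_total / group_total
y_share = y_total / group_total

# Format the shares as percentages with two decimal places.
print(f"N = {n_share:.2%}, Y = {y_share:.2%}")
```

Each value is divided by the sum of its own sex-and-year group, which is why the grouping for the denominator has to drop the Criteria level.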

Can anyone advise how to do this?

Answer 1

Answered by IanS

This question is a direct extension of the suggested duplicate. Borrowing from the accepted answer, this will work:

In [46]: namesgrouped.groupby(level=[0, 1], group_keys=False).apply(lambda g: g / g.sum())
Out[46]: 
                      Number
Sex  Year Criteria          
Male 1998 N         0.588806
          Y         0.411194
     1999 N         0.579612
          Y         0.420388
     2000 N         1.000000
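
For readers who want to reproduce the output above, the following is a self-contained sketch built from the five sample rows in the question. Passing group_keys=False is an assumption aimed at recent pandas versions, where apply may otherwise add the group keys to the index a second time; selecting the Number column before summing is likewise an assumption to keep the Name column out of the result.

```python
import pandas as pd

# Sample rows copied from the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

# Sum Number per (Sex, Year, Criteria); selecting the column keeps Name out of the sum.
namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# Regroup on the first two index levels (Sex, Year) and divide each row
# by the total of its group. group_keys=False (an assumption for recent
# pandas) leaves the original three-level index unchanged.
proportions = namesgrouped.groupby(level=[0, 1], group_keys=False).apply(
    lambda g: g / g.sum()
)

print(proportions)
```

With only these five rows, year 2000 has a single N entry, so its proportion is 1.0.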

Edit: a transform operation might be faster than apply:

namesgrouped / namesgrouped.groupby(level=[0, 1]).transform('sum')
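
The transform version can be sketched end to end as follows, again using the sample rows from the question. Multiplying by 100 and rounding to two decimals is an added step (not in the original answer) to get percentages in the form the question asks for; the names group_totals and percentages are illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
names = pd.DataFrame({
    "Name": ["name1", "name1", "name1", "name2", "name2"],
    "Number": [789, 688, 639, 551, 499],
    "Year": [1998, 1999, 2000, 1998, 1999],
    "Sex": ["Male"] * 5,
    "Criteria": ["N", "N", "N", "Y", "Y"],
})

namesgrouped = names.groupby(["Sex", "Year", "Criteria"])[["Number"]].sum()

# transform('sum') returns a frame with the same index as namesgrouped,
# where every row holds the total of its (Sex, Year) group.
group_totals = namesgrouped.groupby(level=[0, 1]).transform("sum")

# Element-wise division aligns on the shared index; scale to percentages.
percentages = (namesgrouped / group_totals * 100).round(2)

print(percentages)
```

Because transform keeps the original index, the division needs no further alignment, and the percentages within each sex-and-year group add up to 100.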
