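Disclaimer: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors: StackOverflow. Original URL: http://stackoverflow.com/questions/47170311/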

Date: 2020-09-14 04:44:49  Source: igfitidea

Pandas groupby count non-null values as percentage

python, pandas

Question by J. Paul

Given this dataset, I would like to count missing (NaN) values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : [1, np.nan, 2 , 55, 6, np.nan, -17, np.nan],
                   'Team' : ['one', 'one', 'two', 'three','two', 'two', 'one', 'three'],
                   'C' : [4, 14, 3 , 8, 8, 7, np.nan, 11],
                   'D' : [np.nan, np.nan, -12 , 12, 12, -12, np.nan, np.nan]})

Specifically, I want this count as a percentage, per group of the 'Team' column. I can get the raw count with:

df.groupby('Team').count()

This gives the number of non-missing values. Instead of the raw count, I would like the percentage of the total entries in each group (I don't know the group sizes in advance, and they are all different). I've tried using .agg(), but I can't get what I want. How can I do this?

Answer by Andy Hayden

You can take the mean of the notnull Boolean DataFrame (True counts as 1 and False as 0, so the mean is the fraction of non-null values):

In [11]: df.notnull()
Out[11]:
       A      C      D  Team
0   True   True  False  True
1  False   True  False  True
2   True   True   True  True
3   True   True   True  True
4   True   True   True  True
5  False   True   True  True
6   True  False  False  True
7  False   True  False  True

In [12]: df.notnull().mean()
Out[12]:
A       0.625
C       0.875
D       0.500
Team    1.000
dtype: float64
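
As a quick check of why the mean works, the sketch below (not part of the original answer; the names `mask`, `by_mean` and `by_ratio` are illustrative) compares it with the explicit "non-null count divided by number of rows" calculation on the question's data:

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

mask = df.notnull()              # True where a value is present
by_mean = mask.mean()            # True is treated as 1, False as 0
by_ratio = mask.sum() / len(df)  # explicit: non-null count / number of rows

# Column A has 5 non-null values out of 8 rows
print(by_mean["A"], by_ratio["A"])  # 0.625 0.625
```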

And per group, with groupby:

In [13]: df.groupby("Team").apply(lambda x: x.notnull().mean())
Out[13]:
              A         C    D  Team
Team
one    0.666667  0.666667  0.0   1.0
three  0.500000  1.000000  0.5   1.0
two    0.666667  1.000000  1.0   1.0
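
To make the apply step more concrete, here is a rough manual equivalent: call the same function on each group's sub-DataFrame and stack the resulting Series as rows keyed by the group label. This is only an illustration of the idea, not how pandas implements apply internally; `rows` and `manual` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

# One Series of non-null fractions per team (groups are iterated in sorted order)
rows = {team: sub.notnull().mean() for team, sub in df.groupby("Team")}

# Columns are the teams, so transpose to get one row per team
manual = pd.DataFrame(rows).T
manual.index.name = "Team"
print(manual)
```

Like the apply version, this still includes the Team column itself, which is always 1.0 because the grouping key is never missing.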

It may be faster to avoid apply by calling set_index first:

In [14]: df.set_index("Team").notnull().groupby(level=0).mean()
Out[14]:
              A         C    D
Team
one    0.666667  0.666667  0.0
three  0.500000  1.000000  0.5
two    0.666667  1.000000  1.0
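
A variation not taken from the answer: instead of moving Team into the index, you can drop it and group the Boolean mask directly by the `df["Team"]` Series (pandas aligns it on the row index). Multiplying by 100 gives the percentages the question asks for, and `isna()` gives the complementary share of missing values. A sketch; `present` and `missing` are illustrative names.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

values = df.drop(columns="Team")  # only the columns whose completeness we measure

# Percentage of non-null values per team, grouping by the original Team Series
present = values.notnull().groupby(df["Team"]).mean() * 100

# Percentage of missing (NaN) values per team; present + missing == 100
missing = values.isna().groupby(df["Team"]).mean() * 100

print(present.round(1))
print(missing.round(1))
```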

Answer by YOBEN_S

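Building on your own code, divide the counts by the group sizes with div(df.groupby('Team').size(), 0), where 0 is the axis (align the sizes on the Team index):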

df.groupby('Team').count().div(df.groupby('Team').size(),0)
Out[190]: 
              A         C    D
Team                          
one    0.666667  0.666667  0.0
three  0.500000  1.000000  0.5
two    0.666667  1.000000  1.0
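
The pieces of this one-liner, spelled out: `count()` excludes NaN, while `size()` counts every row in a group, NaN or not, so their ratio is the non-null fraction. The sketch below (illustrative names `counts`, `sizes`, `pct`) scales the ratio to a percentage. It should agree with the notnull-mean approach from the other answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'A': [1, np.nan, 2, 55, 6, np.nan, -17, np.nan],
                   'Team': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': [4, 14, 3, 8, 8, 7, np.nan, 11],
                   'D': [np.nan, np.nan, -12, 12, 12, -12, np.nan, np.nan]})

grouped = df.groupby("Team")
counts = grouped.count()  # non-null values per column and group (NaN excluded)
sizes = grouped.size()    # total rows per group (NaN included)

# axis=0: match each entry of `sizes` to the row with the same Team label
pct = counts.div(sizes, axis=0) * 100

print(sizes)
print(pct.round(2))
```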