Pandas count null values in a groupby function

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/43321455/


Tags: python, pandas

Asked by Stefan

import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan]})

Output:

     A      B     C
0  foo    one   NaN
1  bar    one  bla2
2  foo    two   NaN
3  bar  three  bla3
4  foo    two   NaN
5  bar    two   NaN
6  foo    one   NaN
7  foo  three   NaN

I would like to use groupby to count the number of NaN values in column C for each combination of A and B.

Expected Output (EDIT):

     A      B     C    D
0  foo    one   NaN    2
1  bar    one  bla2    0
2  foo    two   NaN    2
3  bar  three  bla3    0
4  foo    two   NaN    2
5  bar    two   NaN    1
6  foo    one   NaN    2
7  foo  three   NaN    1

Currently I am trying this:

df['count']=df.groupby(['A'])['B'].isnull().transform('sum')

But this is not working...

Thank You

Answered by jezrael

I think you need groupby with sum of NaN values:

df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int).reset_index(name='count')
print (df2)
     A      B  count
0  bar    one      0
1  bar  three      0
2  bar    two      1
3  foo    one      2
4  foo  three      1
5  foo    two      2
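
For readers who want to run the idea end to end, here is a minimal self-contained sketch of the same approach. The names is_missing and null_counts are illustrative and not part of the original answer; isna() is used as the equivalent of isnull().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

# Boolean mask: True where C is missing.
is_missing = df['C'].isna()

# Summing booleans counts the True values inside each (A, B) group.
null_counts = (
    is_missing.groupby([df['A'], df['B']])
    .sum()
    .astype(int)
    .reset_index(name='count')
)
print(null_counts)
```

The mask is grouped by the A and B columns of the original frame, so the result has one row per (A, B) pair, sorted by the group keys.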

If you need to filter first, add boolean indexing:

# Filter into a new variable so the original df keeps all rows
df_foo = df[df['A'] == 'foo']
df2 = df_foo.C.isnull().groupby([df_foo['A'],df_foo['B']]).sum().astype(int)
print (df2)
A    B    
foo  one      2
     three    1
     two      2

Or simpler:

# Filter into a new variable so the original df keeps all rows
df_foo = df[df['A'] == 'foo']
df2 = df_foo['B'].value_counts()
print (df2)
one      2
two      2
three    1
Name: B, dtype: int64
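
One caveat about the value_counts shortcut, sketched below: value_counts counts rows per value of B, not missing values in C. It matches the NaN count for foo only because every foo row in this sample has NaN in C. The bar rows show the difference. The variable names here are illustrative, not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

bar = df[df['A'] == 'bar']

# value_counts counts rows per value of B, whatever C contains.
rows_per_b = bar['B'].value_counts().sort_index()

# Counting NaN in C per value of B gives a different answer for 'bar'.
nulls_per_b = bar['C'].isna().groupby(bar['B']).sum().astype(int).sort_index()

print(rows_per_b)
print(nulls_per_b)
```

So the shortcut is only safe when the filtered rows are known to be entirely NaN in the column of interest.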

EDIT: The solution is very similar; just add transform:

df['D'] = df.C.isnull().groupby([df['A'],df['B']]).transform('sum').astype(int)
print (df)
     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1

Similar solution:

df['D'] = df.C.isnull()
df['D'] = df.groupby(['A','B'])['D'].transform('sum').astype(int)
print (df)
     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
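
If this pattern is needed in more than one place, it could be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: add_null_count and its parameters keys, target, and out are hypothetical names chosen for illustration. It returns a copy rather than modifying the input frame.

```python
import numpy as np
import pandas as pd


def add_null_count(frame, keys, target, out='D'):
    """Return a copy of frame with per-group NaN counts of `target` in `out`."""
    result = frame.copy()
    result[out] = (
        frame[target].isna()                 # True where the target is missing
        .groupby([frame[k] for k in keys])   # group the mask by the key columns
        .transform('sum')                    # broadcast each group's count to its rows
        .astype(int)
    )
    return result


df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

result = add_null_count(df, keys=['A', 'B'], target='C')
print(result)
```

Because transform returns a result aligned to the original index, the new column lines up row by row with the input, which is what distinguishes it from the aggregated sum used earlier.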

Answered by tagoma

df[df.A == 'foo'].groupby('B').agg({'C': lambda x: x.isnull().sum()})

returns:

       C
B       
one    2
three  1
two    2
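
As a cross-check on the lambda approach, the sketch below compares it with another way to get the same numbers: size() counts all rows in a group while count() skips NaN, so their difference is the number of missing values. This variant is an addition for illustration, not from the original answer, and it groups by both A and B instead of filtering on foo first.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

grouped = df.groupby(['A', 'B'])['C']

# Lambda-based aggregation, as in the answer above.
via_lambda = grouped.agg(lambda s: s.isna().sum()).astype(int)

# size() includes NaN rows, count() excludes them.
via_size = (grouped.size() - grouped.count()).astype(int)

print(via_lambda)
print(via_size)
```

The size-minus-count form avoids calling a Python function per group, which may matter on large frames, though that has not been measured here.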