Pandas count null values in a groupby function
Original question: http://stackoverflow.com/questions/43321455/
This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors on StackOverflow.
Asked by Stefan
import numpy as np
import pandas as pd

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C' : [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan]})
Output:
     A      B     C
0  foo    one   NaN
1  bar    one  bla2
2  foo    two   NaN
3  bar  three  bla3
4  foo    two   NaN
5  bar    two   NaN
6  foo    one   NaN
7  foo  three   NaN
I would like to use groupby to count the number of NaN values in column C for each combination of A and B, and store that count on every row.
Expected Output (EDIT):
     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
Currently I am trying this:
df['count']=df.groupby(['A'])['B'].isnull().transform('sum')
But this is not working...
Thank You
Answered by jezrael
I think you need groupby with sum over the NaN mask of column C:
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int).reset_index(name='count')
print (df2)
     A      B  count
0  bar    one      0
1  bar  three      0
2  bar    two      1
3  foo    one      2
4  foo  three      1
5  foo    two      2
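Why this works: isnull() returns a boolean Series, and summing booleans counts the True values, so a grouped sum of the mask gives the number of missing values per group. A minimal self-contained sketch, assuming the sample frame from the question (the names mask and null_counts are mine, not from the answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

# Boolean mask: True where C is missing
mask = df['C'].isnull()

# Group the mask by the A and B columns; the sum counts True values per group
null_counts = mask.groupby([df['A'], df['B']]).sum().astype(int)

# Look up a single (A, B) combination in the resulting MultiIndex Series
print(null_counts.loc[('foo', 'one')])
print(null_counts.loc[('bar', 'two')])
```

Grouping the mask by the original columns avoids adding a helper column to df.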
If you need to filter first, add boolean indexing:
df = df[df['A'] == 'foo']
df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int)
print (df2)
A    B
foo  one      2
     three    1
     two      2
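Or simpler (note that value_counts counts rows per B value, so it matches the NaN counts here only because every foo row has NaN in C):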
df = df[df['A'] == 'foo']
df2 = df['B'].value_counts()
print (df2)
one      2
two      2
three    1
Name: B, dtype: int64
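The value_counts shortcut counts rows, not missing values, so the two approaches diverge as soon as a group contains non-null values in C. A small sketch on the bar rows of the question's data illustrates the difference (the names bar, row_counts and null_counts are mine, chosen for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

bar = df[df['A'] == 'bar']

# value_counts counts every row per B value, whether or not C is missing
row_counts = bar['B'].value_counts()

# Summing the null mask counts only the rows where C is missing
null_counts = bar['C'].isnull().groupby(bar['B']).sum().astype(int)

print(row_counts.to_dict())
print(null_counts.to_dict())
```

For bar, each B value appears once, but only the two row has a missing C, so the results differ.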
EDIT: The solution is very similar, just use transform instead (run it on the original, unfiltered df):
df['D'] = df.C.isnull().groupby([df['A'],df['B']]).transform('sum').astype(int)
print (df)
     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
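The difference between sum and transform('sum') is the shape of the result: sum returns one value per group, while transform broadcasts each group's value back onto the rows of that group, so the result can be assigned as a new column. A hedged sketch, assuming the question's sample frame (per_group and per_row are illustrative names; the mask is cast to int up front so each missing value contributes 1):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

# 1 where C is missing, 0 otherwise
mask = df['C'].isnull().astype(int)
keys = [df['A'], df['B']]

# One value per (A, B) group
per_group = mask.groupby(keys).sum()

# One value per original row, aligned to df's index
per_row = mask.groupby(keys).transform('sum')

print(len(per_group), len(per_row))

df['D'] = per_row
print(df)
```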
Similar solution:
df['D'] = df.C.isnull()
df['D'] = df.groupby(['A','B'])['D'].transform('sum').astype(int)
print (df)
     A      B     C  D
0  foo    one   NaN  2
1  bar    one  bla2  0
2  foo    two   NaN  2
3  bar  three  bla3  0
4  foo    two   NaN  2
5  bar    two   NaN  1
6  foo    one   NaN  2
7  foo  three   NaN  1
Answered by tagoma
df[df.A == 'foo'].groupby('B').agg({'C': lambda x: x.isnull().sum()})
returns:
       C
B
one    2
three  1
two    2
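As a possible alternative to the lambda (not part of either answer), the same counts can be obtained from two built-in group reductions: size() counts all rows in a group, while count() counts only non-null values, so their difference is the number of missing values. A minimal sketch on the question's data, with grouped and null_counts as illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

grouped = df.groupby(['A', 'B'])['C']

# size() counts all rows per group; count() counts only non-null values
null_counts = grouped.size() - grouped.count()

print(null_counts)
```

This avoids calling a Python lambda once per group, which may matter on frames with many groups.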