Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original StackOverflow address: http://stackoverflow.com/questions/41523270/

Time: 2020-09-14 02:44:36  Source: igfitidea

Pandas groupby and filter

Tags: python, pandas, indexing, group-by, conditional-statements

Asked by iwbabn

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

I would like to group by ID, and select the row with V = 0 within each group.

This doesn't seem to work:

print(df.groupby(['ID']).filter(lambda x: x['V'] == 0)) 

Got an error:

TypeError: filter function returned a Series, but expected a scalar bool

How can I use filter to achieve the goal? Thank you.

EDIT: The condition on V may vary for each group, e.g., it could be V==0 for ID 1, V==1 for ID 2, and this info can be available through another DF:

df = pd.DataFrame({'ID': [1, 2, 3],
                   'V': [0, 1, 0]})

So how can I filter rows within each group?

Answered by jezrael

I think groupby is not necessary; use boolean indexing if you only need all rows where V is 0:

print (df[df.V == 0])
    C  ID  V  YEAR
0   0   1  0  2011
3  33   2  0  2013
5  55   3  0  2014
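
For reference, here is a self-contained sketch of the boolean-indexing approach, assuming the sample data from the question. The names `mask` and `result` are introduced here for illustration only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

# Boolean mask: True for every row whose V equals 0.
mask = df['V'] == 0

# Indexing with the mask keeps only the True rows and preserves the original index.
result = df[mask]
print(result)
```

Because the condition is evaluated row by row, no grouping is involved; each ID contributes whichever of its rows satisfy the condition.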

But if you need to return all groups that contain at least one value of column V equal to 0, add any, because filter needs a single True or False per group to keep or drop all rows of that group:

print(df.groupby(['ID']).filter(lambda x: (x['V'] == 0).any())) 
    C  ID  V  YEAR
0   0   1  0  2011
1  11   1  1  2012
2  22   2  1  2012
3  33   2  0  2013
4  44   3  1  2013
5  55   3  0  2014
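
The difference between the failing call in the question and the version with any can be seen by looking at what the lambda returns for each group. The sketch below assumes the sample data from the question; the variable names (`per_row`, `per_group`, `error_message`, `kept`) are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

for group_id, group in df.groupby('ID'):
    per_row = group['V'] == 0      # a Series: one boolean per row of the group
    per_group = per_row.any()      # a single boolean for the whole group
    print(group_id, per_row.tolist(), bool(per_group))

# Passing the per-row Series to filter reproduces the error from the question.
try:
    df.groupby('ID').filter(lambda x: x['V'] == 0)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Reducing to one boolean per group makes the call valid.
kept = df.groupby('ID').filter(lambda x: (x['V'] == 0).any())
```

With this data every ID has at least one row with V equal to 0, so all six rows are kept.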

A better test is to change the column used for groupby: the rows with YEAR 2012 are filtered out because none of them has V == 0:

print(df.groupby(['YEAR']).filter(lambda x: (x['V'] == 0).any())) 
    C  ID  V  YEAR
0   0   1  0  2011
3  33   2  0  2013
4  44   3  1  2013
5  55   3  0  2014
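
To check which years pass the test, the per-group flag can be computed directly. This is a small sketch assuming the sample data from the question; `flags` and `years_kept` are names made up for the example.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

# One boolean per YEAR: does the group contain at least one row with V == 0?
flags = (df['V'] == 0).groupby(df['YEAR']).any()
print(flags)

# Years whose rows survive the filter.
years_kept = flags[flags].index.tolist()
print(years_kept)
```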

If performance is important, use GroupBy.transform with boolean indexing:

print(df[(df['V'] == 0).groupby(df['YEAR']).transform('any')]) 
   ID  YEAR  V   C
0   1  2011  0   0
3   2  2013  0  33
4   3  2013  1  44
5   3  2014  0  55
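
As a sanity check, the two approaches can be compared on the sample data. The sketch below assumes the DataFrame from the question; `by_filter`, `keep` and `by_transform` are illustrative names.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

# Approach 1: filter calls a Python lambda once per group.
by_filter = df.groupby('YEAR').filter(lambda x: (x['V'] == 0).any())

# Approach 2: transform broadcasts the per-group result back to every row,
# giving a boolean mask aligned with df that can be used for indexing.
keep = (df['V'] == 0).groupby(df['YEAR']).transform('any')
by_transform = df[keep]

print(by_filter.index.tolist())
print(by_transform.index.tolist())
```

Both select the same rows here; the transform version avoids the per-group Python function call, which is presumably where the performance difference comes from on data with many groups.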

Detail:

print((df['V'] == 0).groupby(df['YEAR']).transform('any')) 
0     True
1    False
2    False
3     True
4     True
5     True
Name: V, dtype: bool
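
The EDIT in the question (a different target value of V for each ID, supplied by a second DataFrame) is not addressed above. One possible sketch, not part of the original answer, maps each row's ID to its target value and compares row by row. It assumes the second DataFrame has exactly one row per ID; the names `cond`, `wanted` and `result` are hypothetical (the question reuses the name `df` for both frames).

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

# Per-ID condition from the question's EDIT (assumed: one row per ID).
cond = pd.DataFrame({'ID': [1, 2, 3],
                     'V': [0, 1, 0]})

# Look up the target V for each row's ID; the result is aligned with df's index.
wanted = df['ID'].map(cond.set_index('ID')['V'])

# Keep the rows whose V equals the target for their own ID.
result = df[df['V'] == wanted]
print(result)
```

An ID missing from `cond` would map to NaN and therefore match no rows, which may or may not be the desired behavior.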