Pandas groupby and filter
Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/41523270/
Asked by iwbabn
I have a DataFrame:
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})
I would like to group by ID, and select the row with V = 0 within each group.
This doesn't seem to work:
print(df.groupby(['ID']).filter(lambda x: x['V'] == 0))
I get an error:
TypeError: filter function returned a Series, but expected a scalar bool
How can I use filter to achieve the goal? Thank you.
EDIT: The condition on V may vary by group, e.g., V == 0 for ID 1 and V == 1 for ID 2, and this information is available in another DataFrame:
df = pd.DataFrame({'ID': [1, 2, 3],
                   'V': [0, 1, 0]})
So how can I filter rows within each group?
Answered by jezrael
I think groupby is not necessary. If you only need all rows where V is 0, use boolean indexing:
print (df[df.V == 0])
    C  ID  V  YEAR
0   0   1  0  2011
3  33   2  0  2013
5  55   3  0  2014
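The same boolean-indexing idea can be stretched to the per-group condition from the question's EDIT. This is a minimal sketch and not part of the original answer: the name `cond` for the second DataFrame is an assumption (the question reuses `df`), and it assumes each ID appears exactly once in `cond`.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

# Hypothetical name: `cond` stands in for the second DataFrame in the EDIT,
# holding the V value wanted for each ID.
cond = pd.DataFrame({'ID': [1, 2, 3],
                     'V': [0, 1, 0]})

# Look up the wanted V for each row's ID, then compare row by row.
wanted = df['ID'].map(cond.set_index('ID')['V'])
result = df[df['V'] == wanted]
print(result)
```

Because the comparison is done row by row, no groupby is needed here either; an ID missing from `cond` would map to NaN and its rows would simply be dropped.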
But if you need to return all groups where at least one value of column V equals 0, add any, because filter needs a single True or False per group to keep or drop all rows of that group:
print(df.groupby(['ID']).filter(lambda x: (x['V'] == 0).any()))
    C  ID  V  YEAR
0   0   1  0  2011
1  11   1  1  2012
2  22   2  1  2012
3  33   2  0  2013
4  44   3  1  2013
5  55   3  0  2014
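To see what filter is deciding on, it can help to compute the per-group flag by hand. The sketch below is only an illustration: `group_flags` is a hypothetical helper, not a pandas function. It shows the single boolean that `(g['V'] == 0).any()` produces for each group, which is why the original lambda (returning a whole Series) raised the TypeError.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

def group_flags(frame, key):
    """Hypothetical helper: one scalar bool per group, as filter expects."""
    return {int(name): bool((g['V'] == 0).any())
            for name, g in frame.groupby(key)}

flags_by_id = group_flags(df, 'ID')      # every ID has at least one V == 0
flags_by_year = group_flags(df, 'YEAR')  # one YEAR has no V == 0
print(flags_by_id)
print(flags_by_year)

# filter keeps all rows of the groups flagged True and drops the rest.
kept = df.groupby('ID').filter(lambda g: (g['V'] == 0).any())
print(len(kept))
```

Grouping by ID flags every group True, so nothing is dropped; grouping by YEAR gives a more telling result.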
A better test is to change the groupby column: the rows with YEAR 2012 are filtered out because that group has no V == 0:
print(df.groupby(['YEAR']).filter(lambda x: (x['V'] == 0).any()))
    C  ID  V  YEAR
0   0   1  0  2011
3  33   2  0  2013
4  44   3  1  2013
5  55   3  0  2014
If performance is important, use GroupBy.transform with boolean indexing:
print(df[(df['V'] == 0).groupby(df['YEAR']).transform('any')])
   ID  YEAR  V   C
0   1  2011  0   0
3   2  2013  0  33
4   3  2013  1  44
5   3  2014  0  55
Detail:
print((df['V'] == 0).groupby(df['YEAR']).transform('any'))
0     True
1    False
2    False
3     True
4     True
5     True
Name: V, dtype: bool
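As a quick sanity check, the two approaches can be compared side by side. This is a minimal sketch on the sample data from the question; it only confirms that both select the same rows here and makes no claim about timings.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 2, 3, 3],
                   'YEAR': [2011, 2012, 2012, 2013, 2013, 2014],
                   'V': [0, 1, 1, 0, 1, 0],
                   'C': [0, 11, 22, 33, 44, 55]})

# transform('any') broadcasts each group's flag back to every row of the group,
# giving a boolean mask aligned with df.
mask = (df['V'] == 0).groupby(df['YEAR']).transform('any')
via_transform = df[mask]

# filter evaluates the lambda once per group and keeps or drops whole groups.
via_filter = df.groupby('YEAR').filter(lambda g: (g['V'] == 0).any())

print(via_transform.index.tolist())
print(via_filter.index.tolist())
```

Both lists should contain the same row labels, matching the outputs shown above.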