Pandas, groupby where column value is greater than x
Original URL: http://stackoverflow.com/questions/29632784/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by cyberbemon
I have a table like this:
timestamp avg_hr hr_quality avg_rr rr_quality activity sleep_summary_id
1422404668 66 229 0 0 13 78
1422404670 64 223 0 0 20 78
1422404672 64 216 0 0 11 78
1422404674 66 198 0 40 9 78
1422404676 65 184 0 30 3 78
1422404678 64 173 0 10 17 78
1422404680 66 199 0 20 118 78
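To experiment with the answers, the table can be loaded into a DataFrame. This is a minimal sketch that assumes timestamp holds Unix epoch seconds and that the asker's df2 uses it as a DatetimeIndex (the attempts call df2.index.hour, which needs a datetime-like index). Neither assumption is stated in the question.

```python
import pandas as pd

# Rows copied from the table above; timestamp is ASSUMED to be Unix epoch seconds.
df2 = pd.DataFrame(
    {
        "timestamp": [1422404668, 1422404670, 1422404672, 1422404674,
                      1422404676, 1422404678, 1422404680],
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "hr_quality": [229, 223, 216, 198, 184, 173, 199],
        "avg_rr": [0, 0, 0, 0, 0, 0, 0],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "activity": [13, 20, 11, 9, 3, 17, 118],
        "sleep_summary_id": [78] * 7,
    }
)

# Convert to datetimes and use them as the index so that df2.index.hour exists.
df2["timestamp"] = pd.to_datetime(df2["timestamp"], unit="s")
df2 = df2.set_index("timestamp")
print(df2)
```

All seven sample rows fall within the same hour (00:24 UTC on 2015-01-28), so df2.index.hour is 0 for every row.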
I'm trying to group the data by timestamp, sleep_summary_id and rr_quality, where rr_quality is > 0.
I've tried the following, and none of them seem to work:
df3 = df2.groupby([df2.index.hour,'sleep_summary_id',df2['rr_quality']>0])
df3 = df2.groupby([df2.index.hour,'sleep_summary_id','rr_quality'>0])
df3 = df2.groupby([df2.index.hour,'sleep_summary_id',['rr_quality']>0])
All of them return a KeyError.
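Two observations may help explain these attempts. In the second and third forms, 'rr_quality' > 0 and ['rr_quality'] > 0 are evaluated by Python before pandas sees them. Python 3 raises a TypeError, while Python 2 returns True, and pandas would then look for a column labelled True (a plausible, unconfirmed source of the KeyError). The first form is accepted by current pandas, but a boolean Series used as a group key splits the rows into a False group and a True group instead of dropping rows. The sketch below illustrates this on a subset of the question's columns, using the mean of avg_hr only as an example aggregation:

```python
import pandas as pd

# A subset of the question's columns (the timestamp index is omitted for brevity).
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    }
)

# The boolean Series becomes a grouping level with values False/True; no rows are dropped.
by_flag = df2.groupby(["sleep_summary_id", df2["rr_quality"] > 0])["avg_hr"].mean()
print(by_flag)
```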
EDIT:
I also can't seem to pass more than one filter at a time. I tried the following:
df2[df2['rr_quality'] >= 150, df2['hr_quality'] > 200]
df2[df2['rr_quality'] >= 150, ['hr_quality'] > 200]
df2[[df2['rr_quality'] >= 150, ['hr_quality'] > 200]]
All of these return: TypeError: 'Series' objects are mutable, thus they cannot be hashed
Accepted answer by EdChum
The simplest thing to do here is to filter the df first and then perform the groupby. Take the hour from the filtered frame, so that the key has the same length as the rows being grouped:
filtered = df2[df2['rr_quality'] > 0]
filtered.groupby([filtered.index.hour, 'sleep_summary_id'])
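A runnable version of the filter-then-group approach follows. It uses a few columns from the question's table and assumes timestamp is Unix seconds set as a DatetimeIndex. The mean of avg_hr is only an example aggregation, since the question does not say which one is wanted. The hour key comes from the filtered frame because pandas requires an array-like group key to match the length of the frame being grouped.

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "timestamp": [1422404668, 1422404670, 1422404672, 1422404674,
                      1422404676, 1422404678, 1422404680],
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    }
)
# ASSUMPTION: timestamp is Unix seconds and serves as the index.
df2["timestamp"] = pd.to_datetime(df2["timestamp"], unit="s")
df2 = df2.set_index("timestamp")

# Step 1: keep only rows whose rr_quality is positive.
filtered = df2[df2["rr_quality"] > 0]

# Step 2: group the filtered rows; the hour key is taken from the same frame.
result = filtered.groupby([filtered.index.hour, "sleep_summary_id"])["avg_hr"].mean()
print(result)
```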
EDIT
If you intend to assign the result back to your original df:
mask = df2['rr_quality'] > 0
df2.loc[mask, 'AVG_HR'] = df2[mask].groupby([df2[mask].index.hour, 'sleep_summary_id'])['avg_hr'].transform('mean')
The loc call masks the left-hand side so that the result of transform aligns correctly with the selected rows.
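The sketch below shows the alignment that the loc call relies on, using the sample rows. AVG_HR and the mean aggregation come from the answer; treating timestamp as a Unix-seconds DatetimeIndex is an assumption. transform returns a Series with the same index as the filtered rows, and assigning it through df2.loc[mask, 'AVG_HR'] writes each value back to its matching row. Rows outside the mask are left as NaN.

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "timestamp": [1422404668, 1422404670, 1422404672, 1422404674,
                      1422404676, 1422404678, 1422404680],
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    }
)
# ASSUMPTION: timestamp is Unix seconds and serves as the index.
df2["timestamp"] = pd.to_datetime(df2["timestamp"], unit="s")
df2 = df2.set_index("timestamp")

mask = df2["rr_quality"] > 0
sub = df2[mask]

# transform yields one value per row of `sub`, indexed exactly like `sub`.
group_means = sub.groupby([sub.index.hour, "sleep_summary_id"])["avg_hr"].transform("mean")

# Only masked rows are written; pandas aligns group_means on the index labels.
df2.loc[mask, "AVG_HR"] = group_means
print(df2)
```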
To filter using multiple conditions, you need to use the element-wise operators &, | and ~ for and, or and not respectively. Additionally, you need to wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as >= and >:
df2[(df2['rr_quality'] >= 150) & (df2['hr_quality'] > 200)]
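The sketch below applies the three operators to two columns from the question. The thresholds differ from 150 and 200 because rr_quality never reaches 150 in the seven sample rows, and the original thresholds would select nothing. It also shows DataFrame.query, a standard pandas alternative that accepts and/or in a string expression. The asker's df2[cond1, cond2] form likely fails because pandas receives the tuple (cond1, cond2) as a single column label and tries to hash it.

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "hr_quality": [229, 223, 216, 198, 184, 173, 199],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
    }
)

# Each comparison yields a boolean Series; the parentheses are required.
both = df2[(df2["rr_quality"] >= 20) & (df2["hr_quality"] > 190)]    # and
either = df2[(df2["rr_quality"] >= 20) | (df2["hr_quality"] > 220)]  # or
not_zero = df2[~(df2["rr_quality"] == 0)]                             # not

# The same "and" condition expressed with DataFrame.query.
via_query = df2.query("rr_quality >= 20 and hr_quality > 190")
print(both, either, not_zero, via_query, sep="\n\n")
```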
Answer by Czarking
I know this is old, but I wanted to add that there is an official function, GroupBy.filter, to do exactly this. Adapting the example from the pandas documentation to your case:
grouped_df2 = df2.groupby([df2.index.hour, 'sleep_summary_id', 'rr_quality'])
grouped_df2.filter(lambda x: (x['rr_quality'] > 0).all())
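Here is a runnable sketch of the GroupBy.filter approach on the sample rows, again assuming timestamp is a Unix-seconds DatetimeIndex. filter calls the function once per group and expects a single boolean back. Returning a boolean Series such as x['rr_quality'] > 0 makes pandas raise a TypeError, so the condition is reduced with .all(). Because rr_quality is itself a grouping key, every row in a group shares the same value, so keeping whole groups is equivalent to keeping individual rows. filter returns a plain DataFrame, so a second groupby is needed for aggregates. The mean of avg_hr is only an example.

```python
import pandas as pd

df2 = pd.DataFrame(
    {
        "timestamp": [1422404668, 1422404670, 1422404672, 1422404674,
                      1422404676, 1422404678, 1422404680],
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    }
)
# ASSUMPTION: timestamp is Unix seconds and serves as the index.
df2["timestamp"] = pd.to_datetime(df2["timestamp"], unit="s")
df2 = df2.set_index("timestamp")

grouped_df2 = df2.groupby([df2.index.hour, "sleep_summary_id", "rr_quality"])

# The function must return one bool per group; .all() collapses the boolean Series.
kept = grouped_df2.filter(lambda x: (x["rr_quality"] > 0).all())

# filter returns the surviving rows (original order) as an ordinary DataFrame.
result = kept.groupby([kept.index.hour, "sleep_summary_id"])["avg_hr"].mean()
print(result)
```

For a purely row-level condition like this one, the boolean mask in the accepted answer produces the same rows. filter is most useful when the keep/drop decision depends on the group as a whole, for example len(x) > 2.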

