Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29632784/

Date: 2020-09-13 23:11:57  Source: igfitidea

Pandas, groupby where column value is greater than x

Tags: python, pandas

Asked by cyberbemon

I have a table like this:

    timestamp   avg_hr  hr_quality  avg_rr  rr_quality  activity  sleep_summary_id
    1422404668  66      229         0       0           13        78
    1422404670  64      223         0       0           20        78
    1422404672  64      216         0       0           11        78
    1422404674  66      198         0       40          9         78
    1422404676  65      184         0       30          3         78
    1422404678  64      173         0       10          17        78
    1422404680  66      199         0       20          118       78

I'm trying to group the data by timestamp, sleep id and rr_quality, where rr_quality is > 0.

I've tried the following, and none of them seem to work:

 df3 = df2.groupby([df2.index.hour,'sleep_summary_id',df2['rr_quality']>0])

 df3 = df2.groupby([df2.index.hour,'sleep_summary_id','rr_quality'>0])

 df3 = df2.groupby([df2.index.hour,'sleep_summary_id',['rr_quality']>0])

All of them raise a KeyError.

EDIT:

I also can't seem to pass more than one filter at a time. I tried the following:

df2[df2['rr_quality'] >= 150, df2['hr_quality'] > 200]
df2[df2['rr_quality'] >= 150, ['hr_quality'] > 200]
df2[[df2['rr_quality'] >= 150, ['hr_quality'] > 200]]

This returns: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Accepted answer by EdChum

The simplest thing to do here is to filter the df first and then perform the groupby:

filtered = df2[df2['rr_quality'] > 0]
filtered.groupby([filtered.index.hour, 'sleep_summary_id'])
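
A minimal sketch of the filter-then-group approach, assuming the frame carries a DatetimeIndex built from the Unix timestamps in the question. The sample rows come from the question's table; the names `filtered`, `grouped` and `mean_hr` are illustrative. Taking the hour from the filtered frame's own index keeps the grouping key the same length as the rows being grouped:

```python
import pandas as pd

# Rows copied from the question; the DatetimeIndex is an assumption about
# how df2 was prepared (the question groups by df2.index.hour).
index = pd.to_datetime(
    [1422404668, 1422404670, 1422404672, 1422404674,
     1422404676, 1422404678, 1422404680],
    unit="s",
)
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    },
    index=index,
)

# Keep only rows with rr_quality > 0, then group the filtered frame
# by the hour of its own index and by sleep_summary_id.
filtered = df2[df2["rr_quality"] > 0]
grouped = filtered.groupby([filtered.index.hour, "sleep_summary_id"])

# Any aggregation can follow; the mean heart rate is used as an example.
mean_hr = grouped["avg_hr"].mean()
print(mean_hr)
```

With this sample data all rows fall in the same hour and share one sleep_summary_id, so the result has a single group.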

EDIT

If you're intending to assign this back to your original df:

subset = df2[df2['rr_quality'] >= 150]
df2.loc[df2['rr_quality'] > 0, 'AVG_HR'] = subset.groupby([subset.index.hour, 'emfit_sleep_summary_id'])['avg_hr'].transform('mean')

The loc call masks the left-hand side so that the result of the transform aligns correctly.
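
As a rough illustration of that alignment, the sketch below assumes a small frame with a DatetimeIndex (rows taken from the question's table) and uses the same threshold on both sides for simplicity. The names `mask`, `subset` and `group_mean` are illustrative. `transform('mean')` returns a Series indexed like the filtered rows, so assigning it through `loc` fills only those rows and leaves the others as NaN:

```python
import pandas as pd

# Five rows from the question; the DatetimeIndex is an assumption.
index = pd.to_datetime(
    [1422404672, 1422404674, 1422404676, 1422404678, 1422404680],
    unit="s",
)
df2 = pd.DataFrame(
    {
        "avg_hr": [64, 66, 65, 64, 66],
        "rr_quality": [0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 5,
    },
    index=index,
)

mask = df2["rr_quality"] > 0
subset = df2[mask]

# transform broadcasts each group's mean back onto the rows of that group,
# so group_mean has the same index as subset.
group_mean = subset.groupby(
    [subset.index.hour, "sleep_summary_id"]
)["avg_hr"].transform("mean")

# Assignment through loc aligns on the index; unmasked rows stay NaN.
df2.loc[mask, "AVG_HR"] = group_mean
print(df2)
```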

To filter using multiple conditions you need to use the array comparison operators &, | and ~ for and, or and not respectively. Additionally, you need to wrap the conditions in parentheses due to operator precedence:

df2[(df2['rr_quality'] >= 150) & (df2['hr_quality'] > 200)]
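
A short sketch of the three operators on sample data from the question. The thresholds here are chosen to suit the sample rows and are not the ones from the answer; the names `both`, `either` and `neither` are illustrative:

```python
import pandas as pd

# hr_quality and rr_quality columns copied from the question's table.
df2 = pd.DataFrame(
    {
        "hr_quality": [229, 223, 216, 198, 184, 173, 199],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
    }
)

# Each comparison is parenthesised because & and | bind more tightly
# than > and >= in Python.
both = df2[(df2["rr_quality"] > 0) & (df2["hr_quality"] > 190)]
either = df2[(df2["rr_quality"] > 0) | (df2["hr_quality"] > 220)]
neither = df2[~((df2["rr_quality"] > 0) | (df2["hr_quality"] > 220))]

print(both)
print(either)
print(neither)
```

Passing the conditions separated by a comma, as in the question, makes pandas treat them as a single tuple key rather than as a combined boolean mask, which is why that form fails.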

Answer by Czarking

I know this is old, but I wanted to add that there is an official function to do exactly this. Adapting the example from the pandas documentation to your case:

grouped_df2 = df2.groupby([df2.index.hour, 'sleep_summary_id', 'rr_quality'])
grouped_df2.filter(lambda x: (x['rr_quality'] > 0).all())
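
A minimal sketch of `GroupBy.filter`, assuming sample rows from the question and omitting the hour key for brevity. The function passed to `filter` has to return a single True or False per group, so the element-wise comparison is reduced with `.all()` here (that reduction is one possible choice, not something stated in the answer). The names `grouped` and `kept` are illustrative:

```python
import pandas as pd

# Columns copied from the question's table.
df2 = pd.DataFrame(
    {
        "avg_hr": [66, 64, 64, 66, 65, 64, 66],
        "rr_quality": [0, 0, 0, 40, 30, 10, 20],
        "sleep_summary_id": [78] * 7,
    }
)

grouped = df2.groupby(["sleep_summary_id", "rr_quality"])

# filter keeps every row of the groups for which the function returns True
# and drops the rows of all other groups.
kept = grouped.filter(lambda g: (g["rr_quality"] > 0).all())
print(kept)
```

The result is an ordinary DataFrame containing the surviving rows, not a grouped object, so a further groupby is needed if aggregates are wanted afterwards.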