Grouping by with Where conditions in Pandas
Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44537249/
Asked by Keithx
I created a column 'dif_pause' by subtracting the 'pause_start' column from the 'pause_end' column, and then computed the mean per group with the groupby() function, like this:
pauses['dif_pause'] = pauses['pause_end'] - pauses['pause_start']
pauses['dif_pause'].astype(dt.timedelta).map(lambda x: np.nan if pd.isnull(x) else x.days)
pauses_df=pauses.groupby(["subscription_id"])["dif_pause"].mean().reset_index(name="avg_pause")
I'd like the groupby step to include a check that pause_end > pause_start (some equivalent of a WHERE clause in SQL). How can I do that?
Thanks.
Answered by jezrael
It seems you need to filter first, using either query() or boolean indexing:
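# Option 1: filter with query(), then group and aggregate
(pauses.query("pause_end > pause_start")
       .groupby(["subscription_id"])["dif_pause"].mean().reset_index(name="avg_pause"))
# Option 2: filter with boolean indexing, then group and aggregate
(pauses[pauses["pause_end"] > pauses["pause_start"]]
       .groupby(["subscription_id"])["dif_pause"].mean().reset_index(name="avg_pause"))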
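To see the filter-then-group pattern end to end, here is a minimal sketch on made-up data. The sample rows and the use of .dt.days to express the difference in whole days are assumptions for illustration; only the column names come from the question. It follows the boolean indexing variant.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
pauses = pd.DataFrame({
    "subscription_id": [1, 1, 2, 2],
    "pause_start": pd.to_datetime(
        ["2017-01-01", "2017-02-10", "2017-03-01", "2017-04-05"]),
    "pause_end": pd.to_datetime(
        ["2017-01-05", "2017-02-08", "2017-03-11", "2017-04-07"]),
})

# Pause length in whole days (assumption: days are the unit of interest).
# The second row ends before it starts, so its difference is negative.
pauses["dif_pause"] = (pauses["pause_end"] - pauses["pause_start"]).dt.days

# The "WHERE" step: keep only rows where pause_end > pause_start.
valid = pauses[pauses["pause_end"] > pauses["pause_start"]]

# Then group and aggregate as in the question.
pauses_df = (
    valid.groupby("subscription_id")["dif_pause"]
    .mean()
    .reset_index(name="avg_pause")
)
print(pauses_df)
```

With this sample, the invalid row for subscription 1 is dropped before averaging, so avg_pause is 4.0 for subscription 1 and 6.0 for subscription 2. Filtering before groupby() is what plays the role of the SQL WHERE clause here.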