Group by with a WHERE condition in Pandas

Question

Asked by Keithx

I have a dataframe like this:

I created the column 'dif_pause' by subtracting 'pause_start' from 'pause_end', then took the mean per group with groupby(), like this:

pauses['dif_pause'] = pauses['pause_end'] - pauses['pause_start']
# Convert the timedelta to whole days; missing values (NaT) become NaN
pauses['dif_pause'] = pauses['dif_pause'].dt.days

pauses_df=pauses.groupby(["subscription_id"])["dif_pause"].mean().reset_index(name="avg_pause")

I'd like the groupby step to also check that pause_end > pause_start (some equivalent of the WHERE clause in SQL). How can I do that?

Thanks.


Answer 1

Answered by jezrael

It seems you need query or boolean indexing first, to filter the rows before grouping:

# Option 1: filter with query; the parentheses let the method chain span lines
(pauses.query("pause_end > pause_start")
       .groupby(["subscription_id"])["dif_pause"].mean().reset_index(name="avg_pause"))

# Option 2: filter with boolean indexing
(pauses[pauses["pause_end"] > pauses["pause_start"]]
       .groupby(["subscription_id"])["dif_pause"].mean().reset_index(name="avg_pause"))
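
To see the filter-then-group pattern end to end, here is a minimal sketch. The sample rows are hypothetical (the original dataframe is not shown in the question), and it assumes 'pause_start' and 'pause_end' are datetime columns. One row has pause_end earlier than pause_start, so it should be excluded from the mean.

```python
import pandas as pd

# Hypothetical sample data; the real dataframe from the question is not available.
pauses = pd.DataFrame({
    "subscription_id": [1, 1, 2, 2],
    "pause_start": pd.to_datetime(
        ["2017-01-01", "2017-02-01", "2017-01-10", "2017-03-01"]
    ),
    "pause_end": pd.to_datetime(
        ["2017-01-05", "2017-01-20", "2017-01-12", "2017-03-09"]
    ),
})

# Pause length in whole days (negative when pause_end precedes pause_start).
pauses["dif_pause"] = (pauses["pause_end"] - pauses["pause_start"]).dt.days

# Filter first (the WHERE step), then group and aggregate.
valid = pauses["pause_end"] > pauses["pause_start"]
pauses_df = (
    pauses[valid]
    .groupby("subscription_id")["dif_pause"]
    .mean()
    .reset_index(name="avg_pause")
)

# The query-based variant should select the same rows.
pauses_df_query = (
    pauses.query("pause_end > pause_start")
    .groupby("subscription_id")["dif_pause"]
    .mean()
    .reset_index(name="avg_pause")
)

print(pauses_df)
```

With this data, subscription 1 keeps only its 4-day pause and subscription 2 averages its 2-day and 8-day pauses. Note that rows where either date is missing also fail the comparison, so they are dropped by the filter as well.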
