Python: What is the equivalent of SQL "GROUP BY HAVING" in Pandas?

Question

Asked by Mannaggia

What would be the most efficient way to use groupby and apply a filter at the same time in pandas?

Basically, I am asking for the pandas equivalent of this SQL:

select *
...
group by col_name
having condition

I think there are many use cases, ranging from conditional means and sums to conditional probabilities, which would make such a command very powerful.

I need very good performance, so ideally such a command would not be the result of several layered operations done in Python.

Answer 1

Accepted answer by Andy Hayden

As mentioned in unutbu's comment, groupby's filter is the equivalent of SQL's HAVING:

In [10]: import pandas as pd

In [11]: df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  1  3
2  5  6

In [13]: g = df.groupby('A')  #  GROUP BY A

In [14]: g.filter(lambda x: len(x) > 1)  #  HAVING COUNT(*) > 1
Out[14]:
   A  B
0  1  2
1  1  3

You can write more complicated functions (these are applied to each group), provided they return a plain ol' bool:

In [15]: g.filter(lambda x: x['B'].sum() == 5)
Out[15]:
   A  B
0  1  2
1  1  3
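
Since the question stresses performance, here is a minimal sketch of an alternative that is not part of the original answer: build a boolean mask with `transform` and a built-in aggregation name, then index the frame with it. This tends to avoid calling a Python function once per group, though the actual speed-up depends on the data and the pandas version. The variable names (`group_size`, `group_sum`, `having_count`, `having_sum`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [1, 3], [5, 6]], columns=['A', 'B'])

# transform broadcasts each group's aggregate back onto the original rows,
# so the result has the same index as df and can be used as a row mask.
group_size = df.groupby('A')['B'].transform('size')
group_sum = df.groupby('A')['B'].transform('sum')

# HAVING COUNT(*) > 1
having_count = df[group_size > 1]

# HAVING SUM(B) = 5
having_sum = df[group_sum == 5]

print(having_count)
print(having_sum)
```

Both selections return the same rows as the corresponding `g.filter(...)` calls above (index 0 and 1).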

Note: there is potentially a bug where you can't write your function to act on the columns you've used to group by. A workaround is to group by the columns manually, i.e. g = df.groupby(df['A']).

Answer 2

Answered by Golden Lion

I group by state and county, test whether each group's max is greater than 20, then keep only the entries of the resulting boolean Series that are True:

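# True for each (state, county) group whose max field1 exceeds 20
counties = df.groupby(['state', 'county'])['field1'].max() > 20
# keep only the groups where the condition holds
counties = counties[counties]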
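
The snippet above yields the qualifying (state, county) pairs rather than the original rows. As a rough sketch, assuming a small made-up frame (the data values and the names `group_max`, `matching_groups`, `row_max`, and `matching_rows` are hypothetical; only the column names come from the answer), both the group-level and the row-level result could be obtained like this:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer above.
df = pd.DataFrame({
    'state': ['TX', 'TX', 'TX', 'CA'],
    'county': ['Bexar', 'Bexar', 'Travis', 'Kern'],
    'field1': [10, 25, 15, 30],
})

# Group-level result: one entry per (state, county) whose max exceeds 20.
group_max = df.groupby(['state', 'county'])['field1'].max()
matching_groups = group_max[group_max > 20]

# Row-level result: every original row belonging to a qualifying group,
# which is closer to SELECT * ... GROUP BY ... HAVING MAX(field1) > 20.
row_max = df.groupby(['state', 'county'])['field1'].transform('max')
matching_rows = df[row_max > 20]

print(matching_groups)
print(matching_rows)
```

Keeping the aggregated values in `matching_groups` (instead of a Series of True values) also preserves the max that made each group qualify.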
