Python Pandas groupby: filter according to a condition on values

Question

Asked by Adam

Consider a dataframe like the following.

import pandas as pd

# Initialize dataframe
df1 = pd.DataFrame(columns=['bar', 'foo'])
df1['bar'] = ['001', '001', '001', '001', '002', '002', '003', '003', '003']
df1['foo'] = [-1, 0, 2, 3, -8, 1, 0, 1, 2]
>>> print(df1)
   bar  foo
0  001   -1
1  001    0
2  001    2
3  001    3
4  002   -8
5  002    1
6  003    0
7  003    1
8  003    2

# Lower and upper bound for desired range
lower_bound = -5
upper_bound = 5

I would like to use groupby in Pandas to return a dataframe that filters out rows whose bar meets a condition. In particular, I would like to filter out every row with a given bar if any of the values of foo for that bar is not between lower_bound and upper_bound.

In the above example, rows with bar = 002 should be filtered out since not all of the rows with bar = 002 contain a value of foo between -5 and 5 (namely, row index 4 contains foo = -8). The desired output for this example is the following.

# Desired output
   bar  foo
0  001   -1
1  001    0
2  001    2
3  001    3
6  003    0
7  003    1
8  003    2

I have tried the following approach.

# Attempted solution
grouped = df1.groupby('bar')['foo']
grouped.filter(lambda x: x < lower_bound or x > upper_bound)

However, this yields a TypeError: the filter must return a boolean result. Furthermore, this approach might return a groupby object, when I want the result to return a dataframe object.

Answer 1

Accepted answer by Psidom

With pandas you will most likely want the vectorized & and | operators rather than and and or. For your case, apply all() inside the filter to build the boolean condition; this keeps each bar for which all corresponding foo values are between lower_bound and upper_bound:

df1.groupby('bar').filter(lambda x: ((x.foo >= lower_bound) & (x.foo <= upper_bound)).all())

#   bar foo
#0  001 -1
#1  001  0
#2  001  2
#3  001  3
#6  003  0
#7  003  1
#8  003  2
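
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes Series.between (inclusive on both ends by default) is an acceptable stand-in for the two comparisons; the names in_range and result are only illustrative and not part of the answer above.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'bar': ['001', '001', '001', '001', '002', '002', '003', '003', '003'],
    'foo': [-1, 0, 2, 3, -8, 1, 0, 1, 2],
})
lower_bound = -5
upper_bound = 5

def in_range(group):
    # Illustrative helper (not from the answer): reduce the whole group
    # to a single True/False, which is what groupby(...).filter expects.
    return group['foo'].between(lower_bound, upper_bound).all()

# Keep only the groups for which in_range returns True
result = df1.groupby('bar').filter(in_range)
print(result)
```

The important part is the trailing .all(): the callable has to return one boolean per group, not one boolean per row.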

Answer 2

Answer by mrPuzzle

Psidom's answer works fine, but can be slow on large datasets. Mine is somewhat of a workaround, but it is fast.

df1['conditions_apply'] = (df1.foo >= lower_bound) & (df1.foo <= upper_bound)
selection = df1.groupby('bar')['conditions_apply'].min()  # min is False if any row in the group is False
selection = selection[selection].index.tolist()           # bars where every row is True
df1 = df1[df1.bar.isin(selection)].drop(columns=['conditions_apply'])  # keep those bars, drop the helper column
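
A possible variation on this workaround, sketched under the assumption that groupby(...).transform('all') fits your pandas version: it broadcasts the per-group result back onto every row, so no helper column is needed. The names in_range, keep and result are illustrative only, and I have not benchmarked this against the code above.

```python
import pandas as pd

# Same data as in the question
df1 = pd.DataFrame({
    'bar': ['001', '001', '001', '001', '002', '002', '003', '003', '003'],
    'foo': [-1, 0, 2, 3, -8, 1, 0, 1, 2],
})
lower_bound = -5
upper_bound = 5

# Row-level check: is this foo inside [lower_bound, upper_bound]?
in_range = df1['foo'].between(lower_bound, upper_bound)

# Group-level check broadcast back to rows: True only where every
# row sharing the same bar passed the row-level check.
keep = in_range.groupby(df1['bar']).transform('all')

result = df1[keep]
print(result)
```

Because keep is a plain boolean Series aligned with df1, the original dataframe is left untouched.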
