Pandas: filtering multiple columns

Question

Asked by M. K. Hunter

I am working in Pandas, and I want to apply multiple filters to a data frame across multiple fields.

I am working with another, more complex data frame, but I am simplifying the context for this question. Here is the setup for a sample data frame:

import numpy as np
import pandas as pd

dates = pd.date_range('20170101', periods=16)
rand_df = pd.DataFrame(np.random.randn(16, 4), index=dates, columns=list('ABCD'))

Applying one filter to this data frame is well documented and simple:

rand_df.loc[lambda df: df['A'] < 0]

Because the lambda looks like a simple boolean expression, it is tempting to do the following. This does not work: the comparisons inside the lambda return boolean Series rather than single boolean values, so they cannot be combined with "and" the way plain boolean expressions would:

rand_df.loc[lambda df: df['A'] < 0 and df['B'] < 0]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-31-dfa05ab293f9> in <module>()
----> 1 rand_df.loc[lambda df: df['A'] < 0 and df['B'] < 0]

I have found two ways to successfully implement this. I will add them to the potential answers, so you can comment directly on them as solutions. However, I would like to solicit other approaches, since I am not really sure that either of these is a very standard approach for filtering a Pandas data frame.

Answer 1

Answered by MaxU

In [3]: rand_df.query("A < 0 and B < 0")
Out[3]:
                   A         B         C         D
2017-01-02 -0.701682 -1.224531 -0.273323 -1.091705
2017-01-05 -1.262971 -0.531959 -0.997451 -0.070095
2017-01-06 -0.065729 -1.427199  1.202082  0.136657
2017-01-08 -1.445050 -0.367112 -2.617743  0.496396
2017-01-12 -1.273692 -0.456254 -0.668510 -0.125507

or:

In [6]: rand_df[rand_df[['A', 'B']].lt(0).all(axis=1)]
Out[6]:
                   A         B         C         D
2017-01-02 -0.701682 -1.224531 -0.273323 -1.091705
2017-01-05 -1.262971 -0.531959 -0.997451 -0.070095
2017-01-06 -0.065729 -1.427199  1.202082  0.136657
2017-01-08 -1.445050 -0.367112 -2.617743  0.496396
2017-01-12 -1.273692 -0.456254 -0.668510 -0.125507
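
As a minimal sketch, the two expressions above can be checked against each other. This assumes a seeded random frame so the run is reproducible. The seed and the `threshold` variable are illustrative and not part of the original answer. It also shows that `query` can read a local variable when it is prefixed with `@`.

```python
import numpy as np
import pandas as pd

# Seeded so the example is reproducible; the seed value is an arbitrary choice.
rng = np.random.default_rng(0)
dates = pd.date_range('20170101', periods=16)
rand_df = pd.DataFrame(rng.standard_normal((16, 4)), index=dates, columns=list('ABCD'))

# String-based filter evaluated by DataFrame.query.
by_query = rand_df.query("A < 0 and B < 0")

# Row-wise test: keep rows where both A and B are less than zero.
by_mask = rand_df[rand_df[['A', 'B']].lt(0).all(axis=1)]

# query can read local Python variables when they are prefixed with @.
threshold = 0  # hypothetical name, for illustration only
by_query_var = rand_df.query("A < @threshold and B < @threshold")

print(by_query.equals(by_mask))
print(by_query.equals(by_query_var))
```

Inside a `query` string, `and` is acceptable because pandas parses the expression itself rather than handing it to Python's boolean operators.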

PS: You will find a lot of examples in the Pandas docs.

Answer 2

Answered by DJK

rand_df[(rand_df.A < 0) & (rand_df.B < 0)]
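
When the number of conditions grows, the same boolean-mask idea can be written as a list of conditions folded together. The following is only a sketch under assumptions: the seeded frame, the third condition on column C, and the variable names are illustrative and do not come from the answer.

```python
from functools import reduce
import operator

import numpy as np
import pandas as pd

# Seeded so the example is reproducible; the seed value is an arbitrary choice.
rng = np.random.default_rng(0)
dates = pd.date_range('20170101', periods=16)
rand_df = pd.DataFrame(rng.standard_normal((16, 4)), index=dates, columns=list('ABCD'))

# Each condition is a boolean Series aligned to rand_df's index.
conditions = [rand_df['A'] < 0, rand_df['B'] < 0, rand_df['C'] > -1]

# Fold the list with the element-wise AND operator (&).
mask = reduce(operator.and_, conditions)
filtered = rand_df[mask]

# | (OR) and ~ (NOT) combine masks in the same element-wise way.
either_negative = rand_df[(rand_df['A'] < 0) | (rand_df['B'] < 0)]
a_not_negative = rand_df[~(rand_df['A'] < 0)]

print(filtered)
```

The parentheses around each comparison matter because `&` and `|` bind more tightly than `<` and `>` in Python.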

Answer 3

Answered by piRSquared

To use the lambda, combine the per-column conditions with the element-wise & operator, wrapping each condition in parentheses, rather than with "and".

rand_df.loc[lambda x: (x.A < 0) & (x.B < 0)]
# Or
# rand_df[lambda x: (x.A < 0) & (x.B < 0)]

                   A         B         C         D
2017-01-12 -0.460918 -1.001184 -0.796981  0.328535
2017-01-14 -0.146846 -1.088095 -1.055271 -0.778120
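
The difference between `and` and `&` inside the lambda can be seen directly. This is a small sketch on a seeded frame; the seed and the `raised` flag are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Seeded so the example is reproducible; the seed value is an arbitrary choice.
rng = np.random.default_rng(0)
dates = pd.date_range('20170101', periods=16)
rand_df = pd.DataFrame(rng.standard_normal((16, 4)), index=dates, columns=list('ABCD'))

# `and` asks Python for one True/False value for a whole Series,
# which pandas refuses to provide, so a ValueError is raised.
try:
    rand_df.loc[lambda df: df['A'] < 0 and df['B'] < 0]
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__)

# `&` compares element by element and returns a boolean Series,
# which .loc accepts as a row mask.
result = rand_df.loc[lambda df: (df['A'] < 0) & (df['B'] < 0)]
print(result)
```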

You can speed up the evaluation by using boolean numpy arrays

c1 = rand_df.A.values < 0
c2 = rand_df.B.values < 0
rand_df[c1 & c2]

                   A         B         C         D
2017-01-12 -0.460918 -1.001184 -0.796981  0.328535
2017-01-14 -0.146846 -1.088095 -1.055271 -0.778120
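
To see whether the NumPy variant helps on a given machine, both versions can be timed. This is a rough sketch, not a benchmark: the seed, the function names, and the repetition count are arbitrary assumptions, and the timings will vary with frame size and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Seeded so the example is reproducible; the seed value is an arbitrary choice.
rng = np.random.default_rng(0)
dates = pd.date_range('20170101', periods=16)
rand_df = pd.DataFrame(rng.standard_normal((16, 4)), index=dates, columns=list('ABCD'))

def series_mask():
    # Conditions built as pandas Series (index alignment included).
    return rand_df[(rand_df.A < 0) & (rand_df.B < 0)]

def numpy_mask():
    # Conditions built as plain NumPy boolean arrays.
    c1 = rand_df.A.to_numpy() < 0
    c2 = rand_df.B.to_numpy() < 0
    return rand_df[c1 & c2]

# Both approaches should select exactly the same rows.
same = series_mask().equals(numpy_mask())
print("same rows:", same)

t_series = timeit.timeit(series_mask, number=200)
t_numpy = timeit.timeit(numpy_mask, number=200)
print(f"Series masks: {t_series:.4f}s, NumPy masks: {t_numpy:.4f}s")
```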

Answer 4

Answered by M. K. Hunter

Here is an approach that "chains" uses of the 'loc' operation:

rand_df.loc[lambda df: df['A'] < 0].loc[lambda df: df['B'] < 0]

Answer 5

Answered by M. K. Hunter

Here is an approach that involves writing a function to do the filtering. I am sure that some filters will be complex enough that a function is the best way to go (this case is not that complex). Also, when I am using Pandas and I write a "for" loop, I feel like I am doing it wrong.

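def lt_zero_ab(df):
    # Collect the index labels of rows where both A and B are negative.
    result = []
    for index, row in df.iterrows():
        if row['A'] < 0 and row['B'] < 0:
            result.append(index)
    return result

# .loc accepts a callable; here it returns a list of index labels to keep.
rand_df.loc[lt_zero_ab]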
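
A named function does not have to loop. As a sketch, the same filter can return a boolean mask instead of a list of labels. The names `lt_zero_ab_loop`, `lt_zero_ab_mask`, and `sample_df` are hypothetical, and the small hand-written frame is an assumption chosen so the result is easy to check.

```python
import pandas as pd

# Small explicit frame (illustrative values, not from the original post).
sample_df = pd.DataFrame(
    {'A': [-1.0, 0.5, -0.2, 1.3], 'B': [-0.3, -0.7, 0.4, -1.1]},
    index=pd.date_range('20170101', periods=4),
)

def lt_zero_ab_loop(df):
    # Row-by-row version: returns the index labels to keep.
    result = []
    for index, row in df.iterrows():
        if row['A'] < 0 and row['B'] < 0:
            result.append(index)
    return result

def lt_zero_ab_mask(df):
    # Vectorized version: returns a boolean Series, one value per row.
    return (df['A'] < 0) & (df['B'] < 0)

looped = sample_df.loc[lt_zero_ab_loop]
vectorized = sample_df.loc[lt_zero_ab_mask]

print(vectorized)
print(looped.equals(vectorized))
```

Both functions can be passed to `.loc` unchanged, so a complex filter can keep its descriptive name without the per-row loop.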
