Python Pandas: String Contains and Does Not Contain

Question

Asked by Sam Perry

I'm trying to match rows of a Pandas DataFrame that contain certain strings and don't contain others. For example:

import pandas
df = pandas.Series(['ab1', 'ab2', 'b2', 'c3'])
df[df.str.contains("b")]

Output:

0    ab1
1    ab2
2     b2
dtype: object

Desired output:

2     b2
dtype: object

Question: is there an elegant way of saying something like this?

df[[df.str.contains("b")==True] and [df.str.contains("a")==False]]
# Doesn't give desired outcome

Answer 1

You're almost there; you just haven't got the syntax quite right. Use the elementwise & operator instead of and, and wrap each comparison in parentheses:

df[(df.str.contains("b") == True) & (df.str.contains("a") == False)]
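
As a self-contained sketch of the same idea, the two conditions can be built as named boolean masks and then combined. The names s, has_b and has_a are illustrative and not from the question. Python's and does not work here because it is not applied element by element, whereas & and ~ are; since str.contains already returns booleans, the comparisons with True and False can be dropped.

```python
import pandas as pd

s = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# One boolean mask per condition; each has the same index as s.
has_b = s.str.contains("b")
has_a = s.str.contains("a")

# & is elementwise "and", ~ is elementwise "not".
result = s[has_b & ~has_a]
print(result)
```

Naming the masks keeps the final selection readable when more conditions are added.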

Another approach, which might be cleaner if you have a lot of conditions to apply, would be to chain your filters together with reduce or a loop:

from functools import reduce
filters = [("a", False), ("b", True)]
reduce(lambda df, f: df[df.str.contains(f[0]) == f[1]], filters, df)
#outputs b2
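
The loop variant mentioned above could look roughly like the following. This is a minimal sketch: filter_contains is a hypothetical helper, not a pandas function, and it assumes the substrings should be matched literally (hence regex=False) and that the Series has no missing values.

```python
import pandas as pd

def filter_contains(series, conditions):
    """Keep the values that satisfy every (substring, should_contain) pair.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Start with an all-True mask and narrow it one condition at a time.
    mask = pd.Series(True, index=series.index)
    for substring, should_contain in conditions:
        # regex=False treats the substring as plain text, not a pattern.
        contains = series.str.contains(substring, regex=False)
        mask &= (contains if should_contain else ~contains)
    return series[mask]

s = pd.Series(['ab1', 'ab2', 'b2', 'c3'])
filtered = filter_contains(s, [("a", False), ("b", True)])
print(filtered)
```

Unlike the reduce version, this builds a single mask and indexes the Series once, which can be easier to debug because the mask can be inspected before it is applied.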

Answer 2

Either combine two masks (here ts is the Series from the question):

>>> ts.str.contains('b') & ~ts.str.contains('a')
0    False
1    False
2     True
3    False
dtype: bool

or use regex:

>>> ts.str.contains('^[^a]*b[^a]*$')
0    False
1    False
2     True
3    False
dtype: bool
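
The pattern works because str.contains searches anywhere in the string, so the ^ and $ anchors are what force the whole value to consist of non-"a" characters around a "b". The sketch below applies it as a filter. The extra None entry is an assumption added for illustration, to show the na parameter of str.contains, which controls what a missing value becomes in the mask.

```python
import pandas as pd

# Same data as the question, plus one missing value (added for illustration).
s = pd.Series(['ab1', 'ab2', 'b2', 'c3', None])

# Any number of non-"a" characters, then "b", then non-"a" characters to the end.
pattern = r'^[^a]*b[^a]*$'

# na=False makes missing values count as "no match", so the mask is purely boolean.
mask = s.str.contains(pattern, na=False)
print(s[mask])
```

Without na=False, the missing entry would produce a missing value in the mask, and indexing with such a mask may raise an error.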

Answer 3

You can use .loc and ~ to index:

df.loc[(df.str.contains("b")) & (~df.str.contains("a"))]

2    b2
dtype: object