Inverting str.contains in pandas (Python)

Question

Asked by Xodarap777

I have something like this in my code:

df2 = df[df['A'].str.contains("Hello|World")]

However, I want all the rows that don't contain either Hello or World. How do I most efficiently reverse this?

Answer 1

Answered by DSM

You can use the tilde ~ to flip the bool values:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0     True
1    False
2     True
3    False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
       A
1   this
3  apple

[2 rows x 1 columns]
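
One thing to watch when flipping the mask is rows with missing values. Here is a minimal sketch, assuming column A may hold None/NaN (the extra row is made up for illustration, it is not in the question). The na argument of str.contains decides which side of the mask such rows land on before ~ flips it:

```python
import pandas as pd

# Same sample data as above, plus a hypothetical missing value at index 3.
df = pd.DataFrame({"A": ["Hello", "this", "World", None, "apple"]})

# na=False treats a missing value as "no match", so ~ keeps that row.
kept_missing = df[~df.A.str.contains("Hello|World", na=False)]

# na=True treats a missing value as "match", so ~ drops that row.
dropped_missing = df[~df.A.str.contains("Hello|World", na=True)]

print(kept_missing.index.tolist())
print(dropped_missing.index.tolist())
```

Which of the two you want depends on whether a missing value should count as "does not contain Hello or World" in your data.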

Whether this is the most efficient way, I don't know; you'd have to time it against your other options. Sometimes using a regular expression is slower than things like df[~(df.A.str.contains("Hello") | (df.A.str.contains("World")))], but I'm bad at guessing where the crossovers are.
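
If you want to time the two options yourself, something like the following rough sketch should do. The 100,000-row frame and the function names one_regex and two_calls are made up for illustration, and the numbers will depend on your data and pandas version:

```python
import timeit

import pandas as pd

# Hypothetical test data: the four sample values repeated to get 100,000 rows.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"] * 25_000})


def one_regex():
    # Single call with an alternation pattern, then invert.
    return df[~df.A.str.contains("Hello|World")]


def two_calls():
    # Two separate calls combined with |, then invert.
    return df[~(df.A.str.contains("Hello") | df.A.str.contains("World"))]


for fn in (one_regex, two_calls):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds / 5:.4f} s per call")
```

Both functions select the same rows, so whichever is faster on your data is safe to use.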

Answer 2

Answered by Martijn Pieters

The .contains() method uses regular expressions, so you can use a negative lookahead test to determine that a word is not contained:

df['A'].str.contains(r'^(?:(?!Hello|World).)*$')

This expression matches any string in which the words Hello and World are not found anywhere.

Demo:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')]
       A
1   this
3  apple
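
As a quick sanity check, the lookahead mask can be compared against the ~ version of the plain pattern. This is only a sketch: dtype=object is my assumption, chosen so that Python's re engine is used (other string backends may not support lookaheads), and the multi-line value is made up to show one case where the two approaches can disagree.

```python
import pandas as pd

# dtype=object is an assumption so that Python's re module does the matching.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]}, dtype=object)

lookahead = df["A"].str.contains(r"^(?:(?!Hello|World).)*$")
inverted = ~df["A"].str.contains("Hello|World")

# On the sample data both masks select the same rows.
print(lookahead.tolist() == inverted.tolist())

# Hypothetical multi-line value: "." does not match "\n" by default, so the
# lookahead pattern cannot span the whole string and reports False here,
# while the inverted plain pattern reports True.
s = pd.Series(["foo\nbar"], dtype=object)
multiline_lookahead = s.str.contains(r"^(?:(?!Hello|World).)*$")
multiline_inverted = ~s.str.contains("Hello|World")
print(multiline_lookahead.tolist(), multiline_inverted.tolist())
```

If your strings can contain newlines, the ~ form is the simpler of the two to reason about.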
