Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/21055068/

Time: 2020-09-13 21:32:52  Source: igfitidea

Reversal of string.contains in Python, pandas

python string python-2.7 csv pandas

Asked by Xodarap777

I have something like this in my code:

df2 = df[df['A'].str.contains("Hello|World")]

However, I want all the rows that don't contain either Hello or World. How do I most efficiently reverse this?

Answered by DSM

You can use the tilde ~ to flip the bool values:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0     True
1    False
2     True
3    False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
       A
1   this
3  apple

[2 rows x 1 columns]

Whether this is the most efficient way, I don't know; you'd have to time it against your other options. Sometimes using a regular expression is slower than things like df[~(df.A.str.contains("Hello") | (df.A.str.contains("World")))], but I'm bad at guessing where the crossovers are.
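One way to time the options against each other is sketched below. This is only an illustration under assumptions: the sample data, the repetition factor, and the helper names with_regex and with_two_literals are made up here, and the second variant passes regex=False to do literal substring checks, which is not what the answer's snippet does. The numbers will depend on your data and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical sample data, repeated so the timings are measurable.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"] * 10_000})


def with_regex(frame):
    # One regex pass with alternation, then invert the mask with ~.
    return frame[~frame["A"].str.contains("Hello|World")]


def with_two_literals(frame):
    # Two literal substring checks (regex=False is an assumption of this
    # sketch), OR-ed together and then inverted.
    has_hello = frame["A"].str.contains("Hello", regex=False)
    has_world = frame["A"].str.contains("World", regex=False)
    return frame[~(has_hello | has_world)]


# Both approaches should select exactly the same rows.
assert with_regex(df).equals(with_two_literals(df))

for func in (with_regex, with_two_literals):
    seconds = timeit.timeit(lambda: func(df), number=20)
    print(f"{func.__name__}: {seconds:.3f}s for 20 runs")
```

Checking that both variants return the same rows before timing them guards against comparing the speed of two filters that do not actually agree.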

Answered by Martijn Pieters

The .contains() method uses regular expressions, so you can use a negative lookahead test to determine that a word is not contained:

df['A'].str.contains(r'^(?:(?!Hello|World).)*$')

This expression matches any string where the words Hello and World are not found anywhere in the string.

Demo:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')]
       A
1   this
3  apple
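As a quick sanity check, the sketch below compares the negative-lookahead mask with the tilde-inverted mask from the other answer on a small frame. The extra row "say Hello there" is an assumption added here to cover a match in the middle of a string; the variable names are illustrative only.

```python
import re

import pandas as pd

# Sample data from the demo, plus one hypothetical row with the word mid-string.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple", "say Hello there"]})

pattern = r'^(?:(?!Hello|World).)*$'

# Mask built directly from the negative lookahead.
lookahead_mask = df["A"].str.contains(pattern)

# Mask built by matching the words and flipping the result with ~.
tilde_mask = ~df["A"].str.contains("Hello|World")

# On this data the two masks agree row for row.
assert lookahead_mask.equals(tilde_mask)

# The same pattern checked with the standard re module, one string at a time.
for text in df["A"]:
    print(repr(text), bool(re.search(pattern, text)))
```

The lookahead is re-tested before every character, which is why the pattern rejects a string as soon as Hello or World starts at any position.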