Original question: http://stackoverflow.com/questions/21055068/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Reversal of string.contains in Python, pandas
Asked by Xodarap777
I have something like this in my code:
df2 = df[df['A'].str.contains("Hello|World")]
However, I want all the rows that don't contain either Hello or World. How do I most efficiently reverse this?
Answered by DSM
You can use the tilde ~ to flip the boolean values:
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0 True
1 False
2 True
3 False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0 False
1 True
2 False
3 True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
A
1 this
3 apple
[2 rows x 1 columns]
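One practical wrinkle, not covered in the answer above: if column A can hold missing values, str.contains returns a missing value for those rows by default, so the mask is no longer purely True/False. A minimal sketch, assuming a made-up frame with one NaN entry, that passes na=False so the mask stays boolean before it is inverted:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the column from the answer plus one missing value.
df = pd.DataFrame({"A": ["Hello", "this", np.nan, "World", "apple"]})

# na=False makes contains() report False for missing values, so the mask
# holds only True/False and can be inverted with ~.
mask = df["A"].str.contains("Hello|World", na=False)

# Rows that do not contain either word. The NaN row is kept, because its
# mask value is False before inversion.
result = df[~mask]
print(result)
```

Whether the missing row should be kept or dropped depends on the data; passing na=True instead would drop it after inversion.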
Whether this is the most efficient way, I don't know; you'd have to time it against your other options. Sometimes using a regular expression is slower than things like df[~(df.A.str.contains("Hello") | (df.A.str.contains("World")))], but I'm bad at guessing where the crossovers are.
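The timing comparison suggested above could be sketched roughly as follows. The frame size, the run count, and the use of regex=False for the two single-word checks are assumptions made for illustration, not part of the original answer, and the numbers will vary by machine and pandas version:

```python
import timeit

import pandas as pd

# Hypothetical test data: the four sample strings repeated 1000 times.
df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"] * 1000})


def with_regex():
    # One regular-expression alternation, then invert.
    return ~df["A"].str.contains("Hello|World")


def with_two_calls():
    # Two plain substring checks combined with |, then invert.
    # regex=False is an assumption here; it skips the regex engine.
    return ~(df["A"].str.contains("Hello", regex=False)
             | df["A"].str.contains("World", regex=False))


# Both approaches should select the same rows.
same = with_regex().equals(with_two_calls())
print("same result:", same)

# Time each approach; 20 runs is an arbitrary choice.
timings = {f.__name__: timeit.timeit(f, number=20)
           for f in (with_regex, with_two_calls)}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 20 runs")
```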
Answered by Martijn Pieters
The .contains() method uses regular expressions, so you can use a negative lookahead test to determine that a word is not contained:
df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
This expression matches any string where the words Hello and World are not found anywhere in the string.
Demo:
>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
0 False
1 True
2 False
3 True
Name: A, dtype: bool
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')]
A
1 this
3 apple
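As a sanity check, the lookahead pattern can be compared with the tilde approach from the other answer. This is a small sketch on made-up data that also places the words in the middle of a string; the column values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample data, including strings where the word is not at the start.
df = pd.DataFrame({"A": ["Hello", "say Hello there", "this", "my World", "apple"]})

# Negative lookahead: match only strings in which no position starts
# "Hello" or "World".
pattern = r'^(?:(?!Hello|World).)*$'
lookahead_mask = df["A"].str.contains(pattern)

# Inverted positive match, for comparison.
tilde_mask = ~df["A"].str.contains("Hello|World")

print(lookahead_mask.equals(tilde_mask))
print(df[lookahead_mask])
```

Note that `.` does not match a newline by default, so for values containing line breaks the two masks can differ unless a flag such as re.DOTALL is passed through the flags argument of str.contains.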

