Notice: this page is taken from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/17097643/
Search for "does-not-contain" on a DataFrame in pandas
Asked by stites
I've done some searching and can't figure out how to filter a dataframe by df["col"].str.contains(word), however I'm wondering if there is a way to do the reverse: filter a dataframe by that set's complement, e.g. to the effect of !(df["col"].str.contains(word)).
Can this be done through a DataFrame method?
Accepted answer by Andy Hayden
You can use the invert (~) operator (which acts like a not for boolean data):
new_df = df[~df["col"].str.contains(word)]
where new_df is the copy returned by the right-hand side.
contains also accepts a regular expression...
If the above throws a ValueError, the likely cause is mixed datatypes in the column (for example, missing values among the strings), so use na=False:
new_df = df[~df["col"].str.contains(word, na=False)]
Or,
new_df = df[df["col"].str.contains(word) == False]
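The two forms above do not treat missing values the same way. Here is a minimal sketch of the difference, assuming a made-up column with one missing value; the sample data and the names by_invert and by_compare are illustrative and not part of the original answer. The column is forced to object dtype, since other string dtypes may propagate missing values differently.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the third row is missing.
df = pd.DataFrame(
    {"col": pd.Series(["apple pie", "banana", np.nan, "pineapple"], dtype=object)}
)
word = "apple"

# na=False makes a missing value count as "does not contain",
# so after inversion the NaN row is kept.
by_invert = df[~df["col"].str.contains(word, na=False)]

# Without na=False the mask holds NaN for the missing row, and
# NaN == False evaluates to False, so the NaN row is dropped.
by_compare = df[df["col"].str.contains(word) == False]

print(by_invert)
print(by_compare)
```

Which behavior is preferable depends on whether rows with missing values should count as "not containing" the word.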
Answer by Shoresh
I had to get rid of the NULL values before using the command recommended by Andy above. An example:
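import pandas as pd

# Build a 3x3 frame of missing values, then fill in some cells.
# (.ix was removed in pandas 1.0; .loc is the label-based replacement.)
df = pd.DataFrame(index=[0, 1, 2], columns=['first', 'second', 'third'])
df.loc[:, 'first'] = 'myword'
df.loc[0, 'second'] = 'myword'
df.loc[2, 'second'] = 'myword'
df.loc[1, 'third'] = 'myword'
df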
    first  second   third
0  myword  myword     NaN
1  myword     NaN  myword
2  myword  myword     NaN
Now running the command:
~df["second"].str.contains(word)
I get the following error:
TypeError: bad operand type for unary ~: 'float'
I got rid of the NULL values first, using dropna() or fillna(), and then the command ran with no problem.
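The cleanup options mentioned in this thread can be sketched as follows. This is only an illustration under stated assumptions: the one-column frame mirrors the "second" column from the example, and the variable names (filled, dropped, keep_filled, keep_dropped, keep_na_false) are made up here.

```python
import numpy as np
import pandas as pd

# Rebuild the "second" column from the example (object dtype, one missing value).
df = pd.DataFrame({"second": pd.Series(["myword", np.nan, "myword"], dtype=object)})
word = "myword"

# Option 1: replace missing values with an empty string before matching.
filled = df["second"].fillna("")
keep_filled = df[~filled.str.contains(word)]

# Option 2: drop the rows with missing values first, then filter.
dropped = df.dropna(subset=["second"])
keep_dropped = dropped[~dropped["second"].str.contains(word)]

# Option 3: leave the data alone and let str.contains treat NaN as False.
keep_na_false = df[~df["second"].str.contains(word, na=False)]

print(keep_filled)
print(keep_dropped)
print(keep_na_false)
```

Note that the options are not interchangeable: dropna() discards the row with the missing value, while fillna("") and na=False keep it in the "does not contain" result.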
Answer by nanselm2
I was having trouble with the not (~) symbol as well, so here's another way from another StackOverflow thread:
df[df["col"].str.contains('this|that')==False]
Answer by U10-Forward
In addition to nanselm2's answer, you can use 0 instead of False:
df["col"].str.contains(word)==0
Answer by Arash
You can use apply with a lambda to select rows where a column's value is not any of the items in a list. For your scenario:
df[df["col"].apply(lambda x:x not in [word1,word2,word3])]
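One thing worth checking with this approach: `not in` against a list tests the whole cell value, not a substring, so it is not quite the same as negating str.contains. A small sketch with made-up data (the frame and the names words, exact, exact_isin and substring are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"col": ["this", "this one", "other"]})
words = ["this", "that"]  # hypothetical list of values to exclude

# Exact match: drops only rows whose whole value is in the list.
exact = df[df["col"].apply(lambda x: x not in words)]

# Same exact-match filter using isin, without a Python-level lambda.
exact_isin = df[~df["col"].isin(words)]

# Substring match: also drops "this one", because it contains "this".
substring = df[~df["col"].str.contains("this|that")]

print(exact)
print(substring)
```

If substring semantics are what you want, the str.contains forms from the other answers are the closer fit.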
Answer by Nursnaaz
The answers above already cover the basics.
I am adding a pattern for searching for multiple words and excluding the matching rows from the DataFrame.
Here:
'word1', 'word2', 'word3', 'word4' = list of patterns to search for
df = the DataFrame
column_a = a column name from the DataFrame df
Search_for_These_values = ['word1','word2','word3','word4']
pattern = '|'.join(Search_for_These_values)
result = df.loc[~(df['column_a'].str.contains(pattern, case=False))]
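Because the joined string is interpreted as a regular expression, search terms containing characters such as $ or + may need escaping (an unescaped "c++", for example, is not a valid pattern). A hedged variant of the snippet above, with made-up data and search terms, and with na=False added as an assumption in case the column has missing values:

```python
import re

import pandas as pd

# Hypothetical data and search terms, chosen to include regex metacharacters.
df = pd.DataFrame({"column_a": ["Word1 here", "costs $5", "c++ code", "nothing"]})
search_for_these_values = ["word1", "$5", "c++"]

# Escape each term so characters such as $ and + are matched literally.
pattern = "|".join(re.escape(value) for value in search_for_these_values)

# case=False ignores case; na=False treats missing values as "no match".
result = df.loc[~df["column_a"].str.contains(pattern, case=False, na=False)]

print(result)
```

If the terms are meant to be regular expressions rather than literal strings, skip the re.escape step.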

