Python: searching for "does not contain" on a DataFrame in pandas

Question

Asked by stites

I've done some searching and can't figure out how to filter a dataframe by df["col"].str.contains(word); however, I'm wondering if there is a way to do the reverse: filter a dataframe by that set's complement, e.g. to the effect of !(df["col"].str.contains(word)).

Can this be done through a DataFrame method?

Answer 1

Accepted answer by Andy Hayden

You can use the invert (~) operator (which acts like a not for boolean data):

new_df = df[~df["col"].str.contains(word)]

where new_df is the copy returned by the right-hand side.
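
As a minimal sketch of the inversion above, assuming a small made-up frame (the column values and the word "apple" are illustrative only, not from the question):

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"col": ["apple pie", "banana split", "apple tart", "cherry"]})
word = "apple"

# Boolean mask: True where the string contains the word
mask = df["col"].str.contains(word)

# ~ flips every True/False, so only the non-matching rows are kept
new_df = df[~mask]
print(new_df)
```

Keeping the mask in its own variable makes it easy to inspect which rows matched before negating it.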

contains also accepts a regular expression...
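
Because the pattern is treated as a regular expression by default, characters such as "." have special meaning. A small sketch of the difference, assuming made-up values; passing regex=False asks for a literal substring match instead:

```python
import pandas as pd

# Hypothetical sample data for illustration
s = pd.Series(["a.b", "axb", "c"])

# Default: "a.b" is a regex, and "." matches any character
not_matching_regex = s[~s.str.contains("a.b")]

# regex=False: "a.b" is matched literally
not_matching_literal = s[~s.str.contains("a.b", regex=False)]

print(not_matching_regex.tolist())
print(not_matching_literal.tolist())
```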

If the above throws a ValueError, the likely cause is mixed datatypes in the column (for example, missing values alongside strings), so use na=False:

new_df = df[~df["col"].str.contains(word, na=False)]

Or,

new_df = df[df["col"].str.contains(word) == False]
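
The two forms can treat missing values differently. The sketch below assumes an object-dtype column with one NaN (the data is made up): with na=False the missing row counts as "does not contain" and is kept, while comparing with == False leaves the missing row out, since NaN is not equal to False.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; object dtype so the missing value stays NaN
df = pd.DataFrame({"col": pd.Series(["apple", np.nan, "banana"], dtype=object)})
word = "app"

# Missing values are treated as non-matches, so the NaN row is kept
keeps_missing = df[~df["col"].str.contains(word, na=False)]

# NaN == False evaluates to False, so the NaN row is filtered out
drops_missing = df[df["col"].str.contains(word) == False]

print(keeps_missing.index.tolist())
print(drops_missing.index.tolist())
```

Which behaviour is right depends on whether rows with missing text should survive the filter.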

Answer 2

Answered by Shoresh

I had to get rid of the NULL values before using the command recommended by Andy above. An example:

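import pandas as pd

# .ix was removed in pandas 1.0; .loc performs the same label-based assignment
df = pd.DataFrame(index=[0, 1, 2], columns=['first', 'second', 'third'])
df.loc[:, 'first'] = 'myword'
df.loc[0, 'second'] = 'myword'
df.loc[2, 'second'] = 'myword'
df.loc[1, 'third'] = 'myword'
df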

    first   second  third
0   myword  myword   NaN
1   myword  NaN      myword 
2   myword  myword   NaN

Now running the command:

~df["second"].str.contains(word)

I get the following error:

TypeError: bad operand type for unary ~: 'float'

I got rid of the NULL values using dropna() or fillna() first and retried the command with no problem.
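
A minimal sketch of both clean-up routes, assuming a made-up single-column frame. Note that the two are not equivalent: dropna() discards the rows with missing values, whereas fillna("") keeps them as non-matches.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"second": ["myword", np.nan, "other"]})
word = "myword"

# Option 1: drop rows where the column is missing, then filter
no_nulls = df.dropna(subset=["second"])
kept_after_drop = no_nulls[~no_nulls["second"].str.contains(word)]

# Option 2: replace missing values with an empty string, then filter
filled = df["second"].fillna("")
kept_after_fill = df[~filled.str.contains(word)]

print(kept_after_drop.index.tolist())
print(kept_after_fill.index.tolist())
```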

Answer 3

Answered by nanselm2

I was having trouble with the not (~) symbol as well, so here's another way from another StackOverflow thread:

df[df["col"].str.contains('this|that')==False]

Answer 4

Answered by U10-Forward

In addition to nanselm2's answer, you can use 0 instead of False:

df["col"].str.contains(word)==0

Answer 5

Answered by Arash

You can use apply with a lambda to select rows where the column's value is not one of the items in a list (this is an exact match on the whole value, not a substring search). For your scenario:

df[df["col"].apply(lambda x:x not in [word1,word2,word3])]
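
A short sketch of the same membership test, assuming made-up values and a hypothetical list named excluded. It also shows a negated isin, which is a different technique from the apply/lambda above but selects the same rows here. Note that "dark red" is kept, because the test compares whole values rather than searching for substrings.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"col": ["red", "dark red", "blue", "green"]})
excluded = ["red", "blue"]

# Row-by-row membership test, as in the answer above
via_apply = df[df["col"].apply(lambda x: x not in excluded)]

# Vectorized alternative: negate Series.isin
via_isin = df[~df["col"].isin(excluded)]

print(via_apply["col"].tolist())
```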

Answer 6

Answered by Nursnaaz

I believe the main answers have already been posted.

I am adding a general pattern for searching for multiple words and excluding the matching rows from the DataFrame.

Here 'word1','word2','word3','word4'= list of patterns to search

df= DataFrame

column_a= A column name from DataFrame df

Search_for_These_values = ['word1','word2','word3','word4'] 

pattern = '|'.join(Search_for_These_values)

result = df.loc[~df['column_a'].str.contains(pattern, case=False)]
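
An end-to-end sketch of this multi-word filter, assuming made-up data. Two additions here are not part of the answer above: re.escape, so that search terms containing regex metacharacters (such as "+") are matched literally, and na=False, so that missing values do not break the negation.

```python
import re
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"column_a": ["Word1 here", "nothing", "has WORD3", "a+b"]})
search_for_these_values = ["word1", "word3", "a+b"]

# Escape each term, then join with | so any one of them can match
pattern = "|".join(re.escape(v) for v in search_for_these_values)

# case=False ignores case; ~ keeps only rows matching none of the terms
result = df.loc[~df["column_a"].str.contains(pattern, case=False, na=False)]
print(result)
```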
