Python: Search for String in all Pandas DataFrame columns and filter

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/26640129/

Time: 2020-08-19 00:47:26  Source: igfitidea

python pandas

Asked by horatio1701d

I thought this would be straightforward, but I had some trouble tracking down an elegant way to search all columns in a dataframe at the same time for a partial string match. Basically, how would I apply df['col1'].str.contains('^') to an entire dataframe at once and filter down to any rows that have records containing the match?

Accepted answer by unutbu

The Series.str.contains method expects a regex pattern (by default), not a literal string. Therefore str.contains("^") matches the beginning of any string. Since every string has a beginning, everything matches. Instead, use str.contains(r"\^") to match the literal ^ character.
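
To make the difference concrete, here is a minimal sketch on a small, made-up Series (the sample values are assumptions for illustration, not data from the question). It compares the unescaped pattern with the escaped one:

```python
import pandas as pd

# Hypothetical sample data, chosen only to illustrate the behaviour.
s = pd.Series(["a^b", "plain", "^start"])

# "^" is a regex anchor for the start of the string, so every value matches.
print(s.str.contains("^").tolist())

# Escaping the caret makes the pattern match a literal "^" character.
print(s.str.contains(r"\^").tolist())
```

Writing the pattern as a raw string (r"\^") keeps Python from treating the backslash as the start of a string escape.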

To check every column, you could use for col in df to iterate through the column names, and then call str.contains on each column:

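import numpy as np
# One boolean array per column (all columns are assumed to hold strings), stacked side by side
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]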

Alternatively, you could pass regex=False to str.contains to make the test use the Python in operator; but (in general) using regex is faster.
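
As a rough sketch of the regex=False variant applied to a whole frame, assuming every column holds strings (the column names col1 and col2 and the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical frame; every column is assumed to contain strings.
df = pd.DataFrame({
    "col1": ["a^b", "foo", "bar"],
    "col2": ["x", "y", "z^"],
})

# With regex=False the caret is treated as a plain character, so no escaping is needed.
mask = df.apply(lambda col: col.str.contains("^", regex=False, na=False))

# Keep the rows where at least one column contains the literal "^".
filtered = df[mask.any(axis=1)]
print(filtered)
```

Here df.apply runs the test column by column and returns a boolean frame of the same shape, which plays the same role as the np.column_stack mask.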

Answer by Puneet Sinha

Try this:

df[df.apply(lambda row: row.astype(str).str.contains('TEST').any(), axis=1)]
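
A small runnable sketch of this approach on a frame with mixed dtypes may help; the column names and values below are assumptions made up for illustration. It also shows a column-wise variant that casts the whole frame to strings first, which should give the same mask:

```python
import pandas as pd

# Hypothetical frame mixing string and integer columns.
df = pd.DataFrame({
    "name": ["TEST one", "other", "x"],
    "value": [1, 2, 3],
    "note": ["n/a", "a TEST", "none"],
})

# Row-wise: cast each row to str, then check whether any cell contains "TEST".
row_mask = df.apply(lambda row: row.astype(str).str.contains("TEST").any(), axis=1)

# Column-wise variant: cast the whole frame once, test each column, combine per row.
col_mask = (
    df.astype(str)
    .apply(lambda col: col.str.contains("TEST", regex=False))
    .any(axis=1)
)

print(df[row_mask])
```

Casting to str is what lets the numeric column pass through the .str accessor; without it, .str would raise on non-string data.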

Answer by Ciro

Posting my findings in case someone needs them.

I had a DataFrame (360,000 rows) and needed to search across the whole dataframe to find the rows (just a few) that contained the word 'TOTAL' (any variation, e.g. 'TOTAL PRICE', 'TOTAL STEMS', etc.) and delete those rows.

I finally processed the dataframe in two steps:

FIND COLUMNS THAT CONTAIN THE WORD:

for col in df.columns:
    # Print the name of each column in which some value starts with 'TOTAL'
    if df[col].astype(str).str.startswith('TOTAL').any():
        print(col)

DELETE THE ROWS:

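# Keep only the rows whose 'LENGTH/ CMS' value does not contain 'TOTAL' (missing values are kept)
df = df[~df['LENGTH/ CMS'].str.contains('TOTAL', na=False)]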
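
The two steps could probably be folded into a single pass over all columns. The following is only a sketch under assumptions: the frame and the ITEM and QTY columns are made up for illustration, and only the 'LENGTH/ CMS' column name comes from the answer above.

```python
import pandas as pd

# Hypothetical data; only the 'LENGTH/ CMS' column name is taken from the answer.
df = pd.DataFrame({
    "ITEM": ["rose", "TOTAL STEMS", "tulip"],
    "LENGTH/ CMS": ["40", "55", "TOTAL PRICE"],
    "QTY": [10, 20, 30],
})

# Cast everything to str so numeric columns can be searched too,
# then flag rows where any column contains the literal text "TOTAL".
has_total = (
    df.astype(str)
    .apply(lambda col: col.str.contains("TOTAL", regex=False, na=False))
    .any(axis=1)
)

# Drop the flagged rows.
cleaned = df[~has_total]
print(cleaned)
```

This avoids having to know in advance which column holds the 'TOTAL' rows, at the cost of converting every column to strings for the search.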