Python: Search for a String in All Pandas DataFrame Columns and Filter

Question

Asked by horatio1701d

Thought this would be straightforward, but I had some trouble tracking down an elegant way to search all columns in a DataFrame at the same time for a partial string match. Basically, how would I apply df['col1'].str.contains('^') to an entire DataFrame at once and filter down to any rows that have records containing the match?

Answer 1

Accepted answer by unutbu

The Series.str.contains method expects a regex pattern (by default), not a literal string. Therefore str.contains("^") matches the beginning of any string. Since every string has a beginning, everything matches. Instead, use str.contains(r"\^") to match the literal ^ character (the raw string makes Python pass the backslash through to the regex engine unchanged).
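
A quick way to see the difference, on a small made-up Series (the sample values are assumptions for illustration only):

```python
import pandas as pd

s = pd.Series(["a^b", "plain", "^start"])

# "^" is the regex anchor for "start of string", so every element matches.
anchor_only = s.str.contains("^").tolist()

# r"\^" escapes the caret, so only elements with a literal "^" match.
literal_caret = s.str.contains(r"\^").tolist()

print(anchor_only)    # [True, True, True]
print(literal_caret)  # [True, False, True]
```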

To check every column, you could use for col in df to iterate through the column names, and then call str.contains on each column:

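import numpy as np
# Boolean array with one column per DataFrame column; True where a cell contains a literal "^"
mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df])
df.loc[mask.any(axis=1)]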
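
A self-contained version of the same idea, run on a small made-up DataFrame. One assumption goes beyond the answer: each column is cast with astype(str) first, because the .str accessor raises an AttributeError on purely numeric columns such as num below. The function name rows_matching is hypothetical.

```python
import numpy as np
import pandas as pd

def rows_matching(df: pd.DataFrame, pattern: str) -> pd.DataFrame:
    # One boolean column per DataFrame column; astype(str) lets numeric
    # columns go through the .str accessor instead of raising.
    mask = np.column_stack(
        [df[col].astype(str).str.contains(pattern, na=False) for col in df]
    )
    # Keep a row if any of its cells matched.
    return df.loc[mask.any(axis=1)]

df = pd.DataFrame({
    "col1": ["a^b", "plain", None],
    "col2": ["x", "y", "^start"],
    "num": [1, 2, 3],
})
result = rows_matching(df, r"\^")
print(result)  # keeps rows 0 and 2
```

Missing values never count as a match here, since na=False maps them to False.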

Alternatively, you could pass regex=False to str.contains to make the test use the Python in operator; but (in general) using regex is faster.
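
Two ways to get a literal match, sketched on a made-up Series: regex=False as described above, or re.escape from the standard library, which escapes any regex metacharacters in an arbitrary search string so the default regex mode can still be used.

```python
import re
import pandas as pd

s = pd.Series(["a^b", "plain", "^start"])
needle = "^"  # any user-supplied text, to be treated literally

# Plain substring test (Python `in` semantics), no regex involved.
via_no_regex = s.str.contains(needle, regex=False).tolist()

# Regex mode with the needle escaped, e.g. "^" becomes r"\^".
via_escape = s.str.contains(re.escape(needle)).tolist()

print(via_no_regex)  # [True, False, True]
print(via_escape)    # [True, False, True]
```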

Answer 2

Answer by Puneet Sinha

Try:

mask = df.apply(lambda row: row.astype(str).str.contains('TEST').any(), axis=1)  # True for rows with a match in any column
df[mask]
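
A runnable sketch of this answer on a small made-up DataFrame. The astype(str) call turns every cell (numbers, None) into text first, so mixed-type rows work. Passing case=False to str.contains makes the match case-insensitive; the second mask shows that variant.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["TEST 1", "foo", None],
    "b": [1, 2, 3],
    "c": ["bar", "my test", "baz"],
})

# Row-wise: cast each row to str and check whether any cell contains "TEST".
mask = df.apply(lambda row: row.astype(str).str.contains("TEST").any(), axis=1)

# Same check, but ignoring case, so "my test" also counts.
mask_nocase = df.apply(
    lambda row: row.astype(str).str.contains("TEST", case=False).any(), axis=1
)

print(df[mask])         # row 0 only
print(df[mask_nocase])  # rows 0 and 1
```

Because apply with axis=1 calls a Python function once per row, this can be noticeably slower than a column-wise approach on large frames.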

Answer 3

Answer by Ciro

Posting my findings in case someone needs them.

I had a DataFrame (360,000 rows) and needed to search the whole DataFrame to find the rows (just a few) that contained the word 'TOTAL' (in any variation, e.g. 'TOTAL PRICE', 'TOTAL STEMS') and delete those rows.

I ended up processing the DataFrame in two steps:

FIND COLUMNS THAT CONTAIN THE WORD:

# Print the name of each column that has at least one cell starting with 'TOTAL'
for col in df.columns:
    if df[col].astype(str).str.startswith('TOTAL').any():
        print(col)
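
The same first step as a reusable function that returns the column names instead of printing them, so they can be fed into the second step. The name columns_starting_with and the sample data are hypothetical; like the loop above, it checks startswith, so a cell such as 'GRAND TOTAL' would not be found.

```python
import pandas as pd

def columns_starting_with(df: pd.DataFrame, prefix: str) -> list:
    # A column qualifies if at least one cell, cast to str, starts with prefix.
    return [
        col for col in df.columns
        if df[col].astype(str).str.startswith(prefix).any()
    ]

df = pd.DataFrame({
    "LENGTH/ CMS": ["50", "TOTAL STEMS", "60"],
    "PRICE": [1.5, 2.0, 3.25],
    "NAME": ["rose", "tulip", "TOTAL PRICE"],
})
found = columns_starting_with(df, "TOTAL")
print(found)  # ['LENGTH/ CMS', 'NAME']
```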

DELETE THE ROWS:

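# Keep only the rows whose 'LENGTH/ CMS' value does not contain 'TOTAL' (missing values are kept)
df = df[~df['LENGTH/ CMS'].str.contains('TOTAL', na=False)]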
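
If the word can appear in any column, both steps can be collapsed into one pass by building a row mask over all columns, in the spirit of the accepted answer. This is a sketch, not what the poster ran: the helper name drop_rows_containing and the sample data are hypothetical, every column is cast with astype(str), and regex=False treats 'TOTAL' as a plain substring.

```python
import pandas as pd

def drop_rows_containing(df: pd.DataFrame, word: str) -> pd.DataFrame:
    # Column-wise: one str.contains call per column, not one Python call per row.
    hits = df.astype(str).apply(
        lambda col: col.str.contains(word, regex=False, na=False)
    )
    # Drop a row if any of its cells contains the word.
    return df[~hits.any(axis=1)]

df = pd.DataFrame({
    "LENGTH/ CMS": ["50", "TOTAL STEMS", "60"],
    "PRICE": [1.5, 2.0, 3.25],
    "NAME": ["rose", "tulip", "GRAND TOTAL"],
})
cleaned = drop_rows_containing(df, "TOTAL")
print(cleaned)  # only row 0 remains
```

Using contains rather than startswith here also catches cells like 'GRAND TOTAL'.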
