Original source: http://stackoverflow.com/questions/34354390/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
search dataframe for a keyword in any column and get the rows
Asked by user5694846
I have a DataFrame from which I want to get a subset by checking, row by row, whether a keyword is present in any column. Here is the snippet:
df.apply(lambda x: x.str.contains('TEST')).any()
But because not all columns hold string values, it raises an error:
AttributeError: ('Can only use .str accessor with string values
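The error can be reproduced with a small frame that mixes a string column and an integer column. This is only a sketch: the column names `name` and `count` and their values are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: one string column and one integer column.
df = pd.DataFrame({"name": ["TEST1", "other"], "count": [1, 2]})

try:
    # apply() passes each column in turn; the integer column has no .str accessor.
    df.apply(lambda x: x.str.contains('TEST')).any()
    raised = False
except AttributeError as exc:
    raised = True
    print(exc)
```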
Any help is appreciated.
Answered by 8one6
Flying blind without an example here, but how about:
df.apply(lambda row: row.astype(str).str.contains('TEST').any(), axis=1)
So, for example:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.choice(['0.0', 'Hello', 'Goodbye'], (12, 3)))
df.apply(lambda row: row.astype(str).str.contains('Hello').any(), axis=1)
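The expression above returns a boolean Series with one entry per row. Since the question asks for the rows themselves, one way to use it is as a mask for boolean indexing. A minimal sketch, reusing the same random data; the names `row_mask` and `rows` are introduced here for illustration only:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(['0.0', 'Hello', 'Goodbye'], (12, 3)))

# True for every row in which at least one cell contains 'Hello'.
row_mask = df.apply(lambda row: row.astype(str).str.contains('Hello').any(), axis=1)

# Boolean indexing keeps only the rows where the mask is True.
rows = df[row_mask]
print(rows)
```

Casting each row with `astype(str)` is what avoids the `.str` accessor error on non-string values, at the cost of also matching the keyword inside the string form of numbers or dates.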
Answered by jezrael
Without sample data it is hard to be sure, but I would try the numpy function numpy.column_stack together with a list comprehension:
print(df)
A B D E
0 A TEST1 2014-04-08 8
1 B TEST2 2014-05-08 7
2 B C 2014-05-08 15
3 B TEST3 2014-05-08 1
4 TESTA A 2014-04-08 6
5 A TEST5 2014-04-08 1
Build the mask from only the columns that hold string data:
mask = np.column_stack([df[col].str.contains("TEST") for col in ['A', 'B']])
print(mask)
[[False True]
[False True]
[False False]
[False True]
[ True False]
[False True]]
print(df.loc[mask.any(axis=1)])
A B D E
0 A TEST1 2014-04-08 8
1 B TEST2 2014-05-08 7
3 B TEST3 2014-05-08 1
4 TESTA A 2014-04-08 6
5 A TEST5 2014-04-08 1
Alternatively, build the mask by excluding the columns that do not hold string data:
mask = np.column_stack([df[col].str.contains("TEST") for col in df if col not in ['D', 'E']])
print(mask)
[[False True]
[False True]
[False False]
[False True]
[ True False]
[False True]]
print(df.loc[mask.any(axis=1)])
A B D E
0 A TEST1 2014-04-08 8
1 B TEST2 2014-05-08 7
3 B TEST3 2014-05-08 1
4 TESTA A 2014-04-08 6
5 A TEST5 2014-04-08 1
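Both variants above name the columns by hand. As a possible variation, the string columns could be detected from their dtype instead. The sketch below rebuilds the sample frame under the assumption that column D holds datetimes and column E holds integers (the answer does not show the dtypes), and uses `pd.api.types.is_string_dtype` for the check; `str_cols` and `result` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printed output; the dtypes of D and E are assumed.
df = pd.DataFrame({
    "A": ["A", "B", "B", "B", "TESTA", "A"],
    "B": ["TEST1", "TEST2", "C", "TEST3", "A", "TEST5"],
    "D": pd.to_datetime(["2014-04-08", "2014-05-08", "2014-05-08",
                         "2014-05-08", "2014-04-08", "2014-04-08"]),
    "E": [8, 7, 15, 1, 6, 1],
})

# Keep only the columns whose dtype pandas reports as string-like.
str_cols = [col for col in df.columns if pd.api.types.is_string_dtype(df[col])]

# One boolean column per string column; na=False treats missing values as no match.
mask = np.column_stack([df[col].str.contains("TEST", na=False) for col in str_cols])

# Keep the rows where any string column matched.
result = df.loc[mask.any(axis=1)]
print(result)
```

If the keyword should be matched literally rather than as a regular expression, `str.contains` also accepts `regex=False`.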