Original question: http://stackoverflow.com/questions/17972938/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
check if string in pandas dataframe column is in list
Asked by user2333196
If I have a frame like this:
import pandas as pd
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
and I want to check whether each row contains one of a few words, I just have to do this:
frame['b'] = frame.a.str.contains("dog") | frame.a.str.contains("cat") | frame.a.str.contains("fish")
frame['b']
Output:
True
False
True
If I decide to make a list:
mylist = ['dog', 'cat', 'fish']
how would I check whether each row contains any word in the list?
Accepted answer by Andy Hayden
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
frame
a
0 the cat is blue
1 the sky is green
2 the dog is black
The str.contains method accepts a regular expression pattern:
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)
pattern
'dog|cat|fish'
frame.a.str.contains(pattern)
0 True
1 False
2 True
Name: a, dtype: bool
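One caveat when joining the list directly: any entry that contains regex metacharacters (such as `.` or `+`) is interpreted as pattern syntax rather than as literal text. A minimal sketch, assuming the words should be matched literally, escapes each entry with `re.escape` first. The entry 'c++', the third row, and the name `safe_pattern` are illustrative assumptions, not part of the original answer.

```python
import re

import pandas as pd

# Third row and the 'c++' entry are made up for illustration.
frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'I like c++']})
mylist = ['dog', 'cat', 'c++']

# re.escape makes every entry match literally before joining with '|'.
safe_pattern = '|'.join(re.escape(word) for word in mylist)

result = frame['a'].str.contains(safe_pattern)
print(result.tolist())
```

If the list only ever holds plain words, the unescaped join behaves the same way.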
Because regex patterns are supported, you can also embed flags:
frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
frame
a
0 Cat Mr. Nibbles is blue
1 the sky is green
2 the dog is black
pattern = '(?i)' + '|'.join(mylist) # a global inline flag must come first (an error otherwise since Python 3.11)
pattern
'(?i)dog|cat|fish'
frame.a.str.contains(pattern)
0 True # Because of the (?i) flag, 'Cat' is also matched to 'cat'
1 False
2 True
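As an alternative to an inline flag, `str.contains` also takes a `case` parameter and a `flags` parameter. A short sketch reusing the frame and list from this answer; the names `by_case` and `by_flag` are illustrative:

```python
import re

import pandas as pd

frame = pd.DataFrame({'a': ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

# case=False makes the match case-insensitive without touching the pattern.
by_case = frame['a'].str.contains(pattern, case=False)

# Same effect, passing the re flag explicitly.
by_flag = frame['a'].str.contains(pattern, flags=re.IGNORECASE)

print(by_case.tolist())
print(by_flag.tolist())
```

This keeps the pattern itself free of flags, which may be easier to read when the list is built elsewhere.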
Answer by Meloun
For a list, isin should work, though note that it matches whole cell values exactly rather than substrings:
print(frame[frame['a'].isin(mylist)])
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html
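To make the difference between the two checks concrete, here is a small sketch comparing `isin` with `str.contains` on the same column. The third row is changed to the bare word 'dog' purely for illustration, and the names `exact` and `partial` are hypothetical.

```python
import pandas as pd

# Third row is just 'dog' so that isin has something to match exactly.
frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'dog']})
mylist = ['dog', 'cat', 'fish']

# isin: True only when the entire cell equals one of the list entries.
exact = frame['a'].isin(mylist)

# str.contains: True when any list entry appears anywhere in the cell.
partial = frame['a'].str.contains('|'.join(mylist))

print(exact.tolist())
print(partial.tolist())
```

For the sentences in the question, `isin` would therefore return False for every row.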
Answer by Aman Raparia
After reading the comments on the accepted answer about extracting the matched string, this approach can also be tried.
frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
frame
a
0 the cat is blue
1 the sky is green
2 the dog is black
Let us create a list of the strings that need to be matched and extracted.
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)
Now let us create a function that finds and extracts the matching substring.
import re
def pattern_searcher(search_str: str, search_list: str) -> str:
    # search_list is the '|'-joined regex pattern, not a Python list
    search_obj = re.search(search_list, search_str)
    if search_obj:
        # slice out the first matched substring
        return_str = search_str[search_obj.start():search_obj.end()]
    else:
        return_str = 'NA'
    return return_str
We will use this function with pandas.Series.apply, since frame['a'] is a Series:
frame['matched_str'] = frame['a'].apply(lambda x: pattern_searcher(search_str=x, search_list=pattern))
Result:
a matched_str
0 the cat is blue cat
1 the sky is green NA
2 the dog is black dog
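The same extraction can probably be done without a Python-level function by using `str.extract` with a capture group. This is a sketch of that alternative rather than part of the original answer; the name `matched` is illustrative, and the 'NA' fill only mirrors the placeholder used by pattern_searcher.

```python
import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']

# Wrap the alternation in a capture group so str.extract returns the match.
pattern = '(' + '|'.join(mylist) + ')'

# expand=False returns a Series; rows without a match become missing values.
matched = frame['a'].str.extract(pattern, expand=False)

# Optional: replace missing values with the same 'NA' placeholder string.
frame['matched_str'] = matched.fillna('NA')
print(frame)
```

Leaving the missing values in place instead of filling them with 'NA' may be preferable if the column is later filtered with `isna` or `notna`.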