Original source: http://stackoverflow.com/questions/33406313/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How to match any string from a list of strings in regular expressions in Python?
Asked by Josh Weinstein
Let's say I have a list of strings:
string_lst = ['fun', 'dum', 'sun', 'gum']
I want to build a regular expression that, at some point in the pattern, matches any of the strings in that list within a group, such as this:
import re
template = re.compile(r".*(elem for elem in string_lst).*")
template.match("I love to have fun.")
What would be the correct way to do this? Or would one have to build multiple regular expressions and match each of them separately against the string?
Accepted answer by vks
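import re
string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."
# Join the words into an alternation and wrap it in a lookahead with a capture group
print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))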
You cannot use match, as it only matches from the start of the string. Use findall instead.
Output: ['fun']
Using search, you will get only the first match, so use findall instead.
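The difference between the three calls can be seen in a minimal sketch. The sample sentence and the variable names here are assumptions chosen for illustration, and re.escape is added as a precaution that the answer's own snippet does not use.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
# One alternation pattern built from the list: fun|dum|sun|gum
pattern = re.compile('|'.join(map(re.escape, string_lst)))

text = "I love fun in the sun."  # illustrative input, not from the question

at_start = pattern.match(text)        # anchored at position 0, so no match here
first = pattern.search(text).group()  # first match anywhere in the string
every = pattern.findall(text)         # all non-overlapping matches

print(at_start)  # None
print(first)     # fun
print(every)     # ['fun', 'sun']
```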
Also use a lookahead if you have overlapping matches that do not start at the same point.
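To illustrate the point about overlapping matches, here is a small sketch. The word list ['fun', 'unny'] and the input "funny" are made up for the example so that two matches overlap without starting at the same position.

```python
import re

words = ['fun', 'unny']  # hypothetical list whose matches overlap in "funny"
text = "funny"
alternation = '|'.join(map(re.escape, words))

# A plain alternation consumes "fun", so "unny" can no longer be found
plain = re.findall(alternation, text)

# A lookahead consumes nothing, so every start position is tried
overlapping = re.findall(r"(?=(" + alternation + r"))", text)

print(plain)        # ['fun']
print(overlapping)  # ['fun', 'unny']
```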
Answer by lord63. j
Besides regular expressions, you can also use a list comprehension. I hope that's not off topic.
import re
def match(input_string, string_list):
    words = re.findall(r'\w+', input_string)
    return [word for word in words if word in string_list]
>>> string_lst = ['fun', 'dum', 'sun', 'gum']
>>> match("I love to have fun.", string_lst)
['fun']
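Note that this approach compares whole words, while a regex alternation also matches inside longer words. A rough sketch of the difference follows; the sample sentence is an assumption, and the set is an optional tweak for faster membership tests rather than part of the original answer.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
text = "That was a funny movie."  # illustrative input

# Whole-word approach: tokenize, then test membership in a set
wanted = set(string_lst)
whole_words = [w for w in re.findall(r'\w+', text) if w in wanted]

# Substring approach: the alternation also matches inside "funny"
substrings = re.findall('|'.join(map(re.escape, string_lst)), text)

print(whole_words)  # []
print(substrings)   # ['fun']
```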
Answer by John La Rooy
You should make sure to escape the strings correctly before combining them into a regex:
>>> import re
>>> string_lst = ['fun', 'dum', 'sun', 'gum']
>>> x = "I love to have fun."
>>> regex = re.compile("(?=(" + "|".join(map(re.escape, string_lst)) + "))")
>>> re.findall(regex, x)
['fun']
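The words in the question contain no special characters, so escaping changes nothing there. A sketch with a made-up entry, '1+1', shows what can go wrong without it: the unescaped + is read as a quantifier.

```python
import re

words = ['1+1']  # hypothetical entry containing a regex metacharacter
text = "1+1 is not 11"

# Unescaped: "1+1" means "one or more 1s followed by a 1", which matches "11"
unescaped = re.findall('|'.join(words), text)

# Escaped: the plus sign is matched literally
escaped = re.findall('|'.join(map(re.escape, words)), text)

print(unescaped)  # ['11']
print(escaped)    # ['1+1']
```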
Answer by jfs
The regex module has named lists (sets, actually):
#!/usr/bin/env python
import regex as re # $ pip install regex
p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
if p.search("I love to have fun."):
    print('matched')
Here words is just a name; you can use anything you like instead. The .search() method is used instead of putting .* before and after the named list.
To emulate named lists using the stdlib's re module:
#!/usr/bin/env python
import re
words = ['fun', 'dum', 'sun', 'gum']
longest_first = sorted(words, key=len, reverse=True)
p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
if p.search("I love to have fun."):
    print('matched')
re.escape() is used to escape regex meta-characters such as .*? inside individual words (to match the words literally). sorted() emulates the regex module's behavior: it puts the longest words first among the alternatives. Compare:
>>> import re
>>> re.findall("(funny|fun)", "it is funny")
['funny']
>>> re.findall("(fun|funny)", "it is funny")
['fun']
>>> import regex
>>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
['funny']
>>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
['funny']
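The stdlib emulation above could be wrapped in a small reusable function. This is only a sketch: compile_word_list is a hypothetical name, not something provided by re or regex, and it assumes a non-empty list of words.

```python
import re

def compile_word_list(words):
    """Hypothetical helper: build one alternation pattern from literal words."""
    # Longest first, so 'funny' is tried before its prefix 'fun'
    longest_first = sorted(words, key=len, reverse=True)
    return re.compile('|'.join(map(re.escape, longest_first)))

pattern = compile_word_list(['fun', 'funny'])
print(pattern.findall("it is funny"))  # ['funny']
```

With an empty list the joined pattern would be an empty string, which matches at every position, so a caller may want to check for that case first.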
Answer by Pranzell
In line with @vks's reply, I feel this actually does the complete task:
# The separator must be a raw string: a plain '\b' is a backspace character, not a word boundary
finds = re.findall(r"(?=(\b" + r'\b|\b'.join(string_lst) + r"\b))", x)
Adding word boundaries completes the task!
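A self-contained sketch of the word-boundary idea follows. It is a variation on the line above, not the answer's exact pattern: it drops the lookahead, wraps the alternation in a non-capturing group with one \b on each side, and escapes the words. The second sample sentence is an assumption added to show a non-match.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
alternation = '|'.join(map(re.escape, string_lst))

# \b on both sides restricts matches to whole words
pattern = re.compile(r"\b(?:" + alternation + r")\b")

whole = pattern.findall("I love to have fun.")
partial = pattern.findall("That was funny.")

print(whole)    # ['fun']
print(partial)  # []
```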