Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/33406313/

Time: 2020-08-19 13:18:54  Source: igfitidea

How to match any string from a list of strings in a regular expression in Python?

Tags: python, regex, string, python-3.x

Asked by Josh Weinstein

Let's say I have a list of strings:

string_lst = ['fun', 'dum', 'sun', 'gum']

I want to build a regular expression that, at some point, matches any of the strings in that list inside a group, such as this:

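import re
# Intended idea only: inside the string, the generator expression is literal
# text, so this pattern looks for "elem for elem in string_lst" and finds nothing.
template = re.compile(r".*(elem for elem in string_lst).*")
template.match("I love to have fun.")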

What would be the correct way to do this? Or would one have to make multiple regular expressions and match each one separately against the string?

Accepted answer by vks

import re

string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."

print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))

You cannot use match, as it only matches at the start of the string. Use findall instead.

Output: ['fun']

With search you will get only the first match, so use findall instead.
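A small comparison of the three functions, assuming a hypothetical input sentence that contains two of the listed words. The pattern is the plain alternation from the accepted answer (without the lookahead), with re.escape added as a precaution:

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun in the sun."  # hypothetical input with two matches
pattern = "(" + "|".join(map(re.escape, string_lst)) + ")"

# match only tries position 0, where the text starts with "I".
print(re.match(pattern, x))            # None
# search scans the string but stops at the first match.
print(re.search(pattern, x).group(1))  # fun
# findall returns every non-overlapping match.
print(re.findall(pattern, x))          # ['fun', 'sun']
```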

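Also use a lookahead, (?=...), if you need overlapping matches that start at different positions: the lookahead consumes no characters, so findall retries at every position while the capturing group still reports the word found there.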
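To see the difference, here is a sketch with a hypothetical word list whose entries overlap inside "sunday". Without the lookahead, matching "sun" consumes the first three characters, so "unday" is never found; with it, both are reported:

```python
import re

words = ['sun', 'unday']  # hypothetical list with overlapping entries
text = "sunday"
alternation = "|".join(map(re.escape, words))

# Plain group: each match consumes its characters, so the search resumes at "day".
plain = re.findall("(" + alternation + ")", text)

# Lookahead: zero-width match, so the engine also tries position 1 and finds "unday".
overlapping = re.findall("(?=(" + alternation + "))", text)

print(plain)        # ['sun']
print(overlapping)  # ['sun', 'unday']
```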

Answer by lord63. j

Besides a regular expression, you can use a list comprehension; I hope this isn't off-topic.

import re

def match(input_string, string_list):
    # Split the input into runs of word characters, then keep those in the list.
    words = re.findall(r'\w+', input_string)
    return [word for word in words if word in string_list]

>>> string_lst = ['fun', 'dum', 'sun', 'gum']
>>> match("I love to have fun.", string_lst)
['fun']
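A variant sketch of the same idea, with one change that is my own assumption rather than part of the answer: the list is converted to a set for faster membership checks when the word list is long. Note that this approach compares whole tokens, so unlike the alternation regex it does not find "fun" inside "funny":

```python
import re

def match(input_string, string_list):
    # A set gives average O(1) membership checks (assumed optimization).
    wanted = set(string_list)
    # Tokenize into runs of word characters and keep the tokens in the set.
    return [word for word in re.findall(r'\w+', input_string) if word in wanted]

string_lst = ['fun', 'dum', 'sun', 'gum']
print(match("I love to have fun.", string_lst))  # ['fun']
print(match("It is funny.", string_lst))         # [] (whole tokens only)
```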

Answer by John La Rooy

You should make sure to escape the strings correctly before combining them into a regex.

>>> import re
>>> string_lst = ['fun', 'dum', 'sun', 'gum']
>>> x = "I love to have fun."
>>> regex = re.compile("(?=(" + "|".join(map(re.escape, string_lst)) + "))")
>>> re.findall(regex, x)
['fun']
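Why escaping matters, sketched with a hypothetical list that contains a regex metacharacter: unescaped, the "." in "a.c" matches any character, so "abc" is reported as a false hit:

```python
import re

words = ['a.c', 'fun']  # hypothetical list; '.' is a regex metacharacter
text = "abc is not a.c"

unescaped = re.findall("(?=(" + "|".join(words) + "))", text)
escaped = re.findall("(?=(" + "|".join(map(re.escape, words)) + "))", text)

print(unescaped)  # ['abc', 'a.c']  '.' matched the 'b'
print(escaped)    # ['a.c']         only the literal text
```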

Answer by jfs

The regex module has named lists (sets, actually):

#!/usr/bin/env python
import regex as re # $ pip install regex

p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
if p.search("I love to have fun."):
    print('matched')

Here words is just a name; you can use any name you like instead.
The .search() method is used instead of putting .* before and after the named list.

To emulate named lists using the stdlib re module:

#!/usr/bin/env python
import re

words = ['fun', 'dum', 'sun', 'gum']
longest_first = sorted(words, key=len, reverse=True)
p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
if p.search("I love to have fun."):
    print('matched')

re.escape() is used to escape regex metacharacters such as .*? inside individual words (so that the words are matched literally).
sorted() emulates the regex module's behavior by putting the longest words first among the alternatives. Compare:

>>> import re
>>> re.findall("(funny|fun)", "it is funny")
['funny']
>>> re.findall("(fun|funny)", "it is funny")
['fun']
>>> import regex
>>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
['funny']
>>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
['funny']
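The same comparison done with the stdlib re module alone, as a sketch: re tries alternatives left to right, so the list order decides which word wins, and sorting longest first reproduces the regex module's result for this input. The helper name alternation is hypothetical:

```python
import re

words = ['fun', 'funny']
text = "it is funny"

def alternation(ws):
    # Hypothetical helper: escape each word and join them into one group.
    return "(" + "|".join(map(re.escape, ws)) + ")"

# As given: "fun" is tried first and wins at position 6.
as_given = re.findall(alternation(words), text)
# Longest first: "funny" is tried before "fun".
longest_first = re.findall(alternation(sorted(words, key=len, reverse=True)), text)

print(as_given)       # ['fun']
print(longest_first)  # ['funny']
```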

Answer by Pranzell

Building on @vks's reply, I feel this actually does the complete task:

# r'\b|\b' must be a raw string: in a plain string, '\b' is a backspace character.
finds = re.findall(r"(?=(\b" + r'\b|\b'.join(string_lst) + r"\b))", x)

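Adding word boundaries (\b) completes the task: the listed strings now match only as whole words, so "fun" no longer matches inside "funny".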
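A sketch of the word-boundary version that also escapes the words, as recommended in the answer about re.escape. It places \b once on each side of the group instead of around every word, which is equivalent here and assumes every listed word starts and ends with a word character (otherwise \b behaves differently):

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
# \b on both sides of the group: only whole-word occurrences qualify.
pattern = r"(?=\b(" + "|".join(map(re.escape, string_lst)) + r")\b)"

print(re.findall(pattern, "I love to have fun."))     # ['fun']
print(re.findall(pattern, "It is funny and sunny."))  # []
```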
