How do I match any string from a list of strings in a regular expression in Python?

Question

Asked by Josh Weinstein

Let's say I have a list of strings:

string_lst = ['fun', 'dum', 'sun', 'gum']

I want to build a regular expression in which, at some point, a group can match any of the strings in that list, something like this:

import re
template = re.compile(r".*(elem for elem in string_lst).*")
template.match("I love to have fun.")

What would be the correct way to do this? Or would one have to build multiple regular expressions and match each of them separately against the string?

Answer 1

Accepted answer by vks

import re

string_lst = ['fun', 'dum', 'sun', 'gum']
x = "I love to have fun."

print(re.findall(r"(?=(" + '|'.join(string_lst) + r"))", x))

You cannot use match here, because it only matches at the start of the string. Use findall instead.

Output: ['fun']

With search you will get only the first match, so use findall instead.
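As a rough illustration of the difference between the three calls, here is a minimal sketch. The sample sentence is made up for this example, and re.escape is added as a precaution that the answer above does not use.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
pattern = re.compile('|'.join(map(re.escape, string_lst)))
text = "The sun is fun."  # hypothetical sample sentence

# match() only succeeds if the pattern matches at position 0.
at_start = pattern.match(text)

# search() scans the whole string but stops at the first hit.
first_hit = pattern.search(text)

# findall() returns every non-overlapping hit, left to right.
all_hits = pattern.findall(text)

print(at_start)           # None
print(first_hit.group())  # sun
print(all_hits)           # ['sun', 'fun']
```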

Also use a lookahead if you have overlapping matches that do not start at the same point.
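To see why the lookahead matters for overlapping matches, consider this small sketch. The word list and input are hypothetical, chosen so that one word starts inside another.

```python
import re

words = ['sun', 'under']   # hypothetical list with overlapping hits
text = "sunder"            # 'sun' spans 0-3, 'under' spans 1-6
alternation = '|'.join(map(re.escape, words))

# Plain alternation consumes 'sun', so 'under' (which starts inside it) is lost.
plain = re.findall(alternation, text)

# A lookahead consumes nothing, so the engine retries at every position.
overlapping = re.findall('(?=(' + alternation + '))', text)

print(plain)        # ['sun']
print(overlapping)  # ['sun', 'under']
```

Note that when two list entries could match at the same position, the lookahead still reports only one of them per position (the first alternative that matches).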

Answer 2

Answer by lord63. j

Besides a regular expression, you can use a list comprehension. I hope that's not off topic.

import re
def match(input_string, string_list):
    words = re.findall(r'\w+', input_string)
    return [word for word in words if word in string_list]

>>> string_lst = ['fun', 'dum', 'sun', 'gum']
>>> match("I love to have fun.", string_lst)
['fun']

Answer 3

Answer by John La Rooy

You should make sure to escape the strings correctly before combining them into a regex:

>>> import re
>>> string_lst = ['fun', 'dum', 'sun', 'gum']
>>> x = "I love to have fun."
>>> regex = re.compile("(?=(" + "|".join(map(re.escape, string_lst)) + "))")
>>> re.findall(regex, x)
['fun']
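The list in the question contains only letters, so escaping changes nothing there. A small sketch with hypothetical terms that contain metacharacters shows what can go wrong without it:

```python
import re

terms = ['a.b', 'c++']   # hypothetical terms containing metacharacters
text = "axb and c++"

# Unescaped, '.' matches any character, so 'a.b' also matches 'axb'.
unescaped = re.findall('a.b', text)

# Escaped, every term is matched literally.
escaped = re.compile('|'.join(map(re.escape, terms)))
literal = escaped.findall(text)

print(unescaped)  # ['axb']
print(literal)    # ['c++']
```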

Answer 4

Answer by jfs

The regex module has named lists (actually sets):

#!/usr/bin/env python
import regex as re # $ pip install regex

p = re.compile(r"\L<words>", words=['fun', 'dum', 'sun', 'gum'])
if p.search("I love to have fun."):
    print('matched')

Here words is just a name; you can use any name you like instead.
The .search() method is used instead of putting .* before and after the named list.

To emulate named lists using the stdlib's re module:

#!/usr/bin/env python
import re

words = ['fun', 'dum', 'sun', 'gum']
longest_first = sorted(words, key=len, reverse=True)
p = re.compile(r'(?:{})'.format('|'.join(map(re.escape, longest_first))))
if p.search("I love to have fun."):
    print('matched')

re.escape() is used to escape regex metacharacters such as .*? inside individual words (so that the words are matched literally).
sorted() emulates the regex module's behavior by putting the longest words first among the alternatives. Compare:

>>> import re
>>> re.findall("(funny|fun)", "it is funny")
['funny']
>>> re.findall("(fun|funny)", "it is funny")
['fun']
>>> import regex
>>> regex.findall(r"\L<words>", "it is funny", words=['fun', 'funny'])
['funny']
>>> regex.findall(r"\L<words>", "it is funny", words=['funny', 'fun'])
['funny']

Answer 5

Answer by Pranzell

In line with @vks's reply, I feel this actually does the complete task:

finds = re.findall(r"(?=(\b" + r'\b|\b'.join(string_lst) + r"\b))", x)

Adding word boundaries completes the task!
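A short sketch of what the word boundaries change, using a made-up sentence in which list entries also appear inside longer words. The non-capturing group and re.escape are choices made for this example, not part of the answer above.

```python
import re

string_lst = ['fun', 'dum', 'sun', 'gum']
text = "funny sundial, but fun"   # hypothetical sample sentence
alternation = '|'.join(map(re.escape, string_lst))

# Without boundaries, 'fun' and 'sun' are found inside longer words.
loose = re.findall(alternation, text)

# r'\b' must be a raw string; a plain '\b' is a backspace character in Python.
strict = re.findall(r'\b(?:' + alternation + r')\b', text)

print(loose)   # ['fun', 'sun', 'fun']
print(strict)  # ['fun']
```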
