Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/42136040/

Time: 2020-08-19 21:17:28  Source: igfitidea

How to combine multiple regex into single one in python?

Tags: python, regex, pattern-matching

Asked by Amit

I'm learning about regular expressions. I don't know how to combine different regular expressions into a single generic regular expression.

I want to write a single regular expression that works for multiple cases. I know this can be done with the naive approach of using the or operator "|".

I don't like this approach. Can anybody suggest a better one?

Answered by Lior Magen

You need to compile all your regexes into a single pattern. Check this example:

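import re

re1 = r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*'
re2 = r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*'
re3 = r'[A-Z]*\d+[/]\d+[A-Z]\d+'
# Note: [A-z] also matches some punctuation; [A-Za-z] may be what is intended.
re4 = r'\d+[/]\d+[A-Z]*\d+\s\d+[A-z]\s[A-Z]*'

# Compile the combined pattern once, outside the loop.
generic_re = re.compile("(%s|%s|%s|%s)" % (re1, re2, re3, re4))

# string1..string4 are the input strings to search (not defined here).
sentences = [string1, string2, string3, string4]
for sentence in sentences:
    matches = generic_re.findall(sentence)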
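
When several patterns are merged with "|", it is often useful to know which one produced a given match. The following is a minimal sketch of one way to do that with named groups; the sub-patterns, the group names, and the sample text are hypothetical and not taken from the question.

```python
import re

# Hypothetical sub-patterns, chosen only for illustration.
# Order matters: "date" comes first so that "number" does not
# consume the leading digits of a date.
patterns = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "number": r"\d+",
    "word": r"[A-Za-z]+",
}

# Wrap each sub-pattern in a named group, then join them with "|".
combined = re.compile(
    "|".join("(?P<%s>%s)" % (name, pat) for name, pat in patterns.items())
)

def tag_matches(text):
    """Return (group_name, matched_text) pairs for every match in text."""
    return [(m.lastgroup, m.group()) for m in combined.finditer(text)]

result = tag_matches("released 2020-08-19 build 42")
print(result)
```

Here m.lastgroup reports the name of the alternative that matched, so one pass over the text can replace a separate search per pattern.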

Answered by nigel222

To findall with an arbitrary series of REs, all you have to do is concatenate the lists of matches that each one returns:

import re

re_list = [
    r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',  # re1 in question
    ...
    r'\d+[/]\d+[A-Z]*\d+\s\d+[A-z]\s[A-Z]*',  # re4 in question
]

# string is the text to search (not defined here).
matches = []
for r in re_list:
    matches += re.findall(r, string)

For efficiency it would be better to use a list of compiled REs.
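
A minimal sketch of the compiled-list variant follows. The two patterns and the sample text are hypothetical stand-ins, not the patterns from the question. Since the re module also keeps an internal cache of recently compiled patterns, the gain may be modest for a short list.

```python
import re

# Hypothetical patterns, for illustration only.
re_list = [r"\d+", r"[A-Z]+"]

# Compile each pattern once, up front.
compiled = [re.compile(r) for r in re_list]

def find_all(text):
    """Concatenate the findall results of every compiled pattern."""
    matches = []
    for rx in compiled:
        matches += rx.findall(text)
    return matches

result = find_all("AB12 CD345")
print(result)
```

Note that the results are grouped by pattern, not by position in the text.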

Alternatively, you could join the individual RE strings with "|" and compile the result:

generic_re = re.compile( '|'.join( re_list) )
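
One thing to watch with a plain join: "|" has the lowest precedence in a regex, so anything added around the joined string attaches only to the first or last alternative. A sketch of a safer helper is below; the combine function and the sample words are hypothetical, not part of the original answer.

```python
import re

def combine(patterns):
    """Join patterns with '|', wrapping each one (and the whole) in a non-capturing group."""
    return "(?:" + "|".join("(?:%s)" % p for p in patterns) + ")"

words = [r"cat", r"dog"]  # hypothetical patterns
text = "catalog hotdog cat dog"

# Parsed as (\bcat) | (dog\b): each boundary binds to one alternative only.
naive = re.compile(r"\b" + "|".join(words) + r"\b")
# Parsed as \b(cat|dog)\b: both boundaries apply to every alternative.
grouped = re.compile(r"\b" + combine(words) + r"\b")

naive_hits = naive.findall(text)
grouped_hits = grouped.findall(text)
print(naive_hits)
print(grouped_hits)
```

Non-capturing groups are used so that findall still returns the whole match as a plain string.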

Answered by Karen McCulloch

I see lots of people using pipes, but that seems to match only the first alternative that succeeds. If you want every sub-pattern to match, try using lookaheads.

Example:

>>> import re
>>> fruit_string = "10a11p"
>>> fruit_regex = r'(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)'
>>> re.match(fruit_regex, fruit_string).groupdict()
{'pears': '11', 'apples': '10'}
>>> re.match(fruit_regex, fruit_string).group(0)  # lookaheads consume nothing
''
>>> re.match(fruit_regex, fruit_string).group(1)
'11'

(?= ...) is a lookahead:

Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.

.*?(?P<pears>\d+)p finds a number followed by a p anywhere in the string and names the number "pears".
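
As a small illustration of both points, the sketch below runs the Isaac/Asimov pattern and then the fruit pattern against two orderings of the input. The second input string "7p3a" is a made-up example, not from the original answer.

```python
import re

# The lookahead checks for 'Asimov' without consuming it.
isaac = re.compile(r"Isaac (?=Asimov)")
m1 = isaac.search("Isaac Asimov")   # matches 'Isaac ' only
m2 = isaac.search("Isaac Newton")   # no match, so None

# Each lookahead starts from the same position, so the order of
# pears and apples in the input does not matter.
fruit_regex = re.compile(r"(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)")
apples_first = fruit_regex.match("10a11p").groupdict()
pears_first = fruit_regex.match("7p3a").groupdict()  # hypothetical input

print(m1.group(), m2)
print(apples_first, pears_first)
```

Because the pattern consists only of lookaheads, the overall match is the empty string; the useful data is in the named groups.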