How to combine multiple regex into a single one in Python?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/42136040/
Asked by Amit
I'm learning about regular expressions. I don't know how to combine different regular expressions into a single generic regular expression.
I want to write a single regular expression that works for multiple cases. I know this can be done with a naive approach by using the or operator "|".
I don't like this approach. Can anybody tell me a better approach?
Answered by Lior Magen
You need to compile all your regex patterns into one. Check this example:
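import re

re1 = r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*'
re2 = r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*'
re3 = r'[A-Z]*\d+[/]\d+[A-Z]\d+'
re4 = r'\d+[/]\d+[A-Z]*\d+\s\d+[A-z]\s[A-Z]*'

# Compile the combined pattern once, outside the loop.
generic_re = re.compile("(%s|%s|%s|%s)" % (re1, re2, re3, re4))

# string1..string4 are the input strings to search (assumed to be defined elsewhere).
sentences = [string1, string2, string3, string4]
for sentence in sentences:
    matches = generic_re.findall(sentence)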
Answered by nigel222
To findall with an arbitrary series of REs, all you have to do is concatenate the lists of matches that each one returns:
import re

re_list = [
    r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',  # re1 in question
    ...
    r'\d+[/]\d+[A-Z]*\d+\s\d+[A-z]\s[A-Z]*',  # re4 in question
]
matches = []
for r in re_list:
    matches += re.findall(r, string)
For efficiency it would be better to use a list of compiled REs.
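As a minimal sketch of the compiled-list idea, the patterns can be compiled once and then reused for every input string. The two patterns, the helper name find_all_matches, and the sample text are hypothetical stand-ins, not the expressions from the question.

```python
import re

# Hypothetical example patterns (not the ones from the question).
pattern_strings = [
    r'\d+/\d+',      # e.g. "12/34"
    r'[A-Z]{2}\d+',  # e.g. "AB123"
]

# Compile each pattern once so repeated searches reuse the compiled objects.
compiled_res = [re.compile(p) for p in pattern_strings]


def find_all_matches(text):
    """Concatenate the findall() results of every compiled pattern."""
    matches = []
    for compiled in compiled_res:
        matches += compiled.findall(text)
    return matches


sample_text = "ref 12/34 and code AB123"  # hypothetical input
print(find_all_matches(sample_text))
```

Note that the results are grouped by pattern (all matches of the first RE, then all matches of the second), not by their position in the text.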
Alternatively, you could join the element RE strings with "|":
generic_re = re.compile('|'.join(re_list))
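One possible way to build the joined pattern is sketched below. Wrapping each alternative in a non-capturing group (?:...) is my own addition, not part of the answer: it keeps each sub-pattern self-contained, and findall returns whole matches as long as the sub-patterns contain no capturing groups of their own. The patterns and the input string are hypothetical.

```python
import re

# Hypothetical example patterns (not the ones from the question).
pattern_strings = [r'\d+/\d+', r'[A-Z]{2}\d+']

# Wrap each alternative in a non-capturing group before joining with "|".
generic_re = re.compile('|'.join('(?:%s)' % p for p in pattern_strings))

sample_text = "code AB123 then 12/34"  # hypothetical input
print(generic_re.findall(sample_text))
```

With a single joined pattern the matches come back in the order they occur in the text, and each stretch of text is matched at most once.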
Answered by Karen McCulloch
I see lots of people are using pipes, but with a single re.match or re.search call that only matches the first alternative found. If you want every sub-pattern to match, try using lookaheads.
Example:
>>> import re
>>> fruit_string = "10a11p"
>>> fruit_regex = r'(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)'
>>> re.match(fruit_regex, fruit_string).groupdict()
{'pears': '11', 'apples': '10'}
>>> re.match(fruit_regex, fruit_string).group(0)
''
>>> re.match(fruit_regex, fruit_string).group(1)
'11'
(?= ...) is a lookahead:
Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.
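The "doesn't consume" part can be checked with a short sketch based on the Isaac (?=Asimov) example. The variable names are mine.

```python
import re

lookahead_re = re.compile(r'Isaac (?=Asimov)')

m = lookahead_re.search("Isaac Asimov")
print(repr(m.group(0)))  # the matched text stops before 'Asimov'
print(m.end())           # the match ends right after 'Isaac '

# No match when the text that follows is not 'Asimov'.
print(lookahead_re.search("Isaac Newton"))
```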
.*?(?P<pears>\d+)p finds a number followed by a p anywhere in the string and names the number "pears".
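The same idea can be generalized with a small helper that wraps every sub-pattern in its own lookahead. This is only a sketch: build_all_of is a hypothetical name, and the approach assumes the input is a single line, since "." does not match newlines by default.

```python
import re


def build_all_of(sub_patterns):
    """Build one pattern that requires every sub-pattern to match
    somewhere in the string, in any order (hypothetical helper)."""
    return re.compile(''.join('(?=.*?%s)' % p for p in sub_patterns))


fruit_re = build_all_of([r'(?P<pears>\d+)p', r'(?P<apples>\d+)a'])

for s in ("10a11p", "11p10a", "10a"):
    m = fruit_re.match(s)
    print(s, m.groupdict() if m else None)
```

Because each lookahead starts from the beginning of the string, the order of the fruits in the input does not matter, and the pattern fails as a whole if any one sub-pattern is missing.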