How to combine multiple regular expressions into one in Python?

Question

Asked by Amit

I'm learning about regular expressions. I don't know how to combine different regular expressions into a single generic regular expression.

I want to write a single regular expression that works for multiple cases. I know this can be done with a naive approach by using the or operator "|".

I don't like this approach. Can anybody tell me a better approach?

Answer 1

Answered by Lior Magen

You need to compile all your regex patterns into one. Check this example:

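import re

# Raw strings keep backslashes such as \d and \s intact.
re1 = r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*'
re2 = r'\d*[/]\d*[A-Z]*\d*\s[A-Z]*\d*[A-Z]*'
re3 = r'[A-Z]*\d+[/]\d+[A-Z]\d+'
re4 = r'\d+[/]\d+[A-Z]*\d+\s\d+[A-z]\s[A-Z]*'

# Compile the combined pattern once, outside the loop.
generic_re = re.compile("(%s|%s|%s|%s)" % (re1, re2, re3, re4))

# string1 .. string4 stand for the strings you want to search.
sentences = [string1, string2, string3, string4]
for sentence in sentences:
    matches = generic_re.findall(sentence)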

Answer 2

Answered by nigel222

To findall with an arbitrary series of REs, all you have to do is concatenate the lists of matches that each one returns:

re_list = [
    r'\d+\.\d*[L][-]\d*\s[A-Z]*[/]\d*',  # re1 in question
    ...
    r'\d+[/]\d+[A-Z]*\d+\s\d+[A-z]\s[A-Z]*',  # re4 in question
]

matches = []
for r in re_list:
    matches += re.findall(r, string)

For efficiency it would be better to use a list of compiled REs.
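
A minimal sketch of the compiled-list variant follows. The two patterns, the sample text, and the helper name find_all are hypothetical and chosen only for illustration; they are not the patterns from the question.

```python
import re

# Hypothetical patterns, used only for illustration.
pattern_strings = [r'\d+/\d+', r'[A-Z]{2,}']

# Compile each pattern once so the loop below reuses the compiled objects.
compiled_res = [re.compile(p) for p in pattern_strings]


def find_all(text):
    """Concatenate the findall results of every compiled pattern."""
    matches = []
    for regex in compiled_res:
        matches += regex.findall(text)
    return matches


result = find_all("AB 12/34 and CD 5/6")
print(result)
```

The results are grouped by pattern, not by position in the text. The re module also keeps an internal cache of recently compiled patterns, so the speed gain from compiling up front may be modest for a small number of patterns.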

Alternatively, you could join the individual RE strings with '|' using

generic_re = re.compile('|'.join(re_list))
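
As a rough sketch of the joined form, the example below wraps each pattern in a non-capturing group before joining, and also shows a named-group variant that reports which pattern matched. The patterns, the sample text, and the names fraction and caps are assumptions made for illustration.

```python
import re

# Hypothetical patterns, used only for illustration.
pattern_strings = [r'\d+/\d+', r'[A-Z]{2,}']
text = "AB 12/34 and CD 5/6"

# Wrapping each pattern in (?:...) keeps a '|' inside one pattern from
# splitting it when the patterns are joined.
generic_re = re.compile('|'.join('(?:%s)' % p for p in pattern_strings))
joined_matches = generic_re.findall(text)

# Variant: give each alternative a name so lastgroup says which one matched.
named_patterns = [('fraction', r'\d+/\d+'), ('caps', r'[A-Z]{2,}')]
named_re = re.compile('|'.join('(?P<%s>%s)' % pair for pair in named_patterns))
labelled = [(m.lastgroup, m.group()) for m in named_re.finditer(text)]

print(joined_matches)
print(labelled)
```

Unlike the per-pattern loop, a joined pattern returns matches in the order they appear in the text, and each stretch of text is matched by at most one alternative.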

Answer 3

Answered by Karen McCulloch

I see lots of people are using pipes, but that seems to only match the first instance. If you want to match all, then try using lookaheads.

Example:

>>> import re
>>> fruit_string = "10a11p"
>>> fruit_regex = r'(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)'
>>> re.match(fruit_regex, fruit_string).groupdict()
{'pears': '11', 'apples': '10'}
>>> re.match(fruit_regex, fruit_string).group(0)
''
>>> re.match(fruit_regex, fruit_string).group(1)
'11'

(?=...) is a lookahead:

Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.

.*?(?P<pears>\d+)p finds a number followed by a p anywhere in the string and names the number "pears".
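
The sketch below reuses the fruit pattern to show that the lookaheads do not depend on the order of the items in the string, and that the match fails if one item is missing. The helper name count_fruit and the extra sample strings are hypothetical. For comparison, it also runs a plain alternation through search and findall.

```python
import re

# Both lookaheads start scanning at the beginning of the string, so the
# order of the pears and apples entries in the text does not matter.
fruit_regex = re.compile(r'(?=.*?(?P<pears>\d+)p)(?=.*?(?P<apples>\d+)a)')


def count_fruit(text):
    """Return the named groups, or None unless both items are present."""
    m = fruit_regex.match(text)
    return m.groupdict() if m else None


first = count_fruit("10a11p")
swapped = count_fruit("11p10a")
missing = count_fruit("10a")

# For comparison: an alternation returns one item per match.
alternation = re.compile(r'\d+p|\d+a')
first_only = alternation.search("10a11p").group()
every_match = alternation.findall("10a11p")

print(first, swapped, missing, first_only, every_match)
```

The lookahead form requires every item to be present and captures them all in a single match, while the alternation matches any one of them at a time; search stops at the first hit, whereas findall collects each of them.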
