Python regex to match multiple times

Disclaimer: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/17407691/

Date: 2020-08-19 08:07:10  Source: igfitidea

Tags: python, regex, multiple-matches

Asked by mavili

I'm trying to match a pattern against strings that could contain multiple instances of the pattern. I need every instance separately. re.findall() should do it, but I don't know what I'm doing wrong.

import re
pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
# match comes back as [] for this input

I need 'http://url.com/123', 'http://url.com/456' and the two numbers 123 and 456 to be separate elements of the match list.

I have also tried '/review: ((http://url.com/(\d+)\s?)+)/' as the pattern, but no luck.

Accepted answer by Narendra Yadala

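Use the pattern below. You need to place 'review' outside the capturing group (here it sits in the optional non-capturing group (?:review: )?) and drop the repeated + so that each URL comes back as its own match.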

import re
pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)

This gives the following output:

>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]
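A related sketch (not part of the original answer): when a pattern has more than one capturing group, re.findall returns a tuple of the groups for each match. If you would rather work with match objects, re.finditer with the same pattern gives each URL and its number through group(1) and group(2). The message is the one from the question.

```python
import re

# Same pattern as the accepted answer: "review: " is optional and
# non-capturing, so only the URL and its number are captured.
pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)
msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# finditer yields one match object per non-overlapping match.
results = [(m.group(1), m.group(2)) for m in pattern.finditer(msg)]
print(results)  # prints [('http://url.com/123', '123'), ('http://url.com/456', '456')]
```

This returns the same pairs as findall; finditer is mainly useful when you also need positions via m.start() or m.end().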

Answer by John Montgomery

You've got extra / characters in the regex. In Python the pattern is just a string, without the /.../ delimiters used in languages such as JavaScript or Perl. For example, instead of this:

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)

It should be:

pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)
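An observation added here (not part of the original answer): removing the slashes makes the pattern match, but a capturing group repeated with + only keeps the text from its last repetition. So with the question's message, findall still reports only the second URL. A quick check:

```python
import re

# The corrected pattern from this answer, written as a raw string.
pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)
msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# One match spans both URLs, but each group remembers only the
# text captured by the final repetition of (...)+.
matches = pattern.findall(msg)
print(matches)  # prints [('http://url.com/456', '456')]
```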

Also, in Python you would typically use a "raw" string, like this:

pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

The r prefix on the string saves you from having to escape every backslash.
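A small sketch of the raw-string point (not from the original answer): inside a raw string each backslash is kept literally, so r'(\d+)\s?' and '(\\d+)\\s?' are the same pattern text and behave identically.

```python
import re

raw = r'(\d+)\s?'       # raw string: backslashes are kept as-is
escaped = '(\\d+)\\s?'  # normal string: each backslash must be doubled

print(raw == escaped)                   # prints True
print(re.findall(raw, 'ids 123 456'))   # prints ['123', '456']
```

In a normal string, '\d' happens to survive because \d is not a recognized string escape, but recent Python versions emit a warning for such invalid escape sequences, which is another reason to prefer raw strings for regexes.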

Answer by til_b

Use a two-step approach: first get everything from "review:" to the end of the line, then tokenize that.

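import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Step 1: capture everything after "review: " up to the end of the line.
review_pattern = re.compile(r'.*review: (.*)$')
urls = review_pattern.findall(msg)[0]

# Step 2: pull each URL and its trailing number out of that text.
url_pattern = re.compile(r'(http://url.com/(\d+))')
url_pattern.findall(urls)
# [('http://url.com/123', '123'), ('http://url.com/456', '456')]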
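The answer tokenizes the captured text with a second regex. If the URLs are always separated by whitespace (an assumption, not stated in the question), plain str.split() is another way to do the tokenizing step, taking the number from the last "/"-separated segment of each URL. A minimal sketch:

```python
import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Step 1: everything after "review: " up to the end of the line.
tail = re.search(r'review: (.*)$', msg).group(1)

# Step 2: tokenize on whitespace (assumes the URLs contain no spaces),
# then take the number from the last "/"-separated segment of each URL.
pairs = [(url, url.rsplit('/', 1)[1]) for url in tail.split()]
print(pairs)  # prints [('http://url.com/123', '123'), ('http://url.com/456', '456')]
```

Unlike the regex in step 2 above, this does not check that each token really looks like an http://url.com/<digits> URL, so it only suits input you already trust.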