Python regex to match multiple times
Original source: http://stackoverflow.com/questions/17407691/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Python regex to match multiple times
Asked by mavili
I'm trying to match a pattern against strings that could contain multiple instances of the pattern. I need every instance separately. re.findall()
should do it, but I don't know what I'm doing wrong.
pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
I need 'http://url.com/123', 'http://url.com/456' and the two numbers 123 and 456 to be separate elements of the match
list.
I have also tried '/review: ((http://url.com/(\d+)\s?)+)/'
as the pattern, but no luck.
Accepted answer by Narendra Yadala
Use this. You need to place 'review' outside the capturing group to achieve the desired result.
pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)
This gives the output:
>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]
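One way to see why this pattern works: when a capturing group sits under a quantifier such as +, the whole run of URLs is a single match and each group keeps only the text from its last repetition. With the quantifier removed, findall() treats every URL as its own match. The sketch below compares the two; it escapes the dot in url.com, which is a small tweak not present in the patterns above.

```python
import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Repeated group: the run of URLs is ONE match, and each group
# retains only what it captured on its final repetition.
repeated = re.compile(r'review: (http://url\.com/(\d+)\s?)+', re.IGNORECASE)
repeated_result = repeated.findall(msg)

# Unrepeated group: each URL is a separate match, so findall()
# returns one (url, number) tuple per URL.
separate = re.compile(r'(?:review: )?(http://url\.com/(\d+))\s?', re.IGNORECASE)
separate_result = separate.findall(msg)

print(repeated_result)  # [('http://url.com/456', '456')]
print(separate_result)  # [('http://url.com/123', '123'), ('http://url.com/456', '456')]
```

Note that because the 'review: ' prefix is optional in the second pattern, it will also pick up matching URLs that appear without that prefix.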
Answer by John Montgomery
You've got extra /'s in the regex. In Python, the pattern should just be a string, with no surrounding delimiters. For example, instead of this:
pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
It should be:
pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)
Also, in Python you'd typically use a "raw" string, like this:
pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)
The r prefix on the string saves you from having to do lots of backslash escaping.
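As a rough illustration of what the raw-string prefix buys you, the sketch below writes the same pattern both ways and then shows a case where the difference changes meaning. The variable names are made up for this example.

```python
import re

# In a normal string literal, each regex backslash must itself be escaped.
escaped = 'review: (http://url.com/(\\d+)\\s?)+'
# A raw string literal passes backslashes through to the regex engine unchanged.
raw = r'review: (http://url.com/(\d+)\s?)+'

same_pattern = (escaped == raw)

# Where it really matters: '\b' is a single backspace character,
# while r'\b' is the two-character regex word-boundary escape.
backspace_len = len('\b')
boundary_len = len(r'\b')

print(same_pattern, backspace_len, boundary_len)  # True 1 2
```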
Answer by til_b
Use a two-step approach: First get everything from "review:" to EOL, then tokenize that.
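import re
msg = 'this is the message. review: http://url.com/123 http://url.com/456'
# Step 1: capture everything after "review: " up to the end of the line.
review_pattern = re.compile(r'.*review: (.*)$')
urls = review_pattern.findall(msg)[0]
# Step 2: pull each URL and its numeric ID out of the captured text.
url_pattern = re.compile(r'(http://url.com/(\d+))')
url_pattern.findall(urls)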
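The two-step approach could be wrapped in a small helper, sketched below. The function name extract_review_urls and the constant names are hypothetical, and the sketch swaps findall()[0] for search() so that a message without a "review:" marker returns an empty list instead of raising IndexError. It also escapes the dot in url.com, which the code above does not.

```python
import re

# Step 1 pattern: everything after "review: " up to the end of the line.
REVIEW_RE = re.compile(r'review: (.*)$', re.IGNORECASE)
# Step 2 pattern: each URL together with its numeric ID.
URL_RE = re.compile(r'(http://url\.com/(\d+))')

def extract_review_urls(msg):
    """Return (url, id) tuples found after 'review: ', or [] if absent."""
    review = REVIEW_RE.search(msg)
    if review is None:
        return []
    return URL_RE.findall(review.group(1))

found = extract_review_urls(
    'this is the message. review: http://url.com/123 http://url.com/456')
missing = extract_review_urls('no review marker here')

print(found)    # [('http://url.com/123', '123'), ('http://url.com/456', '456')]
print(missing)  # []
```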