Python regex: matching multiple times

Question

Asked by mavili

I'm trying to match a pattern against strings that may contain multiple instances of the pattern, and I need every instance separately. re.findall() should do it, but I don't know what I'm doing wrong.

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)
match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')

I need 'http://url.com/123', 'http://url.com/456' and the two numbers 123 and 456 to be separate elements of the match list.

I have also tried '/review: ((http://url.com/(\d+)\s?)+)/' as the pattern, but no luck.

Answer 1

Accepted answer by Narendra Yadala

Use this. You need to place 'review' outside the capturing group to achieve the desired result.


pattern = re.compile(r'(?:review: )?(http://url.com/(\d+))\s?', re.IGNORECASE)

This gives the output:

>>> match = pattern.findall('this is the message. review: http://url.com/123 http://url.com/456')
>>> match
[('http://url.com/123', '123'), ('http://url.com/456', '456')]
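
A minimal sketch of the same idea using re.finditer, for cases where the match objects are more convenient than tuples. It reuses the pattern from this answer. Escaping the dots (so that `.` matches only a literal dot) is an added assumption, and the variable names are illustrative.

```python
import re

# Pattern from the answer above; the escaped dots are an added assumption
pattern = re.compile(r'(?:review: )?(http://url\.com/(\d+))\s?', re.IGNORECASE)
msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# findall returns one tuple per match, with one item per capturing group
pairs = pattern.findall(msg)

# finditer yields match objects, so each group can be read by index
urls = [m.group(1) for m in pattern.finditer(msg)]
numbers = [m.group(2) for m in pattern.finditer(msg)]

print(pairs)
print(urls)
print(numbers)
```

Because the 'review: ' prefix is optional in this pattern, it matches these URLs anywhere in the text, not only after 'review:'.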

Answer 2

Answer by John Montgomery

You've got extra /'s in the regex. In Python the pattern is just a string, with no surrounding delimiters. So instead of this:

pattern = re.compile('/review: (http://url.com/(\d+)\s?)+/', re.IGNORECASE)

It should be:


pattern = re.compile('review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

Also, in Python you'd typically use a "raw" string like this:

pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

The r prefix on the string saves you from having to do lots of backslash escaping.
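
One caveat that may be worth checking: removing the slashes lets the pattern match, but a capturing group inside a repeated `(...)+` keeps only its last repetition, so findall still does not return every URL. A short sketch, using the question's sample message:

```python
import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Raw-string version of the corrected pattern from this answer
pattern = re.compile(r'review: (http://url.com/(\d+)\s?)+', re.IGNORECASE)

# Each group inside (...)+ is overwritten on every repetition, so only the
# last URL and its number are reported
result = pattern.findall(msg)
print(result)
```

To get each URL as a separate element, the repetition has to come from findall scanning the string rather than from a quantifier wrapped around the group.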

Answer 3

Answer by til_b

Use a two-step approach: First get everything from "review:" to EOL, then tokenize that.


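import re

msg = 'this is the message. review: http://url.com/123 http://url.com/456'

# Step 1: capture everything after "review: " up to the end of the line.
review_pattern = re.compile(r'.*review: (.*)$')
urls = review_pattern.findall(msg)[0]

# Step 2: pull each URL and its number out of that tail.
url_pattern = re.compile(r"(http://url.com/(\d+))")
url_pattern.findall(urls)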
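
The two steps can also be wrapped in a small helper, sketched here under some assumptions: the function name `extract_review_urls` is hypothetical, the message is a single line, and re.search is swapped in for `findall(...)[0]` so that a message without "review:" returns an empty list instead of raising IndexError.

```python
import re

# Step 1 pattern: everything after "review: " up to the end of the string
REVIEW_RE = re.compile(r'review: (.*)$', re.IGNORECASE)
# Step 2 pattern: each URL plus its number (dots escaped as an assumption)
URL_RE = re.compile(r'(http://url\.com/(\d+))')


def extract_review_urls(msg):
    """Return (url, number) pairs found after 'review: ', or [] if absent."""
    found = REVIEW_RE.search(msg)
    if found is None:
        return []
    return URL_RE.findall(found.group(1))


sample = 'this is the message. review: http://url.com/123 http://url.com/456'
print(extract_review_urls(sample))
print(extract_review_urls('no review section here'))
```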
