Python regex error - nothing to repeat

Question

Asked by goh

I get an error message when I use this expression:

re.sub(r"([^\s\w])(\s*)+","\1","...")

I checked the regex at RegExr and it returns . as expected. But when I try it in Python, I get this error message:

raise error, v # invalid expression
sre_constants.error: nothing to repeat

Can someone please explain?

Answer 1

Accepted answer by mb14

It seems to be a Python bug (the expression works perfectly in vim). The source of the problem is the (\s*...)+ part. Basically, you can't write (\s*)+, which makes sense, because you are trying to repeat something that can match the empty string.

>>> re.compile(r"(\s*)+")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 180, in compile
    return _compile(pattern, flags)
  File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 233, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

However, (\s*\1) should not be null, but we know that only because we know what's in \1. Apparently Python doesn't ... that's weird.
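
As a rough sketch of a workaround, assuming the goal is only to drop the whitespace that follows a punctuation character: the + after (\s*) adds nothing, because \s* already means "zero or more", so the group can be flattened. The helper name compile_error and the sample string are made up for illustration, and whether the original pattern is rejected depends on the Python version.

```python
import re

def compile_error(pattern):
    """Return the error message if the pattern fails to compile, else None.

    Hypothetical helper, used only for this illustration.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None

# The pattern from the question; whether it is rejected depends on the Python version.
print(compile_error(r"([^\s\w])(\s*)+"))

# Same intent without the nested quantifier: \s* alone already covers "zero or more".
simplified = r"([^\s\w])\s*"
print(compile_error(simplified))

# The replacement is a raw string so that \1 is a backreference, not the character chr(1).
result = re.sub(simplified, r"\1", "a,   b!  c")
print(result)
```

The simplified pattern avoids quantifying a group that can match the empty string, which is the construct the error complains about.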

Answer 2

Answered by Franklyn

That is a Python bug between "*" and special characters.

Instead of

re.compile(r"\w*")

Try:

re.compile(r"[a-zA-Z0-9]*")

It works; however, it is not the same regular expression (\w also matches the underscore, for instance).
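
To make the difference concrete, here is a small sketch; the sample string is made up for illustration. It shows that the explicit character class splits on the underscore, and that adding _ to the class brings it closer to \w for ASCII input.

```python
import re

sample = "snake_case"  # hypothetical sample input

# \w matches letters, digits and the underscore.
with_w = re.findall(r"\w+", sample)

# The explicit class leaves out the underscore, so the word is split in two.
with_class = re.findall(r"[a-zA-Z0-9]+", sample)

# Adding the underscore restores the behaviour of \w for ASCII text.
with_underscore = re.findall(r"[a-zA-Z0-9_]+", sample)

print(with_w, with_class, with_underscore)
```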

This bug seems to have been fixed between 2.7.5 and 2.7.6.

Answer 3

Answered by Ando Jurai

Actually, it's not only a Python bug with *. It can also happen when you pass a string as part of the regular expression to be compiled, for example:

import re
input_line = "string from any input source"
processed_line = "text to be edited with {}".format(input_line)
target = "text to be searched"
re.search(processed_line, target)

This will cause an error if processed_line contains something like "(+)", which you can find in chemical formulae, or similar character sequences. The solution is to escape the input, but when you do that on the fly, it is easy to fail to do it properly...
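
One way to avoid hand-written escaping is the standard library's re.escape, which backslash-escapes regex metacharacters in a string. A minimal sketch, with made-up sample strings:

```python
import re

input_line = "Na(+) and Cl(-)"  # hypothetical user input containing "(+)"
target = "ions: Na(+) and Cl(-) in solution"  # hypothetical text to search

# Used directly as a pattern, "(+)" has a quantifier with nothing before it.
try:
    re.search(input_line, target)
except re.error as exc:
    print("unescaped input failed:", exc)

# re.escape makes every metacharacter literal, so the input is matched as plain text.
pattern = re.escape(input_line)
match = re.search(pattern, target)
print(match.group(0))
```

Escaping only the untrusted part, and not the whole pattern, keeps any regex syntax you wrote yourself working.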

Answer 4

Answered by nealmcb

Beyond the bug that was discovered and fixed, I'll just note that the error message sre_constants.error: nothing to repeat is a bit confusing. I was trying to use r'?.*' as a pattern, and thought it was complaining for some strange reason about the *, but the problem is actually that ? is a way of saying "repeat zero or one times", and at the start of a pattern there is nothing for it to repeat. So I needed to write r'\?.*' to match a literal ?.
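
A short sketch of both cases, using a made-up input string:

```python
import re

message = None
try:
    # A leading ? is a quantifier with nothing in front of it.
    re.compile(r"?.*")
except re.error as exc:
    message = str(exc)
print(message)

# Escaped, the ? is an ordinary character and the pattern compiles.
match = re.match(r"\?.*", "?page=2")  # hypothetical input
print(match.group(0))
```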

Answer 5

Answered by Ayoub Arroub

Regular expressions use * and + as repetition operators, as in formal language theory. I encountered the same error while executing this line of code:

re.split("*",text)

To solve it, put a \ before the * or + so that it is treated as a literal character:

re.split(r"\*", text)
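
For completeness, a small sketch with a made-up input string. It compares the escaped pattern with re.escape and with plain str.split, which may be simpler when the separator is a fixed string and no regex features are needed.

```python
import re

text = "a*b*c"  # hypothetical sample input

# Escaped by hand, using a raw string so the backslash reaches the regex engine.
parts = re.split(r"\*", text)

# Escaped by the library.
same_parts = re.split(re.escape("*"), text)

# No regex at all: str.split treats "*" as a literal separator.
plain_parts = text.split("*")

print(parts, same_parts, plain_parts)
```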
