Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/19942314/

Date: 2020-08-19 15:02:09  Source: igfitidea

Python multiple repeat Error

python, regex

Asked by Presen

I'm trying to determine whether a term appears in a string.
A space must appear before and after the term, and a standard suffix is also allowed.
Example:

term: google
string: "I love google!!! "
result: found

term: dog
string: "I love dogs "
result: found

I'm trying the following code:

regexPart1 = "\s"
regexPart2 = "(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s"  
p = re.compile(regexPart1 + term + regexPart2 , re.IGNORECASE)

and get the error:

raise error("multiple repeat")
sre_constants.error: multiple repeat

Update
Real code that fails:

term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|.|;|:|\(|\)|\"|\?+)?\s" 
p = re.compile(regexPart1 + term + regexPart2 , re.IGNORECASE)

On the other hand, the following term passes smoothly (+ instead of ++):

term = 'lg incite" OR author:"http+www.dealitem.com" OR "for sale'

Accepted answer by abarnert

The problem is that, in a non-raw string, \" is ".

You get lucky with all of your other unescaped backslashes: \s is the same as \\s, not s; \( is the same as \\(, not (; and so on. But you should never rely on getting lucky, or on assuming that you know the whole list of Python escape sequences by heart.

Either print out your string and escape the backslashes that get lost (bad), escape all of your backslashes (OK), or just use raw strings in the first place (best).
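
As a minimal sketch of the difference (the variable names here are only illustrative), the same characters typed with and without the r prefix produce different strings:

```python
# In a non-raw string, \" is an escape sequence for a bare double quote,
# so the backslash never reaches the regexp engine.
non_raw = "\""

# In a raw string the backslash is kept, so the value is backslash + quote.
raw = r"\""

# Escaping the backslash by hand gives the same value as the raw string.
escaped = "\\\""

print(len(non_raw))    # length of the non-raw version
print(len(raw))        # length of the raw version
print(raw == escaped)  # raw and hand-escaped versions compare equal
```

The raw form is usually the easiest to read, because what you type is what the regexp engine receives.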

That being said, your regexp as posted won't match some expressions that it should, but it will never raise that "multiple repeat" error. Clearly, your actual code is different from the code you've shown us, and it's impossible to debug code we can't see.

Now that you've shown a real reproducible test case, that's a separate problem.

You're searching for terms that may have special regexp characters in them, like this:

term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'

That p++ in the middle of a regexp means "1 or more of 1 or more of the letter p" (in other words, the same as "1 or more of the letter p") in some regexp languages, "always fail" in others, and "raise an exception" in others. Python's re falls into the last group (at least before Python 3.11, which added possessive quantifiers such as ++). In fact, you can test this in isolation:

>>> re.compile('p++')
error: multiple repeat

If you want to put random strings into a regexp, you need to call re.escape on them.
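
A small sketch of what re.escape does with the term from the question. The helper name compiles is hypothetical, added only for illustration; whether the unescaped term compiles depends on the Python version, so no result is asserted for it here.

```python
import re

term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'

def compiles(pattern):
    """Return True if `pattern` is accepted by re.compile (illustrative helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The raw term is treated as regexp syntax, so "++" and "." are special.
print(compiles(term))

# After re.escape, every special character is matched literally.
escaped = re.escape(term)
literal = re.compile(escaped)
print(literal.fullmatch(term) is not None)

# The dots are now literal too, so replacing them breaks the match.
print(literal.fullmatch(term.replace(".", "x")) is None)
```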

One more problem (thanks to Ωmega):

. in a regexp means "any character". So ,|.|;|: (I've just extracted a short fragment of your longer alternation chain) means "a comma, or any character, or a semicolon, or a colon", which is the same as "any character". You probably wanted to escape the . character.
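
To illustrate, here is a short sketch comparing the fragment with and without the escaped dot (the names loose and strict are just labels for this example):

```python
import re

# Unescaped dot: the alternation accepts any single character.
loose = re.compile(r",|.|;|:")

# Escaped dot: only the four punctuation characters are accepted.
strict = re.compile(r",|\.|;|:")

print(loose.fullmatch("x") is not None)   # "x" slips through the loose form
print(strict.fullmatch("x") is not None)  # "x" is rejected by the strict form
print(strict.fullmatch(".") is not None)  # a literal dot still matches
```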

Putting all three fixes together:

import re

term = 'lg incite" OR author:"http++www.dealitem.com" OR "for sale'
regexPart1 = r"\s"
regexPart2 = r"(?:s|'s|!+|,|\.|;|:|\(|\)|\"|\?+)?\s"
p = re.compile(regexPart1 + re.escape(term) + regexPart2, re.IGNORECASE)
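
As a rough usage sketch, the corrected pattern can be wrapped in a helper and tried against the examples from the question. The names contains_term and SUFFIX are hypothetical, chosen only for this illustration.

```python
import re

# Optional standard suffix followed by the required trailing whitespace.
SUFFIX = r"(?:s|'s|!+|,|\.|;|:|\(|\)|\"|\?+)?\s"

def contains_term(term, text):
    """Return True if `term` appears in `text` surrounded by whitespace,
    optionally followed by one of the standard suffixes."""
    pattern = re.compile(r"\s" + re.escape(term) + SUFFIX, re.IGNORECASE)
    return pattern.search(text) is not None

print(contains_term("google", "I love google!!! "))
print(contains_term("dog", "I love dogs "))
print(contains_term("dog", "I love dogma "))
```

Because the term goes through re.escape, a term containing ++ or . is matched literally instead of being parsed as regexp syntax.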


As Ωmega also pointed out in a comment, you don't need to use a chain of alternations if they're all one character long; a character class will do just as well, more concisely and more readably.
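
One possible way to apply that suggestion, offered as a sketch rather than the only option: the single-character alternatives are folded into a character class, while the multi-character ones ('s, !+ and ?+) stay as alternatives. The names ALTERNATION and CHAR_CLASS are illustrative.

```python
import re

# The corrected alternation chain from the answer.
ALTERNATION = r"(?:s|'s|!+|,|\.|;|:|\(|\)|\"|\?+)?\s"

# The same suffixes, with the single-character ones in a character class.
# Inside [...] the characters . ( ) need no escaping.
CHAR_CLASS = r"(?:'?s|!+|\?+|[,.;:()\"])?\s"

samples = ["", "s", "'s", "!!!", ",", ".", ";", ":", "(", ")", '"', "??", "x", "ss"]
for suffix in samples:
    text = " dog" + suffix + " "
    old = re.search(r"\sdog" + ALTERNATION, text) is not None
    new = re.search(r"\sdog" + CHAR_CLASS, text) is not None
    print(repr(suffix), old, new)
```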

And I'm sure there are other ways this could be improved.

Answer by Patrick

The other answer is great, but I would like to point out that using regular expressions to find strings in other strings is not the best way to go about it. In Python, simply write:

    if term in string:
        pass  # do whatever
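
A tiny sketch of that check in use, with made-up values. Note that in tests plain substring containment, so it does not apply the "space before and after, optional suffix" rule from the question.

```python
term = "dog"
string = "I love dogma "

# `in` checks whether term occurs anywhere inside string.
found = term in string
print(found)

# A term containing regexp metacharacters needs no escaping here.
print("http++www" in 'author:"http++www.dealitem.com"')
```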