Do regular expressions from the Python re module support word boundaries (\b)?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/3995034/

Date: 2020-08-18 13:45:38  Source: igfitidea

Tags: python, regex

Asked by D.C.

While trying to learn a little more about regular expressions, I came across a tutorial suggesting that you can use \b to match a word boundary. However, the following snippet in the Python interpreter does not work as expected:

>>> import re
>>> x = 'one two three'
>>> y = re.search("\btwo\b", x)

It should have been a match object if anything was matched, but it is None.

Is the \b expression not supported in Python, or am I using it wrong?

Accepted answer by pyfunc

Why don't you try

import re
word = 'two'
re.compile(r'\b%s\b' % word, re.I)

Output:

>>> import re
>>> word = 'two'
>>> k = re.compile(r'\b%s\b' % word, re.I)
>>> x = 'one two three'
>>> y = k.search(x)
>>> y
<_sre.SRE_Match object at 0x100418850>

Also, I forgot to mention: you should be using raw strings in your code.

>>> x = 'one two three'
>>> y = re.search(r"\btwo\b", x)
>>> y
<_sre.SRE_Match object at 0x100418a58>
>>> 
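The accepted answer's approach (a raw-string pattern built from a variable with %s, compiled with re.I) can be wrapped in a small standalone script. This is only a sketch: the helper name find_word is hypothetical, and the call to re.escape is an assumption not present in the original answer, added so that a search word containing regex metacharacters is matched literally.

```python
import re


def find_word(word, text):
    # Hypothetical helper: return the first case-insensitive whole-word
    # match of `word` in `text`, or None if there is none.
    # The raw string keeps \b as the regex word-boundary escape;
    # re.escape (an assumption, not in the original answer) makes `word` literal.
    pattern = re.compile(r'\b%s\b' % re.escape(word), re.I)
    return pattern.search(text)


m = find_word('two', 'one two three')
print(m.group(), m.span())                            # two (4, 7)
print(find_word('TWO', 'one two three') is not None)  # True, thanks to re.I
print(find_word('two', 'network twofold'))            # None: "two" is only part of longer words
```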

Answer by Bolo

This will work: re.search(r"\btwo\b", x)

When you write "\b" in Python, it is a single character: "\x08" (the backspace character). Either escape the backslash like this:

"\\b"

or write a raw string like this:

r"\b"
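A quick check of the claim above, as a minimal sketch using only the standard re module: a plain "\b" literal is one backspace character, while "\\b" and r"\b" are both the two characters backslash and b, which the regex engine reads as a word boundary.

```python
import re

# In a normal (non-raw) string literal, \b is the backspace escape, chr(8)
assert "\b" == "\x08" and len("\b") == 1

# Escaping the backslash or using a raw string both give backslash + 'b'
assert "\\b" == r"\b" and len(r"\b") == 2

x = 'one two three'
print(re.search("\\btwo\\b", x) is not None)  # True: escaped backslashes
print(re.search(r"\btwo\b", x) is not None)   # True: raw string
print(re.search("\btwo\b", x) is None)        # True: pattern holds literal backspaces
```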

Answer by Bill the Lizard

Just to explain explicitly why re.search("\btwo\b", x) doesn't work: \b in a Python string is the escape sequence for a backspace character.

print("foo\bbar")
fobar

So the pattern "\btwo\b" is looking for a backspace, followed by two, followed by another backspace, which the string you're searching in (x = 'one two three') doesn't contain.
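To make the explanation concrete, here is a small sketch showing that the non-raw pattern is really a literal "backspace, two, backspace" pattern: it finds nothing in the original string, but it does match a string that actually contains backspace characters around "two". The test string with embedded backspaces is made up purely for illustration.

```python
import re

pattern = "\btwo\b"   # NOT a raw string: contains two backspace characters
print(repr(pattern))  # '\x08two\x08'

# The string from the question has no backspaces, so nothing matches
print(re.search(pattern, 'one two three'))  # None

# Illustrative string that does contain backspaces around "two"
s = 'one \btwo\b three'
m = re.search(pattern, s)
print(m.span())  # (4, 9): the match covers both backspaces
```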

To make re.search (or re.compile) interpret the sequence \b as a word boundary, either escape the backslashes ("\\btwo\\b") or use a raw string to create your pattern (r"\btwo\b").
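Both fixes produce the same pattern, and the resulting \b then behaves as a word boundary: it matches between a word character and a non-word character (or at the start or end of the string). A minimal sketch with the standard re module; the sample sentence passed to re.findall is illustrative only.

```python
import re

x = 'one two three'

escaped = re.search("\\btwo\\b", x)  # backslashes escaped by hand
raw = re.search(r"\btwo\b", x)       # raw string: same backslash + 'b'
assert escaped.span() == raw.span() == (4, 7)

# "two" embedded in a longer word ("network", "twofold") is skipped,
# while a hyphen counts as a non-word character, so "two-way" matches
print(re.findall(r"\btwo\b", "two network twofold two-way"))  # ['two', 'two']
```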
