Original source: http://stackoverflow.com/questions/3995034/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Do regular expressions from the re module support word boundaries (\b)?
Asked by D.C.
While trying to learn a little more about regular expressions, I read a tutorial suggesting that you can use \b to match a word boundary. However, the following snippet in the Python interpreter does not work as expected:
>>> import re
>>> x = 'one two three'
>>> y = re.search("\btwo\b", x)
It should have been a match object if anything was matched, but it is None.
Is the \b expression not supported in Python, or am I using it wrong?
Accepted answer by pyfunc
Why don't you try:
word = 'two'
re.compile(r'\b%s\b' % word, re.I)
Output:
>>> word = 'two'
>>> k = re.compile(r'\b%s\b' % word, re.I)
>>> x = 'one two three'
>>> y = k.search( x)
>>> y
<_sre.SRE_Match object at 0x100418850>
I also forgot to mention: you should be using raw strings in your code:
>>> x = 'one two three'
>>> y = re.search(r"\btwo\b", x)
>>> y
<_sre.SRE_Match object at 0x100418a58>
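The interpolation approach above can be wrapped in a small helper. This is only a sketch: the function name find_whole_word is hypothetical, and the call to re.escape is an addition that the original answer does not use. It is there on the assumption that the word might contain regex metacharacters.

```python
import re

def find_whole_word(word, text):
    # Hypothetical helper, not part of the original answer.
    # re.escape is an added safeguard for words containing regex metacharacters.
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.I)
    return pattern.search(text)

match = find_whole_word("two", "one two three")
print(match.group(0), match.span())

# "two" appears inside "network" and "twofold", but never as a whole word.
print(find_whole_word("two", "network twofold"))
```

Because the pattern is written as a raw string, \b reaches the regex engine as a word-boundary assertion rather than as a backspace character.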
Answer by Bolo
This will work: re.search(r"\btwo\b", x)
When you write "\b" in a Python string literal, it is a single character: "\x08". Either escape the backslash like this:
"\\b"
or write a raw string like this:
r"\b"
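A quick way to check this is to compare the three spellings directly. The variable names below are made up for illustration.

```python
plain = "\b"      # ordinary literal: Python converts \b to a backspace
escaped = "\\b"   # escaped backslash: two characters, a backslash and 'b'
raw = r"\b"       # raw string: the same two characters

print(len(plain), plain == "\x08")
print(len(escaped), escaped == raw)
```

The escaped and raw forms produce the same two-character string, so either one hands the regex engine a real \b sequence.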
Answer by Bill the Lizard
Just to explicitly explain why re.search("\btwo\b", x) doesn't work: \b in a Python string is shorthand for a backspace character.
print("foo\bbar")
fobar
So the pattern "\btwo\b" is looking for a backspace, followed by two, followed by another backspace, which the string you're searching in (x = 'one two three') doesn't have.
To allow re.search (or compile) to interpret the sequence \b as a word boundary, either escape the backslashes ("\\btwo\\b") or use a raw string to create your pattern (r"\btwo\b").
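The three spellings of the pattern can be compared side by side. This is a minimal sketch; the variable names and the extra test string containing literal backspaces are assumptions added for illustration.

```python
import re

x = "one two three"

backspace_result = re.search("\btwo\b", x)    # pattern contains backspace characters
escaped_result = re.search("\\btwo\\b", x)    # escaped backslashes: word boundaries
raw_result = re.search(r"\btwo\b", x)         # raw string: word boundaries

print(backspace_result)
print(escaped_result.group(), raw_result.group())

# The backspace pattern only matches text that really contains backspaces.
with_backspaces = "one \x08two\x08 three"
literal_result = re.search("\btwo\b", with_backspaces)
print(literal_result is not None)
```

The first search finds nothing in x, while the escaped and raw-string versions both find the word two.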

