Original source: http://stackoverflow.com/questions/21104476/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What does the "r" in Python's re.compile(r' pattern flags') mean?
Asked by user61629
I am reading through http://docs.python.org/2/library/re.html. According to this, the "r" in Python's re.compile(r' pattern flags') refers to the raw string notation:
The solution is to use Python's raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
Would it be fair to say then that:
re.compile(r'pattern') means that "pattern" is a regex, while re.compile('pattern') means that "pattern" is an exact match?
Accepted answer by Peter Gibson
As @PauloBu stated, the r string prefix is not specifically related to regexes, but to strings generally in Python.
Normal strings use the backslash character as an escape character for special characters (like newlines):
>>> print 'this is \n a test'
this is
a test
The r prefix tells the interpreter not to do this:
>>> print r'this is \n a test'
this is \n a test
>>>
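The same difference can be checked directly by comparing the two strings. This is a minimal sketch written for Python 3 (where print is a function); the variable names are made up for illustration.

```python
# Python 3 sketch: compare a normal string literal with a raw one.
normal = 'this is \n a test'   # \n is converted to one newline character
raw = r'this is \n a test'     # backslash and 'n' are kept as two characters

# The raw string is exactly one character longer than the normal one.
print(len(raw) - len(normal))          # 1
# A raw string is just another way to spell the escaped form.
print(raw == 'this is \\n a test')     # True
```

The r prefix only affects how the literal is read in the source code. Once created, a raw string is an ordinary str object.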
This is important in regular expressions, as you need the backslash to make it to the re module intact. In particular, \b matches the empty string specifically at the start and end of a word. re expects the string \b; however, in a normal string literal '\b' is converted to the ASCII backspace character, so you need to either explicitly escape the backslash ('\\b'), or tell Python it is a raw string (r'\b').
>>> import re
>>> re.findall('\b', 'test') # the backslash gets consumed by the python string interpreter
[]
>>> re.findall('\\b', 'test') # backslash is explicitly escaped and is passed through to re module
['', '']
>>> re.findall(r'\b', 'test') # often this syntax is easier
['', '']
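To tie this back to re.compile from the question, here is a small sketch of the same effect with a compiled pattern. The sample text and variable names are assumptions chosen for illustration.

```python
import re

text = 'cat concat cat.'

# Raw string: the regex engine receives \b and treats it as a word boundary.
word_pattern = re.compile(r'\bcat\b')
whole_words = word_pattern.findall(text)

# Normal string: Python turns each \b into a backspace character first,
# so the pattern looks for backspace + 'cat' + backspace.
backspace_pattern = re.compile('\bcat\b')
no_matches = backspace_pattern.findall(text)

print(whole_words)  # ['cat', 'cat']
print(no_matches)   # []
```

Both calls compile a regex. The only difference is which characters end up in the pattern string.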
Answer by Peter Gibson
No. As the documentation pasted in the question explains, the r prefix on a string indicates that the string is a raw string.
Because of the collisions between Python's escaping of characters and regex escaping, both of which use the backslash character \, raw strings provide a way to tell Python that you want an unescaped string.
Examine the following:
>>> "\n"
'\n'
>>> r"\n"
'\\n'
>>> print "\n"
>>> print r"\n"
\n
Prefixing with an r merely tells Python that backslashes \ in the string should be treated literally and not as escape characters.
This is helpful when, for example, you are searching on a word boundary. The regex for this is \b; however, to capture this in a Python string, I'd need to use "\\b" as the pattern. Instead, I can use the raw string r"\b" to pattern match on.
This becomes especially handy when trying to find a literal backslash in a regex. To match a backslash in a regex I need to use the pattern \\. To write this in a normal Python string I need to escape each backslash, so the pattern becomes "\\\\", or the much simpler r"\\".
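A short sketch of the literal-backslash case, written for Python 3. The Windows-style path is a made-up example value.

```python
import re

# A string containing two literal backslashes (hypothetical sample path).
path = 'C:\\temp\\file.txt'

# Both spellings produce the same two-character pattern: \\
same_pattern = ('\\\\' == r'\\')
print(same_pattern)                 # True

# The regex \\ matches one literal backslash.
backslashes = re.findall(r'\\', path)
parts = re.split(r'\\', path)

print(len(backslashes))             # 2
print(parts)                        # ['C:', 'temp', 'file.txt']
```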
As you can guess in longer and more complex regexes, the extra slashes can get confusing, so raw strings are generally considered the way to go.
Answer by John La Rooy
No. Not everything in regex syntax needs to be preceded by \, so ., *, +, etc. still have special meaning in a pattern.
The r'' is often used as a convenience for regexes that do need a lot of \, as it prevents the clutter of doubling up the \.
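To illustrate that the r prefix does not switch a pattern between "regex" and "exact match", here is a minimal sketch. The sample text is an assumption, and re.escape is shown as one way to get a literal match; it is not something the answers above mention.

```python
import re

text = 'a.b axb'

# The r prefix does not turn off regex syntax: '.' still matches any character.
any_char = re.findall(r'a.b', text)
# Escaping the dot makes it match a literal '.' only.
literal_dot = re.findall(r'a\.b', text)
# re.escape builds a pattern that matches the given text exactly.
escaped = re.findall(re.escape('a.b'), text)

print(any_char)      # ['a.b', 'axb']
print(literal_dot)   # ['a.b']
print(escaped)       # ['a.b']
```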

