How to match a newline character in a Python raw string
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/14689531/
How to match a new line character in Python raw string
Asked by wei
I got a little confused about Python raw strings. I know that in a raw string '\' is treated as a normal backslash (e.g. r'\n' is the two characters '\' and 'n'). However, I was wondering what to do if I want to match a newline character using a raw string. I tried r'\n', but it didn't work. Does anybody have a good idea about this?
Accepted answer by mgilson
In a regular expression, you need to specify that you're in multiline mode:
>>> import re
>>> s = """cat
... dog"""
>>>
>>> re.match(r'cat\ndog',s,re.M)
<_sre.SRE_Match object at 0xcb7c8>
Notice that re translates the \n (in the raw string) into a newline. As you indicated in your comments, you don't actually need re.M for it to match, but it does make $ and ^ match more intuitively:
>>> re.match(r'^cat\ndog',s).group(0)
'cat\ndog'
>>> re.match(r'^cat$\ndog',s).group(0) #doesn't match
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r'^cat$\ndog',s,re.M).group(0) #matches.
'cat\ndog'
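The behaviour described in this answer can be checked with a short sketch. It assumes Python 3 and only the standard re module, and it reuses the cat/dog sample string from the session above; the variable names are illustrative.

```python
import re

s = "cat\ndog"

# A raw string keeps the backslash: r'\n' is two characters, '\n' is one.
raw_len = len(r'\n')
plain_len = len('\n')

# The re module itself interprets the two-character sequence \n as a
# newline, so the raw pattern still matches the real newline in s.
newline_match = re.search(r'\n', s)

# Without re.M, $ only matches at the end of the string (or just before a
# trailing newline), so the first pattern fails; with re.M, $ also matches
# before every newline.
no_flag = re.match(r'^cat$\ndog', s)
with_flag = re.match(r'^cat$\ndog', s, re.M)

print(raw_len, plain_len)
print(newline_match is not None)
print(no_flag, with_flag is not None)
```

In other words, the raw string never contains a newline itself; the match works because the regular expression engine does its own escape processing.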
Answer by Gareth Latty
The simplest answer is to simply not use a raw string. You can escape backslashes by using \\.
If you have huge numbers of backslashes in some segments, then you could concatenate raw strings and normal strings as needed:
r"some string \ with \ backslashes" "\n"
(Python automatically concatenates adjacent string literals that are separated only by whitespace.)
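As a minimal sketch of that concatenation (assuming Python 3; the variable names are only illustrative), the raw part keeps its backslashes while the normal part contributes a real newline:

```python
# Adjacent literals are joined into one string, so a raw part and a
# normal part can be mixed freely.
text = r"some string \ with \ backslashes" "\n"

ends_with_newline = text.endswith("\n")   # the "\n" literal is a real newline
backslash_count = text.count("\\")        # the raw part keeps both backslashes

print(repr(text))
print(ends_with_newline, backslash_count)
```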
Remember that if you are working with paths on Windows, the easiest option is to just use forward slashes; they will still work fine.
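One way to see that both separators describe the same Windows path is the standard pathlib module. This is only a sketch: PureWindowsPath models Windows path rules on any operating system without touching the file system, and the path used here is hypothetical.

```python
from pathlib import PureWindowsPath

# Hypothetical path, used only for illustration.
with_backslashes = PureWindowsPath(r"C:\Users\wei\notes.txt")
with_forward_slashes = PureWindowsPath("C:/Users/wei/notes.txt")

# Both spellings parse to the same drive, directories and file name.
same_path = with_backslashes == with_forward_slashes

print(same_path)
print(with_forward_slashes.name)
```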
Answer by Rajat Subhra Bhowmick
def clean_with_puncutation(text):
    from string import punctuation
    import re
    # Map every punctuation character to a placeholder token.
    punctuation_token = {p: '<PUNC_' + p + '>' for p in punctuation}
    punctuation_token['<br/>'] = "<TOKEN_BL>"
    punctuation_token['\n'] = "<TOKEN_NL>"
    punctuation_token['<EOF>'] = '<TOKEN_EOF>'
    punctuation_token['<SOF>'] = '<TOKEN_SOF>'
    # Always put the multi-character tokens first to avoid overlapping results.
    # The \n inside the raw string is interpreted by re as a newline.
    regex = (r"(<br/>)|(<EOF>)|(<SOF>)|"
             r"[\n\!\@\#$\%\^\&\*\(\)\[\]\{\}\;\:\,\.\/\?\|\`\_\+\\=\~\-\<\>]")
    # text = '<EOF>!@#$%^&*()[]{};:,./<>?\|`~-= _+\<br/>\n <SOF>\ '
    text_ = ""
    index = 0
    for match in re.finditer(regex, text):
        # Copy the text before the match, then the token padded with spaces.
        text_ = (text_ + text[index:match.start()] + " "
                 + punctuation_token[match.group()] + " ")
        index = match.end()
    # Append whatever follows the last match (the original dropped this tail).
    return text_ + text[index:]
Answer by Mohammad Hossein zare mehrjard
You can also use [\r\n] to match a newline.
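A small sketch of this character class, assuming Python 3 and a made-up sample string that mixes \r\n, \n and \r line endings:

```python
import re

# Sample text mixing Windows (\r\n), Unix (\n) and bare \r line endings.
text = "one\r\ntwo\nthree\rfour"

# [\r\n]+ treats any run of carriage returns and line feeds as one separator.
lines = re.split(r'[\r\n]+', text)

print(lines)  # ['one', 'two', 'three', 'four']
```

Note that the + also collapses consecutive blank lines into a single break; drop it if empty lines need to be preserved.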

