How to match a newline character in a Python raw string

Question

Asked by wei

I'm a little confused about Python raw strings. I know that in a raw string a '\' is treated as a literal backslash (e.g. r'\n' is the two characters '\' and 'n'). However, what if I want to match a newline character using a raw string? I tried r'\n', but it didn't work. Does anybody have a good idea about this?

Answer 1

Accepted answer by mgilson

In a regular expression, you need to specify that you're in multiline mode:

>>> import re
>>> s = """cat
... dog"""
>>> 
>>> re.match(r'cat\ndog',s,re.M)
<_sre.SRE_Match object at 0xcb7c8>

Notice that re translates the \n (in the raw string) into a newline. As you indicated in your comments, you don't actually need re.M for it to match, but it does make $ and ^ match more intuitively:

>>> re.match(r'^cat\ndog',s).group(0)
'cat\ndog'
>>> re.match(r'^cat$\ndog',s).group(0)  #doesn't match
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> re.match(r'^cat$\ndog',s,re.M).group(0) #matches.
'cat\ndog'
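
To see why r'\n' still works as a pattern, here is a minimal sketch (the variable names are only illustrative). The raw string holds two characters, a backslash and an 'n', and it is the re module, not Python's string parser, that interprets that pair as a newline:

```python
import re

s = "cat\ndog"

# The raw string literal contains two characters: a backslash and 'n'.
raw_pattern = r'\n'
assert len(raw_pattern) == 2

# The regex engine interprets the two-character escape as a newline,
# so the pattern still matches the real newline inside s.
match = re.search(raw_pattern, s)
assert match is not None
newline_index = match.start()  # position of the newline in s

# A plain substring test does no such translation.
raw_in_text = raw_pattern in s  # False: s has no backslash followed by 'n'
```

So r'\n' fails only when it is used as ordinary text (for example with the in operator or str.replace), not when it is passed to re as a pattern.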

Answer 2

Answer by Gareth Latty

The simplest answer is to simply not use a raw string. You can escape backslashes by using \\.

If you have huge numbers of backslashes in some segments, then you could concatenate raw strings and normal strings as needed:

r"some string \ with \ backslashes" "\n"

(Python automatically concatenates string literals with only whitespace between them.)
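
As a small sketch of that concatenation (reusing the example string from above, with illustrative variable names), the raw part keeps its backslashes while the normal part supplies a real newline:

```python
# Adjacent literals are joined at compile time; only the second one
# has its escape sequences processed by Python.
combined = r"some string \ with \ backslashes" "\n"

ends_with_newline = combined.endswith("\n")  # True: a real newline character
backslash_count = combined.count("\\")       # backslashes kept from the raw part
```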

Remember that if you are working with paths on Windows, the easiest option is to just use forward slashes; they still work fine.

Answer 3

Answer by Rajat Subhra Bhowmick

import re
from string import punctuation


def clean_with_puncutation(text):
    # Map every punctuation character, plus a few special markers, to a token.
    punctuation_token = {p: '<PUNC_' + p + '>' for p in punctuation}
    punctuation_token['<br/>'] = "<TOKEN_BL>"
    punctuation_token['\n'] = "<TOKEN_NL>"
    punctuation_token['<EOF>'] = '<TOKEN_EOF>'
    punctuation_token['<SOF>'] = '<TOKEN_SOF>'

    # Always put the multi-character tokens at the front to avoid overlapping
    # results (otherwise '<' would match on its own).
    # The pattern is split into two adjacent raw strings: a backslash-newline
    # inside a single raw string would end up in the character class together
    # with the indentation spaces. The regex engine reads \n as a newline.
    regex = (r"(<br/>)|(<EOF>)|(<SOF>)|[\n\!\@\#$\%\^\&\*\(\)\[\]"
             r"\{\}\;\:\,\.\/\?\|\`\_\+\\=\~\-\<\>]")

    # Example input:
    # text = '<EOF>!@#$%^&*()[]{};:,./<>?\|`~-= _+\<br/>\n <SOF>\ '
    text_ = ""
    index = 0

    for match in re.finditer(regex, text):
        # Copy the text before the match, then the token padded with spaces.
        text_ += (text[index:match.start()] + " "
                  + punctuation_token[match.group()] + " ")
        index = match.end()

    # Keep whatever follows the last match.
    return text_ + text[index:]

Answer 4

Answer by Mohammad Hossein zare mehrjard

You can also use [\r\n] to match a newline.
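
For instance, a character class like this can split text that mixes Unix and Windows line endings. This is a minimal sketch with made-up sample text; the + quantifier is an addition here so that \r\n counts as a single break (it also collapses blank lines):

```python
import re

text = "cat\r\ndog\nbird"

# [\r\n] matches either a carriage return or a line feed.
lines = re.split(r'[\r\n]+', text)
# lines == ['cat', 'dog', 'bird']
```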
