Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address. StackOverflow original: http://stackoverflow.com/questions/4202538/

Date: 2020-08-18 14:44:42  Source: igfitidea

Escape regex special characters in a Python string

python regex string escaping

Asked by Wolfy

Does Python have a function that I can use to escape special characters in a regular expression?

For example, I'm "stuck" :\ should become I\'m \"stuck\" :\\.

Accepted answer by pyfunc

Use re.escape

>>> import re
>>> re.escape(r'\ a.*$')
'\\\\\\ a\\.\\*\\$'
>>> print(re.escape(r'\ a.*$'))
\\\ a\.\*\$
>>> re.escape('www.stackoverflow.com')
'www\\.stackoverflow\\.com'
>>> print(re.escape('www.stackoverflow.com'))
www\.stackoverflow\.com

Repeating it here:

re.escape(string)

Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

As of Python 3.7, re.escape() was changed to escape only characters that are meaningful to regex operations.

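Because the exact set of escaped characters differs between versions, it is safer to rely on the round-trip property than on a particular output. A minimal sketch, assuming nothing beyond the standard library (the variable names are made up for the example):

```python
import re
import string

# Every printable ASCII character, including all regex metacharacters.
sample = string.printable

escaped = re.escape(sample)

# Whatever this interpreter chose to escape, the escaped pattern
# still matches the original text exactly.
assert re.fullmatch(escaped, sample) is not None

# Punctuation characters that received a backslash on this interpreter.
backslashed = [c for c in string.punctuation if re.escape(c) != c]
print("".join(backslashed))
```

The printed list is what changes from one Python version to another; the fullmatch check should hold on all of them.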

Answer by poke

It's not that hard:

def escapeSpecialCharacters(text, characters):
    # Prefix every occurrence of each listed character with a backslash.
    for character in characters:
        text = text.replace(character, '\\' + character)
    return text

>>> escapeSpecialCharacters('I\'m "stuck" :\\', '\'"')
'I\\\'m \\"stuck\\" :\\'
>>> print(_)
I\'m \"stuck\" :\
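
If the backslash itself should be escaped too (as in the question's expected output), repeated replace() calls become order-sensitive: escaping the backslash last would double the backslashes that were just added. One way around this is a single pass with str.translate. This is a swapped-in technique, not the answer's own code, and the name escape_chars and its escape_char parameter are invented for the sketch.

```python
def escape_chars(text, characters, escape_char="\\"):
    """Prefix each character in `characters`, and the escape character itself, with `escape_char`."""
    # One translation table, applied in a single pass, so nothing is escaped twice.
    table = str.maketrans({c: escape_char + c for c in characters + escape_char})
    return text.translate(table)

original = 'I\'m "stuck" :\\'
print(escape_chars(original, '\'"'))
```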

Answer by dp_

Use repr()[1:-1]. In this case, the double quotes don't need to be escaped. The [1:-1] slice removes the single quotes from the beginning and the end.

>>> x = input()  # raw_input() on Python 2
I'm "stuck" :\
>>> print(x)
I'm "stuck" :\
>>> print(repr(x)[1:-1])
I\'m "stuck" :\\

Or maybe you just want to escape a phrase to paste into your program? If so, do this:

>>> input()  # raw_input() on Python 2
I'm "stuck" :\
'I\'m "stuck" :\\'
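
One caveat with the repr() approach: which quotes get escaped depends on what the string contains, because repr() picks its own delimiter. The sketch below shows the two cases; the variable names are made up for the example.

```python
both = 'I\'m "stuck" :\\'    # contains ' and "
single_only = "I'm stuck"    # contains only '

# With both quote kinds present, repr() wraps the text in single quotes
# and escapes the ' (and doubles the backslash).
print(repr(both)[1:-1])

# With only a single quote present, repr() switches to double quotes
# and leaves the ' unescaped.
print(repr(single_only)[1:-1])
```

So repr()[1:-1] is handy for pasting a literal into source code, but it is not a reliable way to force a particular character to be escaped.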

Answer by Tim Ruddick

I'm surprised no one has mentioned using regular expressions via re.sub():

import re
print(re.sub(r'([\"])',    r'\\\1', 'it\'s "this"'))  # it's \"this\"
print(re.sub(r"([\'])",    r'\\\1', 'it\'s "this"'))  # it\'s "this"
print(re.sub(r'([\" \'])', r'\\\1', 'it\'s "this"'))  # it\'s\ \"this\"

Important things to note:

  • In the search pattern, include \ as well as the character(s) you're looking for. You're going to be using \ to escape your characters, so you need to escape that as well.
  • Put parentheses around the search pattern, e.g. ([\"]), so that the substitution pattern can use the found character when it adds \ in front of it. (That's what \1 does: it uses the value of the first parenthesized group.)
  • The r in front of r'([\"])' means it's a raw string. Raw strings use different rules for escaping backslashes. To write ([\"]) as a plain string, you'd need to double all the backslashes and write '([\\"])'. Raw strings are friendlier when you're writing regular expressions.
  • In the substitution pattern, you need to escape \ to distinguish it from a backslash that precedes a substitution group, e.g. \1, hence r'\\\1'. To write that as a plain string, you'd need '\\\\\\1', and nobody wants that.
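
The notes above can be sketched in one small example. Note that inside a character class, \" only matches a double quote; to also escape backslashes already present in the text, the class needs a doubled backslash. The sample text is made up for the illustration.

```python
import re

text = 'it\'s "this" \\ that'

# The class [\\"\'] matches a backslash, a double quote, or a single quote.
# r'\\\1' is a literal backslash followed by the text of group 1.
escaped = re.sub(r'([\\"\'])', r'\\\1', text)
print(escaped)

# The raw and plain spellings of the replacement are the same string.
assert r'\\\1' == '\\\\\\1'
```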

Answer by spatar

As mentioned above, the answer depends on your case. If you want to escape a string for use in a regular expression, use re.escape(). But if you want to escape a specific set of characters, use this lambda function:

>>> escape = lambda s, escapechar, specialchars: "".join(escapechar + c if c in specialchars or c == escapechar else c for c in s)
>>> s = input()  # raw_input() on Python 2
I'm "stuck" :\
>>> print(s)
I'm "stuck" :\
>>> print(escape(s, "\\", ['"']))
I'm \"stuck\" :\\

Answer by Christoph Roeder

If you only want to replace some characters you could use this:

import re

print(re.sub(r'([\.\\+\*\?\[\^\]$\(\)\{\}\!\<\>\|\:\-])', r'\\\1', "example string."))
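
Hand-escaping every character inside the class is easy to get wrong. A possible variation, not taken from the answer, is to let re.escape() build the class from a plain string of characters. The name escape_selected is invented for this sketch, and it assumes special_chars is non-empty.

```python
import re

def escape_selected(text, special_chars):
    """Backslash-escape every character of `special_chars` found in `text`."""
    # re.escape keeps ']', '^', '-' and '\' from being read as class syntax.
    pattern = "[" + re.escape(special_chars) + "]"
    # A function replacement avoids backslash handling in a replacement template.
    return re.sub(pattern, lambda m: "\\" + m.group(0), text)

print(escape_selected("example string.", ".+*?[]^$(){}|\\-"))
```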