Python 如何从正则表达式组中排除字符?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4108561/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-18 14:17:16  来源:igfitidea点击:

How to exclude a character from a regex group?

pythonregex

提问by atp

I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (python). How can I change this regular expression to match any non-alphanumeric char except the hyphen?

我想从字符串(python)中去除除连字符之外的所有非字母数字字符。如何更改此正则表达式以匹配除连字符之外的任何非字母数字字符?

re.compile('[\W_]')

Thanks.

谢谢。

采纳答案by eldarerathis

You could just use a negated character class instead:

您可以只使用否定字符类:

re.compile(r"[^a-zA-Z0-9-]")

This will match anything that is not in the alphanumeric ranges or a hyphen. It also matches the underscore, as per your current regex.

这将匹配不在字母数字范围或连字符中的任何内容。根据您当前的正则表达式,它还匹配下划线。

>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens  *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'

Notice that this also replaces spaces (which may certainly be what you want).

请注意,这也替换了空格(这肯定是您想要的)。



Edit:SilentGhost has suggested it may likely be cheaper for the engine to process with a quantifier, in which case you can simply use:

编辑:SilentGhost 建议引擎使用量词处理可能更便宜,在这种情况下,您可以简单地使用:

re.compile(r"[^a-zA-Z0-9-]+")

The +will simply cause any runs of consecutively matched characters to all match (and be replaced) at the same time.

+只会导致任何连续匹配的字符同时匹配(并被替换)。

回答by Ned Batchelder

\wmatches alphanumerics, add in the hyphen, then negate the entire set: r"[^\w-]"

\w匹配字母数字,添加连字符,然后否定整个集合:r"[^\w-]"