Python 如何从正则表达式组中排除字符?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4108561/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to exclude a character from a regex group?
提问by atp
I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (python). How can I change this regular expression to match any non-alphanumeric char except the hyphen?
我想从字符串(python)中去除除连字符之外的所有非字母数字字符。如何更改此正则表达式以匹配除连字符之外的任何非字母数字字符?
re.compile('[\W_]')
Thanks.
谢谢。
采纳答案by eldarerathis
You could just use a negated character class instead:
您可以只使用否定字符类:
re.compile(r"[^a-zA-Z0-9-]")
This will match anything that is not in the alphanumeric ranges or a hyphen. It also matches the underscore, as per your current regex.
这将匹配不在字母数字范围或连字符中的任何内容。根据您当前的正则表达式,它还匹配下划线。
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens *#"
>>> r.sub("",s)
'sometextwith----5hy-phens'
Notice that this also replaces spaces (which may certainly be what you want).
请注意,这也替换了空格(这肯定是您想要的)。
Edit:SilentGhost has suggested it may likely be cheaper for the engine to process with a quantifier, in which case you can simply use:
编辑:SilentGhost 建议引擎使用量词处理可能更便宜,在这种情况下,您可以简单地使用:
re.compile(r"[^a-zA-Z0-9-]+")
The +will simply cause any runs of consecutively matched characters to all match (and be replaced) at the same time.
这+只会导致任何连续匹配的字符同时匹配(并被替换)。
回答by Ned Batchelder
\wmatches alphanumerics, add in the hyphen, then negate the entire set: r"[^\w-]"
\w匹配字母数字,添加连字符,然后否定整个集合:r"[^\w-]"

