Python: How to exclude a character from a regex group?

Question

Asked by atp

I want to strip all non-alphanumeric characters EXCEPT the hyphen from a string (Python). How can I change this regular expression to match any non-alphanumeric character except the hyphen?

re.compile(r'[\W_]')

Thanks.


Answer 1

Accepted answer by eldarerathis

You could just use a negated character class instead:


re.compile(r"[^a-zA-Z0-9-]")

This will match anything that is not in the alphanumeric ranges or a hyphen. It also matches the underscore, as per your current regex.


>>> import re
>>> r = re.compile(r"[^a-zA-Z0-9-]")
>>> s = "some#%te_xt&with--##%--5 hy-phens  *#"
>>> r.sub("", s)
'sometextwith----5hy-phens'

Notice that this also removes spaces (which may well be what you want).
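If the spaces should survive, one option is to add a literal space to the negated class. A minimal sketch, reusing the sample string from above; the names keep_spaces and cleaned are illustrative only and not part of the original answer:

```python
import re

# Same negated class as above, plus a literal space, so spaces are kept.
# The trailing hyphen stays last so it is read as a literal, not a range.
keep_spaces = re.compile(r"[^a-zA-Z0-9 -]")

s = "some#%te_xt&with--##%--5 hy-phens  *#"
cleaned = keep_spaces.sub("", s)
print(repr(cleaned))
```

Note that the two spaces before the trailing "*#" are kept as well, so a final strip() may be wanted depending on the use case.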

Edit: SilentGhost has suggested that it may be cheaper for the engine to process runs of characters with a quantifier, in which case you can simply use:

re.compile(r"[^a-zA-Z0-9-]+")

The + simply causes each run of consecutive matching characters to be matched (and replaced) in a single step.
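When the replacement is the empty string, the two patterns produce the same result. They differ only when the replacement is non-empty. The sketch below illustrates this with an underscore as a hypothetical replacement; the variable names are made up for the example:

```python
import re

single = re.compile(r"[^a-zA-Z0-9-]")   # one match per unwanted character
runs = re.compile(r"[^a-zA-Z0-9-]+")    # one match per run of unwanted characters

s = "some#%te_xt&with--##%--5 hy-phens  *#"

# Deleting: both patterns yield the same string.
same_when_deleting = single.sub("", s) == runs.sub("", s)

# Replacing with a visible character: the results differ.
per_char = single.sub("_", s)   # one "_" for every unwanted character
per_run = runs.sub("_", s)      # one "_" for every run of unwanted characters

print(same_when_deleting)
print(per_char)
print(per_run)
```

So the quantified form is a drop-in change for stripping, but it changes the output if each removed character should leave its own placeholder.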

Answer 2

Answered by Ned Batchelder

\w matches alphanumerics (and the underscore); add in the hyphen, then negate the entire set: r"[^\w-]"
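Two details are worth checking before using this shorter form. Because \w includes the underscore, underscores are left in place, unlike with the question's [\W_]. In Python 3, \w on str patterns also matches non-ASCII letters and digits unless re.ASCII is passed. A small sketch under those assumptions; the pattern names and the "café-1" sample are illustrative:

```python
import re

s = "some#%te_xt&with--##%--5 hy-phens  *#"

# \w includes "_", so the underscore in "te_xt" is kept.
word_based = re.compile(r"[^\w-]")
kept_underscore = word_based.sub("", s)

# Adding "_" as an alternative strips underscores too.
word_based_no_underscore = re.compile(r"[^\w-]|_")
stripped_underscore = word_based_no_underscore.sub("", s)

# By default \w is Unicode-aware for str patterns; re.ASCII restricts it
# to [a-zA-Z0-9_].
unicode_result = re.sub(r"[^\w-]", "", "café-1")
ascii_result = re.sub(r"[^\w-]", "", "café-1", flags=re.ASCII)

print(kept_underscore, stripped_underscore, unicode_result, ascii_result)
```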
