Note: this page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/33312175/

Time: 2020-08-19 13:11:17  Source: igfitidea

matching any character including newlines in a Python regex subexpression, not globally

python regex

Asked by Jason S

I want to use re.MULTILINE but NOT re.DOTALL, so that I can have a regex that includes both an "any character" wildcard and the normal . wildcard that doesn't match newlines.

Is there a way to do this? What should I use to match any character in those instances that I want to include newlines?

Accepted answer by Wiktor Stribiżew

To match a newline, or "any symbol" without re.S/re.DOTALL, you may use any of the following:

[\s\S]
[\w\W]
[\d\D]

The main idea is that two opposite shorthand classes inside one character class together match any symbol in the input string: every character is either whitespace (\s) or non-whitespace (\S), so [\s\S] excludes nothing.
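
As a minimal sketch of how this plays out in Python, the snippet below mixes the normal . wildcard and [\s\S] in one pattern compiled with re.MULTILINE only. The sample text and the start:/end markers are made up for illustration and are not part of the original question.

```python
import re

# Hypothetical sample input: two blocks delimited by "start:" and "end" lines.
text = "start: alpha\nbeta\nend\nstart: gamma\nend\n"

# re.DOTALL is NOT set, so "." stops at a newline,
# while [\s\S] matches every character, newlines included.
pattern = re.compile(r"^start: (.*)\n([\s\S]*?)^end$", re.MULTILINE)

# Group 1 is the rest of the "start:" line; group 2 is the (possibly
# multi-line) body up to the next line that consists only of "end".
matches = pattern.findall(text)
print(matches)

# The plain dot cannot cross the line break, but the character class can.
dot_match = re.search(r"alpha.beta", text)
class_match = re.search(r"alpha[\s\S]beta", text)
print(dot_match, class_match)
```

Here (.*) stays on a single line, while ([\s\S]*?) is free to span several lines inside the same expression, which is the per-subexpression behavior the question asks for.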

Compared with (.|\s) and other alternation-based variants, the character class solution is much more efficient because it involves much less backtracking (when used with a * or + quantifier). Compare the small example: (?:.|\n)+ takes 45 steps to complete, while [\s\S]+ takes just 2 steps.
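
Python's re module does not report step counts, so the figures above cannot be reproduced directly with it. As a rough proxy, the following sketch checks that both patterns consume the same text and then times them with timeit. The sample string and repetition counts are arbitrary assumptions, and the timings will differ by machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

# Arbitrary multi-line sample; its size is an assumption for illustration.
sample = "line one\nline two\n" * 200

alternation = re.compile(r"(?:.|\n)+")
char_class = re.compile(r"[\s\S]+")

# Both patterns should consume the whole string, newlines included.
alt_match = alternation.match(sample)
cls_match = char_class.match(sample)
same_result = alt_match.group(0) == cls_match.group(0) == sample
print("same match:", same_result)

# Rough wall-clock comparison; treat the output as indicative only.
alt_time = timeit.timeit(lambda: alternation.match(sample), number=200)
cls_time = timeit.timeit(lambda: char_class.match(sample), number=200)
print(f"alternation: {alt_time:.4f}s, character class: {cls_time:.4f}s")
```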
