PHP: How to make dot match newline characters using regular expressions

Notice: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/1985941/

Time: 2020-08-25 04:32:41  Source: igfitidea

php regex

Asked by Mark Byers

I have a string that contains normal characters, whitespace characters and newline characters between <div> and </div>. This regular expression doesn't work: /<div>(.*)<\/div>/. It is because .* doesn't match newline characters. My question is, how can I make it match them?

Answer by Mark Byers

You need to use the DOTALL modifier (s):

'/<div>(.*)<\/div>/s'

This might not give you exactly what you want, because the match is greedy. You might instead try a non-greedy match:

'/<div>(.*?)<\/div>/s'
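To make the difference concrete, here is a minimal sketch comparing the three variants with preg_match. The sample string and variable names are made up for illustration and are not part of the original answer.

```php
<?php
// Hypothetical sample input: two <div> blocks, the first spanning two lines.
$html = "<div>first\nline</div>\n<div>second</div>";

// Without /s the dot stops at the newline, so nothing matches at the first
// <div>; the engine moves on and finds the single-line second block.
preg_match('/<div>(.*)<\/div>/', $html, $noFlag);

// With /s (PCRE_DOTALL) the dot also matches newlines.
// Greedy .* runs as far as it can, up to the LAST </div>.
preg_match('/<div>(.*)<\/div>/s', $html, $greedy);

// Non-greedy .*? stops at the FIRST </div>.
preg_match('/<div>(.*?)<\/div>/s', $html, $lazy);

var_dump($noFlag[1]); // "second"
var_dump($greedy[1]); // "first\nline</div>\n<div>second"
var_dump($lazy[1]);   // "first\nline"
```

The greedy result swallows both blocks, which is usually the reason to prefer the non-greedy form when several divs appear in the same string.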

You could also solve this by matching everything except '<' if there aren't other tags:

'/<div>([^<]*)<\/div>/'

Another observation is that you don't need to use / as your regular expression delimiter. Using another character means that you don't have to escape the / in </div>, which improves readability. This applies to all of the above regular expressions. Here's how it would look if you use '#' instead of '/':

'#<div>([^<]*)</div>#'
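A short sketch of how the negated character class behaves, using '#' as the delimiter. The input strings are hypothetical; the second one shows the "if there aren't other tags" caveat in practice.

```php
<?php
// Hypothetical input: plain text spanning two lines, no nested tags.
$plain = "<div>one\ntwo</div>";

// [^<] is a negated character class, so it matches newlines without /s.
preg_match('#<div>([^<]*)</div>#', $plain, $m);
var_dump($m[1]); // "one\ntwo"

// With a nested tag, [^<]* stops at the inner '<' and the pattern fails.
$nested = "<div>one <b>two</b></div>";
$found = preg_match('#<div>([^<]*)</div>#', $nested);
var_dump($found); // int(0)
```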

However, all these solutions can fail due to nested divs, extra whitespace, HTML comments and various other things. HTML is too complicated to parse with regular expressions, so you should consider using an HTML parser instead.
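As a rough illustration of the parser route, the sketch below uses PHP's bundled DOM extension (DOMDocument). It assumes that extension is enabled, and the sample markup and variable names are invented for the example; the original answer does not name a specific parser.

```php
<?php
// Hypothetical input with nested divs and newlines.
$html = "<div>outer\n<div>inner</div>\n</div>";

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the text of every <div>, in document order (outer first).
$texts = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    $texts[] = $div->textContent;
}

var_dump(count($texts)); // int(2)
var_dump($texts[1]);     // "inner"
```

Nesting and newlines need no special handling here, because the parser builds a tree rather than scanning for the nearest closing tag.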

Answer by Hossein

To match all characters, you can use this trick:

%\<div\>([\s\S]*)\</div\>%

Answer by acarlon

I know this is an old question, but I stumbled across it recently. You can also use the (?s) mode modifier inside the pattern, e.g.:

'/(?s)<div>(.*?)<\/div>/'
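In PHP the inline modifier has to sit inside the pattern delimiters. The sketch below shows that placement, plus the scoped form (?s:...) that limits DOTALL to one group. The sample string and variable names are assumptions for illustration.

```php
<?php
$html = "<div>a\nb</div>"; // hypothetical sample input

// (?s) at the start of the pattern, inside the delimiters,
// has the same effect here as a trailing /s modifier.
preg_match('/(?s)<div>(.*?)<\/div>/', $html, $whole);

// The scoped form (?s:...) enables DOTALL only inside that group.
preg_match('/<div>((?s:.*?))<\/div>/', $html, $scoped);

var_dump($whole[1]);  // "a\nb"
var_dump($scoped[1]); // "a\nb"
```

The scoped form can be handy when only part of a longer pattern should let the dot cross line boundaries.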

Answer by MillerMedia

An option would be:

'/<div>(\n*|.*)<\/div>/i'

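This matches either a run of newlines or whatever the dot matches. Note that the alternation picks one branch for the whole group, so content that mixes newlines with other characters will not match.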

Answer by DavidsKanal

Maybe I'm missing the obvious, but is there any problem with just doing

(.|\n)

? This matches either any character except newline or a newline, so every character. It solved the problem for me, at least.
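For comparison, here is a small sketch that runs the alternation next to the character-class trick and the s modifier on the same input. The sample string, the array keys, and the use of a non-capturing group (?:...) with a non-greedy quantifier are my own choices for the example, not part of the answers.

```php
<?php
$html = "<div>a\nb</div>"; // hypothetical sample input

// Three ways to let the content cross line boundaries.
$patterns = [
    'alternation' => '#<div>((?:.|\n)*?)</div>#',
    'char class'  => '#<div>([\s\S]*?)</div>#',
    'dotall flag' => '#<div>(.*?)</div>#s',
];

$results = [];
foreach ($patterns as $name => $pattern) {
    preg_match($pattern, $html, $m);
    $results[$name] = $m[1];
}

var_dump($results); // every entry is "a\nb"
```

All three capture the same text on this input. On large inputs the per-character alternation may be noticeably slower than the other two, so the s modifier is probably the simplest default.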

Answer by pau.estalella

There is usually a flag in the regular expression compiler to tell it that dot should match newline characters.
