Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/695633/

Time: 2020-08-24 23:32:59  Source: igfitidea

Including new lines in PHP preg_replace function

Tags: php, regex, newline

Asked by DisgruntledGoat

I'm trying to match a string that may appear over multiple lines. It starts and ends with a specific string:

{a}some string
can be multiple lines
{/a}

Can I grab everything between {a} and {/a} with a regex? It seems that . doesn't match newlines, so I tried the following, without luck:

$template = preg_replace( $'/\{a\}([.\n]+)\{\/a\}/', 'X', $template, -1, $count );
echo $count; // prints 0

It matches . or \n when they're on their own, but not together!

Answered by strager

Use the s modifier:

$template = preg_replace( '/\{a\}(.+)\{\/a\}/s', 'X', $template, -1, $count );
//                                           ^
echo $count;
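
To see what the s modifier changes, here is a minimal sketch that runs the same pattern with and without it. The sample $template and the variable names $countWithout and $countWith are made up for illustration.

```php
<?php
// Hypothetical sample input: one {a}...{/a} block spanning several lines.
$template = "before {a}some string\ncan be multiple lines\n{/a} after";

// Without the s modifier, "." stops at each newline, so nothing matches.
$without = preg_replace('/\{a\}(.+)\{\/a\}/', 'X', $template, -1, $countWithout);

// With the s modifier, "." also matches newlines.
$with = preg_replace('/\{a\}(.+)\{\/a\}/s', 'X', $template, -1, $countWith);

echo $countWithout, "\n"; // 0
echo $countWith, "\n";    // 1
echo $with, "\n";         // before X after
```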

Answered by Alan Moore

I think you've got more problems than just the dot not matching newlines, but let me start with a formatting recommendation. You can use just about any punctuation character as the regex delimiter, not just the slash ('/'). If you use another character, you won't have to escape slashes within the regex. I understand '%' is popular among PHPers; that would make your pattern argument:

'%\{a\}([.\n]+)\{/a\}%'
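
As a quick illustration of the delimiter choice, the sketch below matches a closing {/a} tag with both delimiters. The $subject string is a made-up example.

```php
<?php
// Hypothetical input containing a closing tag with a slash in it.
$subject = 'text {/a} more';

// With "/" as the delimiter, the slash inside the pattern must be escaped.
$slash = preg_match('/\{\/a\}/', $subject);

// With "%" as the delimiter, the slash can be written as-is.
$percent = preg_match('%\{/a\}%', $subject);

echo $slash, ' ', $percent, "\n"; // 1 1
```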

Now, the reason that regex didn't work as you intended is that the dot loses its special meaning when it appears inside a character class (the square brackets), so [.\n] just matches a dot or a linefeed. What you were looking for was (?:.|\n), but I would have recommended matching the carriage return as well as the linefeed:

'%\{a\}((?:.|[\r\n])+)\{/a\}%'
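
The difference between the character class and the alternation can be checked with a small sketch. The two input strings and the variable names are invented for illustration.

```php
<?php
// Hypothetical inputs for illustration.
$dotsOnly = "{a}..\n.{/a}";           // only literal dots and linefeeds inside
$realText = "{a}some\r\nstring{/a}";  // ordinary text with a Windows line ending

$class = '%\{a\}([.\n]+)\{/a\}%';        // dot is a literal dot inside [...]
$alt   = '%\{a\}((?:.|[\r\n])+)\{/a\}%'; // any character, or CR, or LF

echo preg_match($class, $dotsOnly), "\n"; // 1
echo preg_match($class, $realText), "\n"; // 0
echo preg_match($alt, $realText), "\n";   // 1
```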

That's because the word "newline" can refer to the Unix-style "\n", Windows-style "\r\n", or older-Mac-style "\r". Any given web page may contain any of those or a mixture of two or more styles; a mix of "\n" and "\r\n" is very common. But with /s mode (also known as single-line or DOTALL mode), you don't need to worry about that:

'%\{a\}(.+)\{/a\}%s'

However, there's another problem with the original regex that's still present in this one: the + is greedy. That means, if there's more than one {a}...{/a} sequence in the text, the first time your regex is applied it will match all of them, from the first {a} to the last {/a}. The simplest way to fix that is to make the + ungreedy (a.k.a. "lazy" or "reluctant") by appending a question mark:

'%\{a\}(.+?)\{/a\}%s'
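
A short sketch comparing the greedy and lazy versions on text with two blocks, assuming a made-up $template in which the second block contains a Windows line ending:

```php
<?php
// Hypothetical input with two {a}...{/a} blocks.
$template = "{a}first{/a} middle {a}second\r\nblock{/a}";

// Greedy: .+ runs to the last {/a}, so both blocks are swallowed in one match.
$greedy = preg_replace('%\{a\}(.+)\{/a\}%s', 'X', $template, -1, $greedyCount);

// Lazy: .+? stops at the nearest {/a}, so each block is replaced separately.
$lazy = preg_replace('%\{a\}(.+?)\{/a\}%s', 'X', $template, -1, $lazyCount);

echo $greedyCount, ' ', $greedy, "\n"; // 1 X
echo $lazyCount, ' ', $lazy, "\n";     // 2 X middle X
```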

Finally, I don't know what to make of the '$' before the opening quote of your pattern argument. I don't do PHP, but that looks like a syntax error to me. If someone could educate me in this matter, I'd appreciate it.

Answered by John T

From http://www.regular-expressions.info/dot.html:

"The dot matches a single character, without caring what that character is. The only exception are newline characters."

You will need to add a trailing s flag (written /s) after the closing delimiter of your expression.
