Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/5358010/

Time: 2020-08-25 21:08:39  Source: igfitidea

Regex failing when pattern involves dollar sign ($)

php regex

Asked by Mr. Llama

I'm running into a bit of an issue when it comes to matching subpatterns that involve the dollar sign. For example, consider the following chunk of text:

Regular Price: .50       Final Price: .20
Regular Price: .99       Final Price: .25
Regular Price: .22       Final Price: .44
Regular Price: .66       Final Price: .88

I was attempting to match the Regular/Final price sets with the following regex, but it simply wasn't working (no matches at all):
preg_match_all("/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U", $data, $matches);

I escaped the dollar sign, so what gives?

Answered by Mark Byers

Inside a double-quoted string, the backslash is treated as an escape character for the $. The PHP parser removes the backslash before the preg_match_all function ever sees it:

$r = "/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U";
var_dump($r);

Output (ideone):

"/Regular Price: $(\d+\.\d{2}).*Final Price: $(\d+\.\d{2})/U"
                 ^                           ^
              the backslashes are no longer there

To fix this use a single quoted string instead of a double quoted string:

preg_match_all('/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U',
               $data,
               $matches);
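
As a minimal sketch of the difference, the following runs both quoting styles against sample data. The prices in $data and the variable names are made up for illustration; they are not the values from the question.

```php
<?php
// Hypothetical sample data; the prices are illustrative only.
$data = 'Regular Price: $10.50       Final Price: $8.20' . "\n"
      . 'Regular Price: $19.99       Final Price: $15.25' . "\n";

// Double-quoted: PHP turns \$ into $ before the regex engine sees the pattern.
$double = "/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U";

// Single-quoted: the backslash survives, so the regex engine sees an escaped \$.
$single = '/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U';

$doubleCount = preg_match_all($double, $data, $doubleMatches);
$singleCount = preg_match_all($single, $data, $singleMatches);

var_dump($doubleCount);      // int(0)
var_dump($singleCount);      // int(2)
print_r($singleMatches[1]);  // regular prices: 10.50, 19.99
print_r($singleMatches[2]);  // final prices: 8.20, 15.25
```

The double-quoted pattern compiles without error, which is why the failure is silent: it simply never matches.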

See it working online: ideone

Answered by shmeeps

I know this question is a little old, but I found it while looking for the answer to the same problem. It was at the top of the search engine rankings, so I figured it would be good to explain a simple alternative, and why this happens with double-quoted strings ( " ).

The regular expression I was using contained plenty of single quote characters ( ' ), so I wasn't too keen on wrapping the expression in them, since I didn't want to escape all of those.

My solution was to "double escape" the dollar sign. In your example, it should look something similar to

"/Regular Price: \\\$(\d+\.\d{2}).*Final Price: \\\$(\d+\.\d{2})/U";

Note that the dollar sign is now preceded by 3 backslashes: \\\.

Basically, we have two "levels" of interpretation: that of PHP, and that of the regular expression engine. With one backslash, PHP reads \$ as a literal dollar sign rather than the start of a variable, so it eats the backslash, produces the string shown in Mark's answer, and sends that to the regex engine, which treats the bare $ as an end-of-line anchor rather than a literal character.

By "double escaping" the dollar sign, PHP interprets \\\$ as \\ and \$ respectively. The first pair escapes the backslash and the second escapes the $, leaving just \$ after PHP interpretation. This sends the literal string

"/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U";

to the regex engine, which interprets \$ as the character literal $ and matches a dollar sign instead of treating it as an end-of-line anchor, since it is escaped. It is important to keep both layers of interpretation in mind here: PHP and the regex engine each have their own escaping rules, and it may take up to 4 backslashes to escape a character correctly.
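
To look at the two layers separately, the sketch below first prints what PHP hands over for each spelling, then feeds the results to the regex engine. The variable names and the 'cost: $5' subject are hypothetical.

```php
<?php
// Layer 1: what PHP produces for each way of writing the dollar sign.
$oneSlash   = "\$";    // PHP consumes the backslash
$threeSlash = "\\\$";  // \\ becomes \, and \$ becomes $
$single     = '\$';    // single quotes leave both characters untouched

var_dump($oneSlash);    // string(1) "$"
var_dump($threeSlash);  // string(2) "\$"
var_dump($single);      // string(2) "\$"

// Layer 2: only the escaped form matches a literal dollar sign.
$bare    = preg_match('/' . $oneSlash . '5/', 'cost: $5');
$escaped = preg_match('/' . $threeSlash . '5/', 'cost: $5');

var_dump($bare);     // int(0), the bare $ acts as an anchor
var_dump($escaped);  // int(1)
```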

Single-quoted strings don't have this problem, since to use a variable $foo in a string, we would have to write

'Hello '. $foo .'!';

instead of

"Hello $foo!";

as we can in double-quoted strings. Unlike double-quoted strings, single-quoted strings never interpret variables inside the string as variables (unless they are concatenated as in the example above); they treat them as plain text. Since we no longer have to escape the variable, we can get away with just

'/Regular Price: \$(\d+\.\d{2}).*Final Price: \$(\d+\.\d{2})/U'

which will send \$ to the regex engine, the same as \\\$ in a double-quoted string.
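
The interpolation difference described above can be checked directly. This is only a small sketch; the value assigned to $foo is an assumption for illustration.

```php
<?php
$foo = 'world';  // hypothetical value for illustration

$interpolated = "Hello $foo!";          // double quotes expand $foo
$literal      = 'Hello $foo!';          // single quotes keep the text as written
$concatenated = 'Hello ' . $foo . '!';  // single quotes plus concatenation

var_dump($interpolated);  // string(12) "Hello world!"
var_dump($literal);       // string(11) "Hello $foo!"
var_dump($concatenated);  // string(12) "Hello world!"
```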

Which style you use is a matter of personal preference, or of which is easier for the pattern at hand.

TL;DR: Use \$ for single-quoted strings like '/Hello \$bob/is', and \\\$ for double-quoted strings like "/Hello \\\$bob/is".
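
As a quick check of the two spellings, this sketch compares the resulting pattern strings and runs one of them. The subject string is hypothetical.

```php
<?php
// Both spellings yield the same pattern text once PHP has parsed them.
$single = '/Hello \$bob/is';
$double = "/Hello \\\$bob/is";

$same  = ($single === $double);
$found = preg_match($single, 'hello $BOB, welcome');

var_dump($same);   // bool(true)
var_dump($found);  // int(1), the i flag makes the match case-insensitive
```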
