PHP warning: preg_replace(): Unknown modifier ']'

Note: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow. Original: http://stackoverflow.com/questions/20705399/

Date: 2020-08-25 03:27:00  Source: igfitidea

Warning: preg_replace(): Unknown modifier ']'

Tags: php, regex, wordpress, preg-replace

Asked by user3122995

I have the following error:

Warning: preg_replace(): Unknown modifier ']' in xxx.php on line 38

This is the code on line 38:

<?php echo str_replace("</ul></div>", "", preg_replace("<div[^>]*><ul[^>]*>", "", wp_nav_menu(array('theme_location' => 'nav', 'echo' => false)) )); ?>

How can I fix this problem?

Answered by Amal Murali

Why the error occurs

In PHP, a regular expression needs to be enclosed within a pair of delimiters. A delimiter can be any non-alphanumeric, non-backslash, non-whitespace character; /, # and ~ are the most commonly used ones. Note that it is also possible to use bracket-style delimiters, where the opening and closing brackets are the starting and ending delimiters, i.e. <pattern_goes_here>, [pattern_goes_here] etc. are all valid.

The "Unknown modifier X" error usually occurs in the following two cases:

  • When your regular expression is missing delimiters.

  • When you use the delimiter inside the pattern without escaping it.

In this case, the regular expression is <div[^>]*><ul[^>]*>. The regex engine considers everything from < to the first > as the regex pattern, and everything afterwards as modifiers.

Regex: <div[^>  ]*><ul[^>]*>
       │     │  │          │
       └──┬──┘  └────┬─────┘
       pattern    modifiers

] here is an unknown modifier, because it appears after the closing > delimiter. That is why PHP throws the error.

Depending on the pattern, the unknown modifier complaint might just as well have been about *, +, p, / or ) or almost any other letter/symbol. Only the letters imsxeADSUXJu are valid PCRE modifiers.
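To see the split between pattern and modifiers in practice, the following minimal sketch runs the pattern from the question against a sample string. The sample HTML and the $warnings collector are assumptions made for illustration; only preg_replace, set_error_handler and restore_error_handler are standard PHP.

```php
<?php
// Collect warnings in an array instead of printing them.
$warnings = [];
set_error_handler(function (int $errno, string $errstr) use (&$warnings): bool {
    $warnings[] = $errstr;
    return true; // mark the warning as handled
});

// "<" is taken as the opening delimiter and the first ">" as the closing one,
// so "]*><ul[^>]*>" ends up being parsed as a list of modifiers.
$result = preg_replace("<div[^>]*><ul[^>]*>", "", "<div><ul><li>Home</li></ul></div>");

restore_error_handler();

var_dump($result);   // NULL, because the pattern could not be compiled
print_r($warnings);  // contains the "Unknown modifier ']'" message
```

Note that the return value is NULL rather than the unchanged input, which is why the surrounding str_replace call in the question ends up working on nothing.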

How to fix it

The fix is easy. Just wrap your regex pattern with any valid delimiters. In this case, you could choose ~ and get the following:

~<div[^>]*><ul[^>]*>~
│                   │
│                   └─ ending delimiter
└───────────────────── starting delimiter
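Applied to line 38 from the question, the fix could look like the sketch below. Since wp_nav_menu() is only available inside WordPress, a hard-coded $menuHtml string stands in for its output here; that string is an assumption, not real menu markup.

```php
<?php
// Stand-in for wp_nav_menu(array('theme_location' => 'nav', 'echo' => false)).
$menuHtml = '<div class="menu"><ul id="nav"><li>Home</li></ul></div>';

// Same pattern as in the question, now wrapped in ~ delimiters.
$withoutOpeningTags = preg_replace("~<div[^>]*><ul[^>]*>~", "", $menuHtml);
$stripped = str_replace("</ul></div>", "", $withoutOpeningTags);

echo $stripped, "\n"; // <li>Home</li>
```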

If you're receiving this error despite having used a delimiter, it might be because the pattern itself contains unescaped occurrences of the said delimiter.

Or escape delimiters

/foo[^/]+bar/i would certainly throw an error. So you can escape the delimiter with a \ backslash wherever it appears within the regex:

/foo[^\/]+bar/i
│      │     │
└──────┼─────┴─ actual delimiters
       └─────── escaped slash(/) character

This is a tedious job if your regex pattern contains many occurrences of the delimiter character.

The cleaner way, of course, would be to use a different delimiter altogether. Ideally a character that does not appear anywhere inside the regex pattern, say #, as in #foo[^/]+bar#i.
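As a quick check that both approaches describe the same regex, this small sketch compares the escaped variant with the #-delimited one. The two subject strings are made up for the example.

```php
<?php
$escaped   = '/foo[^\/]+bar/i';  // delimiter escaped inside the pattern
$alternate = '#foo[^/]+bar#i';   // different delimiter, nothing to escape

foreach (['FOO-baz-BAR', 'foo/bar'] as $subject) {
    $a = preg_match($escaped, $subject);
    $b = preg_match($alternate, $subject);
    // Both variants should always agree with each other.
    printf("%-12s escaped=%d alternate=%d\n", $subject, $a, $b);
}
```

The first subject matches with both patterns; the second does not, because [^/]+ needs at least one non-slash character after "foo".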

Answered by mario

Other examples

The reference answer already explains the reason for "Unknown modifier" warnings. This is just a comparison of other typical variants.

  • When forgetting to add regex /delimiters/, the first non-letter symbol will be assumed to be one. Therefore the warning is often about what follows a grouping (…) or […] meta symbol:

    preg_match("[a-zA-Z]+:\s*.$"
                ↑      ↑?
    
  • Sometimes your regex already uses a custom delimiter (: here), but still contains the same character as an unescaped literal. It's then mistaken for a premature delimiter. Which is why the very next symbol receives the "Unknown modifier ?" trophy:

    preg_match(":\[[\d:/]+\]:"
                ↑     ?     ↑
    
  • When using the classic / delimiter, take care not to have it within the regex literally. This most frequently happens when trying to match unescaped filenames:

    preg_match("/pathname/filename/i"
                ↑        ?         ↑
    

    Or when matching angle/square bracket style tags:

    preg_match("/<%tmpl:id>(.*)</%tmpl:id>/Ui"
                ↑               ?         ↑
    
  • Templating-style (Smarty or BBCode) regex patterns often require {…} or […] brackets. Both should usually be escaped. (An outermost {} pair is the exception though.)

    They also get misinterpreted as paired delimiters when no actual delimiter is used. If they're then also used as literal characters within, then that's, of course … an error.

    preg_match("{bold[^}]+}"
                ↑      ?  ↑
    
  • Whenever the warning says "Delimiter must not be alphanumeric or backslash" then you also entirely forgot delimiters:

    preg_match("ab?c*"
                ↑
    
  • "Unknown modifier 'g'" often indicates a regex that was copied verbatim from JavaScript or Perl.

    preg_match("/abc+/g"
                      ?
    

    PHP doesn't use the /g global flag. Instead the preg_replace function works on all occurrences, and preg_match_all is the "global" searching counterpart to the one-occurrence preg_match.

    So, just remove the /g flag.

    See also:
    · Warning: preg_replace(): Unknown modifier 'g'
    · preg_replace: bad regex == 'Unknown Modifier'?

  • A more peculiar case pertains to the PCRE_EXTENDED /x flag. This is often (or should be) used for making regexps more readable.

    This allows the use of inline # comments. PHP implements the regex delimiters atop PCRE. But it doesn't treat # in any special way. Which is how a literal delimiter in a # comment can become an error:

    preg_match("/
       ab?c+  # Comment with / slash in between
    /x"
    

    (Also noteworthy: using # as the delimiter, as in #abc+#x, can be doubly inadvisable.)

  • Interpolating variables into a regex requires them to be pre-escaped, or be valid regexps themselves. You can't tell beforehand if this is gonna work:

     preg_match("/id=$var;/"
                 ↑    ?   ↑
    

    It's best to apply $var = preg_quote($var, "/") in such cases.

    See also:
    · Unknown modifier '/' in ...? what is it?

    Another alternative is using \Q…\E escapes for unquoted literal strings:

     preg_match("/id=\Q{$var}\E;/mix");
    

    Note that this is merely a convenience shortcut for meta symbols, not dependable/safe. It would fall apart if $var contained a literal '\E' itself (however unlikely). And it does not mask the delimiter itself.

  • The deprecated /e modifier is an entirely different problem. This has nothing to do with delimiters, but with the implicit expression interpretation mode being phased out (it was removed in PHP 7.0). See also: Replace deprecated preg_replace /e with preg_replace_callback
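Two of the fixes from the list above can be sketched briefly: quoting an interpolated variable with preg_quote, and using preg_match_all where JavaScript or Perl would use the /g flag. The value of $var and the subject strings are invented for the example.

```php
<?php
// Hypothetical user-supplied value containing the delimiter and a meta character.
$var = 'a/b.c';

// Escape regex meta characters and the "/" delimiter before interpolating.
$quoted  = preg_quote($var, '/');
$matched = preg_match("/id={$quoted};/", 'id=a/b.c;');

// preg_match_all() takes the place of the /g flag.
$count = preg_match_all('/abc+/', 'abc abcc abccc', $all);

var_dump($quoted);   // string(7) "a\/b\.c"
var_dump($matched);  // int(1)
var_dump($count);    // int(3)
print_r($all[0]);    // abc, abcc, abccc
```

Passing the delimiter as the second argument matters: without it, preg_quote leaves the slash alone and the "Unknown modifier" warning can come back.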

Alternative regex delimiters

As mentioned already, the quickest solution to this error is just picking a distinct delimiter. Any non-alphanumeric, non-backslash, non-whitespace symbol can be used. Visually distinctive ones are often preferred:

  • ~abc+~
  • !abc+!
  • @abc+@
  • #abc+#
  • =abc+=
  • %abc+%

Technically you could use $abc$ or |abc| for delimiters. However, it's best to avoid symbols that serve as regex meta characters themselves.

The hash # as delimiter is rather popular too. But care should be taken in combination with the x/PCRE_EXTENDED readability modifier. You can't use # inline or (?#…) comments then, because those would be confused as delimiters.
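A minimal sketch of the safer combination: with ~ as the delimiter, # comments under the x modifier cannot collide with it. The pattern and the subject string are made up for illustration.

```php
<?php
// ~ is the delimiter, so "#" is free to start comments in extended mode.
$pattern = '~
    ab?c+   # an "a", an optional "b", then one or more "c"
~x';

$found = preg_match($pattern, 'xxabccyy', $m);

var_dump($found);  // int(1)
var_dump($m[0]);   // string(4) "abcc"
```

The comment text itself must still avoid the chosen delimiter, here the tilde, for the same reason as described above.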

Quote-only delimiters

Occasionally you see " and ' used as regex delimiters, paired with their counterpart as the PHP string enclosure:

  preg_match("'abc+'"
  preg_match('"abc+"'

Which is perfectly valid as far as PHP is concerned. It's sometimes convenient and unobtrusive, but not always legible in IDEs and editors.

Paired delimiters

An interesting variation is paired delimiters. Instead of using the same symbol on both ends of a regex, you can use any <...>, (...), [...] or {...} bracket/brace combination.

  preg_match("(abc+)"   # just delimiters here, not a capture group

While most of them also serve as regex meta characters, you can often use them without further effort. As long as those specific braces/parens within the regex are paired or escaped correctly, these variants are quite readable.
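The following sketch illustrates both points: parentheses used as delimiters do not create a capture group, and a correctly paired brace inside a {...}-delimited regex needs no escaping. The patterns and subjects are invented for the example.

```php
<?php
// Parentheses act as delimiters here, so no capture group is created.
preg_match('(abc+)', 'xabccx', $plain);

// Braces as delimiters; the inner {2,3} quantifier is balanced,
// so it does not end the pattern prematurely.
$ok = preg_match('{^a{2,3}$}', 'aaa');

print_r($plain);  // only index 0 ("abcc"), no index 1
var_dump($ok);    // int(1)
```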

Fancy regex delimiters

A somewhat lazy trick (which is not endorsed hereby) is using non-printable ASCII characters as delimiters. This works easily in PHP by using double quotes for the regex string, and octal escapes for delimiters:

 preg_match("\001 abc+ \001mix"

The \001 is just a control character (SOH) that's not usually needed. Therefore it's highly unlikely to appear within most regex patterns. Which makes it suitable here, even though it's not very legible.

Sadly you can't use Unicode glyphs as delimiters. PHP only allows single-byte characters. And why is that? Well, glad you asked:

PHP's delimiters atop PCRE

The preg_* functions utilize the PCRE regex engine, which itself doesn't care about or provide for delimiters. For resemblance with Perl, the preg_* functions implement them. Which is also why you can use modifier letters such as /ism instead of just constants as a parameter.

See ext/pcre/php_pcre.c on how the regex string is preprocessed:

  • First all leading whitespace is ignored.

  • Any non-alphanumeric symbol is taken as presumed delimiter. Note that PHP only honors single-byte characters:

    delimiter = *p++;
    if (isalnum((int)*(unsigned char *)&delimiter) || delimiter == '\\') {
            php_error_docref(NULL,E_WARNING, "Delimiter must not…");
            return NULL;
    }
    
  • The rest of the regex string is traversed left-to-right. Only backslash-escaped (\) symbols are skipped. \Q and \E escaping is not honored.

  • Should the delimiter be found again, the remainder is verified to only contain modifier letters.

  • If the delimiter is one of the ( [ { < openers (paired with ) ] } >), then the processing logic is more elaborate.

    int brackets = 1;   /* brackets nesting level */
    while (*pp != 0) {
            if (*pp == '\\' && pp[1] != 0) pp++;
            else if (*pp == end_delimiter && --brackets <= 0)
                    break;
            else if (*pp == start_delimiter)
                    brackets++;
            pp++;
    }
    

    It looks for correctly paired left and right delimiter, but ignores other braces/bracket types when counting.

  • The raw regex string is passed to the PCRE backend only after delimiter and modifier flags have been cut out.
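The steps above can be approximated in userland PHP. The function below is a simplified, hypothetical helper (splitDelimitedRegex is not part of PHP, and it is not a port of the actual C code); it only mirrors the scan as described here, so that one can see how a given string gets divided into pattern and modifiers.

```php
<?php
/**
 * Hypothetical helper: splits a delimited regex string into
 * [pattern, modifiers], following the steps listed above.
 */
function splitDelimitedRegex(string $regex): array
{
    $regex = ltrim($regex); // leading whitespace is ignored
    if ($regex === '') {
        throw new InvalidArgumentException('Empty regular expression');
    }

    $start = $regex[0];
    if (preg_match('/[A-Za-z0-9]/', $start) === 1 || $start === '\\') {
        throw new InvalidArgumentException('Delimiter must not be alphanumeric or backslash');
    }

    // Bracket-style delimiters close with their counterpart.
    $pairs = ['(' => ')', '[' => ']', '{' => '}', '<' => '>'];
    $end   = $pairs[$start] ?? $start;

    $depth = 1; // nesting level, only relevant for paired delimiters
    $len   = strlen($regex);
    for ($i = 1; $i < $len; $i++) {
        $ch = $regex[$i];
        if ($ch === '\\' && $i + 1 < $len) {
            $i++; // skip the escaped character
        } elseif ($ch === $end && --$depth <= 0) {
            break; // closing delimiter found
        } elseif ($ch === $start && $end !== $start) {
            $depth++;
        }
    }
    if ($i >= $len) {
        throw new InvalidArgumentException('No ending delimiter found');
    }

    return [substr($regex, 1, $i - 1), substr($regex, $i + 1)];
}

// The pattern from the question: everything after the first ">" counts as modifiers.
[$pattern, $modifiers] = splitDelimitedRegex('<div[^>]*><ul[^>]*>');
var_dump($pattern);    // string(5) "div[^"
var_dump($modifiers);  // string(12) "]*><ul[^>]*>"
```

Running a suspicious regex string through such a helper shows which character PHP would complain about: it is the first character of the modifiers part that is not a valid modifier letter.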

Now this is all somewhat irrelevant, but it explains where the delimiter warnings come from. This whole procedure exists to provide a minimum of Perl compatibility. There are a few minor deviations of course, like the […] character class context not receiving special treatment in PHP.

Answered by Danon

If you would like to get an exception (MalformedPatternException) instead of warnings or having to call preg_last_error(), consider using the T-Regx library:

  <?php
  try
  {
      return pattern('invalid] pattern')->match($s)->all();
  }
  catch (MalformedPatternException $e)
  {
      // your pattern was invalid
  }