What literal characters should be escaped in a regex?
Original question: http://stackoverflow.com/questions/5484084/
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute the content to the original authors on Stack Overflow (not to this page).
Asked by The Pellmeister
I just wrote a regex for use with the PHP function preg_match that contains the following part:
[\w-.]
It is meant to match any word character, as well as a minus sign and the dot. While it seems to work in preg_match, when I put it into a utility called Reggy, it complained about "Empty range in char class". Trial and error taught me that this issue was solved by escaping the minus sign, turning the regex into:
[\w\-.]
Since the original appears to work in PHP, I am wondering why I should or should not be escaping the minus sign, and (since the dot is also a character with a meaning in PHP) why I would not need to escape the dot. Is the utility I am using just being silly, is it working with another regex dialect, or is my regex really incorrect and am I just lucky that preg_match lets me get away with it?
Answered by Bart Kiers
In many regex implementations, the following rules apply:
Meta characters inside a character class are:
^ (negation)
- (range)
] (end of the class)
\ (escape char)
So these should all be escaped. There are some corner cases though:
The - needs no escaping if placed at the very start or end of the class ([abc-] or [-abc]). In quite a few regex implementations, it also needs no escaping when placed directly after a range ([a-c-abc]) or a short-hand character class ([\w-abc]). This is what you observed.
The ^ needs no escaping when it's not at the start of the class: [^a] means any char except a, and [a^] matches either a or ^, which equals: [\^a]
The ] needs no escaping if it's the only character in the class: []] matches the char ]
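The corner cases above can be tried out directly in PHP. The following is a minimal sketch, not part of the original answer; class_matches() is a hypothetical helper introduced only for illustration, and the results describe PHP's PCRE-based preg_match, not regex engines in general.

```php
<?php
// Hypothetical helper (not from the answer): true when $pattern matches $subject.
function class_matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

// "-" placed first or last in the class is a literal hyphen.
var_dump(class_matches('/^[abc-]+$/', 'a-c'));
var_dump(class_matches('/^[-abc]+$/', '-b'));

// "^" only negates when it is the first character of the class.
var_dump(class_matches('/^[^a]$/', 'b'));
var_dump(class_matches('/^[^a]$/', 'a'));
var_dump(class_matches('/^[a^]+$/', 'a^'));

// "]" directly after the opening bracket is taken literally by PCRE.
var_dump(class_matches('/^[]]$/', ']'));

// The escaped hyphen from the question, for comparison.
var_dump(class_matches('/^[\w\-.]+$/', 'foo-bar.baz'));
```

Every call except the fourth should report true. The unescaped [\w-.] form is left out on purpose, because engines disagree about it.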
Answered by bw_üezi
[\w.-]
- the . usually means any character, but between [] it has no special meaning
- the - between [] indicates a range unless it's escaped or is the first or last character between []
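To illustrate the point about the dot, here is a small sketch (an addition, not part of the original answer) that compares the dot outside and inside a character class using preg_match:

```php
<?php
// Outside a class, "." matches any character (except a newline by default).
var_dump(preg_match('/^.$/', 'x'));     // int(1)

// Inside a class, "." only matches a literal full stop.
var_dump(preg_match('/^[.]$/', 'x'));   // int(0)
var_dump(preg_match('/^[.]$/', '.'));   // int(1)

// Escaping the dot inside the class is allowed but redundant.
var_dump(preg_match('/^[\.]$/', '.'));  // int(1)
```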
Answered by Your Common Sense
While there are indeed some characters that should be escaped in a regex, you're asking not about regexes in general but about character classes, where the dash is a special symbol.
Instead of escaping it, you could put it at the end of the class: [\w.-]
Answered by mario
The full stop loses its meta meaning in the character class.
The - has special meaning in the character class. If it isn't placed at the start or at the end of the square brackets, it must be escaped. Otherwise it denotes a character range (A-Z).
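The difference between a range and a literal hyphen can be sketched as follows. This is an illustrative addition; the variable names are made up for the example.

```php
<?php
// Between two characters, "-" builds a range: every letter from a to z.
$range = '/^[a-z]$/';
// Escaped, or placed last, "-" is a literal: the class holds a, z and the hyphen.
$escaped = '/^[a\-z]$/';
$last = '/^[az-]$/';

foreach (['m', '-'] as $char) {
    printf(
        "%s  range:%d  escaped:%d  last:%d\n",
        $char,
        preg_match($range, $char),
        preg_match($escaped, $char),
        preg_match($last, $char)
    );
}
```

The letter m is matched only by the range, while the hyphen is matched only by the two literal forms.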
You triggered another special case, however. [\w-.] works because \w does not denote a single character. As such, PCRE cannot possibly create a character range. \w is a possibly non-coherent class of symbols, so there is no end character which could be used to create the range "Z till .". Also, the full stop . would precede any ASCII character that \w could match. There is no range constructible. Hence why - worked without escaping for you.
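Since acceptance of the unescaped form depends on the regex engine (and possibly on the PCRE version bundled with a given PHP build), one way to find out is to ask the engine itself. The sketch below is an addition under that assumption; pattern_compiles() is a hypothetical helper, and it relies on preg_match returning false when a pattern fails to compile.

```php
<?php
// Hypothetical helper: reports whether the PCRE library bundled with the
// running PHP accepts $pattern. preg_match() returns false on a compile
// error; "@" silences the warning it emits in that case.
function pattern_compiles(string $pattern): bool
{
    return @preg_match($pattern, '') !== false;
}

// The unescaped form from the question, then the two portable spellings.
$candidates = ['/[\w-.]/', '/[\w\-.]/', '/[\w.-]/'];
foreach ($candidates as $pattern) {
    printf("%-12s %s\n", $pattern, pattern_compiles($pattern) ? 'compiles' : 'rejected');
}
```

The escaped form and the hyphen-last form should compile everywhere; the result for the first pattern is deliberately not stated here, because it is the case that varies between engines.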
Answered by RedClover
If you are using PHP and you need to escape special regex characters, just use preg_quote:
An example from php.net:
<?php
// In this example, preg_quote($word) is used to keep the
// asterisks from having special meaning to the regular
// expression.
$textbody = "This book is *very* difficult to find.";
$word = "*very*";
$textbody = preg_replace ("/" . preg_quote($word, '/') . "/",
"<i>" . $word . "</i>",
$textbody);
?>
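The same function can also be applied to the character class from the question. This is a minimal sketch, not taken from php.net; $extra is a hypothetical input holding the characters to allow in addition to \w.

```php
<?php
// Hypothetical input: characters to allow in addition to \w.
$extra = '-.';

// preg_quote() backslash-escapes regex metacharacters, including "-" and ".".
$quoted = preg_quote($extra, '/');
var_dump($quoted);  // string(4) "\-\."

// Build the class from the quoted characters.
$pattern = '/^[\w' . $quoted . ']+$/';
var_dump(preg_match($pattern, 'foo-bar.baz'));  // int(1)
var_dump(preg_match($pattern, 'foo bar'));      // int(0)
```

Escaping the dot is redundant inside a class, but it is harmless, and the hyphen comes out escaped, so its position in the class no longer matters.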