Original question: http://stackoverflow.com/questions/5944747/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
PHP regex non-capture non-match group
Asked by Ben
I'm making a date matching regex, and it's all going pretty well, I've got this so far:
"/(?:[0-3])?[0-9]-(?:[0-1])?[0-9]-(?:20)[0-1][0-9]/"
It will (hopefully) match single or double digit days and months, and double or quadruple digit years in the 21st century. A few trials and errors have gotten me this far.
But, I've got two simple questions regarding these results:
- (?: ) what is a simple explanation for this? Apparently it's a non-matching group. But then...
- What is the trailing ? for? e.g. (?: )?
Answered by Jose_X
[Edited (again) to improve formatting and fix the intro.]
This is a comment and an answer.
The answer part... I do agree with alex's earlier answer.
(?: ), in contrast to ( ), is used to avoid capturing text, generally so as to have fewer back references thrown in with those you do want, or to improve speed performance.
The ? following the (?: ) (or following anything except * + ? or {}) means that the preceding item may or may not be found within a legitimate match. E.g., /z34?/ will match z3 as well as z34, but it won't match the 5 in z35 (only the z3 part), and it won't match a lone z.
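As a rough illustration of the two points above, the following PHP sketch compares a capturing and a non-capturing group with preg_match() and then tries the z34? example. The variable names are made up for illustration, and the ^ and $ anchors are an added assumption so that the whole subject has to match.

```php
<?php
// Capturing group: the text matched by ( ) is returned as element 1.
preg_match('/(20)11/', 'year 2011', $capturing);

// Non-capturing group: (?: ) only groups, so only the full match is returned.
preg_match('/(?:20)11/', 'year 2011', $nonCapturing);

// The trailing ? makes the preceding item (here the "4") optional.
// Anchors are added so that the entire subject must match.
$optional = [];
foreach (['z3', 'z34', 'z35', 'z'] as $subject) {
    $optional[$subject] = preg_match('/^z34?$/', $subject);
}

print_r($capturing);    // full match plus one captured group
print_r($nonCapturing); // full match only
print_r($optional);     // 1 for z3 and z34, 0 for z35 and z
```

Without the anchors, /z34?/ would still find the z3 inside z35, which is why the sketch anchors the pattern.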
The comment part... I made what might be considered improvements to the regex you were working on:
(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)
-- First, it avoids things like 0-0-2011
-- Second, it avoids things like 233443-4-201154564
-- Third, it includes things like 1-1-2022
-- Fourth, it includes things like 1-1-11
-- Fifth, it avoids things like 34-4-11
-- Sixth, it allows you to capture the day, month, and year so you can refer to these more easily in code, for example code that does a further check to see whether a Feb 29 date qualifies: if the second captured group (the month) is 2, then either the first captured group (the day) is 29 and the year is a leap year, or else the first captured group is less than 29.
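One possible way to do the further check described in the sixth point is to hand the three captured groups to PHP's built-in checkdate(), which knows month lengths and leap years. This is only a sketch: the helper name parseDate is hypothetical, and treating a two-digit year as 20xx is an assumption based on the "21st century" goal in the question.

```php
<?php
// Hypothetical helper: returns [day, month, year] for a real calendar date, or null.
function parseDate(string $text): ?array
{
    $pattern = '/(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)/';
    if (!preg_match($pattern, $text, $m)) {
        return null; // not even shaped like a date
    }

    $day   = (int) $m[1];
    $month = (int) $m[2];
    $year  = (int) $m[3];
    if ($year < 100) {
        $year += 2000; // assumption: two-digit years mean 20xx
    }

    // checkdate(month, day, year) rejects dates such as 29-2-2011 or 31-6-2011.
    return checkdate($month, $day, $year) ? [$day, $month, $year] : null;
}

var_dump(parseDate('29-2-2012')); // leap year, accepted
var_dump(parseDate('29-2-2011')); // not a leap year, rejected
```

Leaving the calendar logic to checkdate() keeps the regex responsible only for the overall shape of the date.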
Finally, note that you'll still get dates that won't exist, eg, 31-6-11. If you want to avoid these, then try:
(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))|(?:(0?[1-9]|[1-2][0-9])-(0?2)))-((?:20)?[0-9][0-9])(?:\s|$)
Also, I assumed the dates would be preceded and followed by a space (or the beginning/end of the line), but you may want to adjust that (e.g., to allow punctuation).
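One thing to watch for when using the stricter pattern from PHP: its day and month are captured by three alternative pairs of groups (1-2, 3-4, 5-6), so only one pair is non-empty for any given match, and the year ends up in group 7. The sketch below shows one way to pick out the non-empty pair; the helper name matchStrict is hypothetical, and the pattern is only split across lines for readability.

```php
<?php
$strict = '/(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))'
        . '|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))'
        . '|(?:(0?[1-9]|[1-2][0-9])-(0?2)))'
        . '-((?:20)?[0-9][0-9])(?:\s|$)/';

// Hypothetical helper: returns [day, month, year] as strings, or null.
function matchStrict(string $pattern, string $text): ?array
{
    if (!preg_match($pattern, $text, $m)) {
        return null;
    }
    // Groups 1-6 come from three alternative branches; keep the non-empty pair.
    $pairs = array_slice($m, 1, 6);
    $dayMonth = array_values(array_filter($pairs, fn ($s) => $s !== ''));

    return [$dayMonth[0], $dayMonth[1], $m[7]];
}

var_dump(matchStrict($strict, '30-6-11')); // accepted
var_dump(matchStrict($strict, '31-6-11')); // rejected: June has 30 days
```

Note that this pattern still accepts 29-2 for any year, so a leap-year check is needed separately.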
A commenter elsewhere referenced this resource which you might find useful: http://rubular.com/
Answered by alex
- It is a non-capturing group. You cannot back reference it. Usually used to declutter backreferences and/or increase performance.
- It means the preceding group is optional.
Answered by Ehsan Chavoshi
Subpatterns
Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpattern does two things:
- It localizes a set of alternatives. For example, the pattern cat(aract|erpillar|) matches one of the words "cat", "cataract", or "caterpillar". Without the parentheses, it would match "cataract", "erpillar" or the empty string.
- It sets up the subpattern as a capturing subpattern (as defined above). When the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller via the ovector argument of pcre_exec(). Opening parentheses are counted from left to right (starting from 1) to obtain the numbers of the capturing subpatterns.
For example, if the string "the red king" is matched against the pattern the ((red|white) (king|queen)) the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3.
The fact that plain parentheses fulfill two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern the ((?:red|white) (king|queen)) the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 65535. It may not be possible to compile such large patterns, however, depending on the configuration options of libpcre.
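The two examples above can be reproduced from PHP with preg_match(), which fills its third argument with the full match followed by the captured substrings. This is a minimal sketch; the variable names are chosen for illustration only.

```php
<?php
// Plain parentheses both group and capture: three numbered substrings.
preg_match('/the ((red|white) (king|queen))/', 'the red king', $allCaptured);

// (?: ) groups the colours without capturing, so numbering skips that group.
preg_match('/the ((?:red|white) (king|queen))/', 'the white queen', $partlyCaptured);

print_r($allCaptured);    // "the red king", "red king", "red", "king"
print_r($partlyCaptured); // "the white queen", "white queen", "queen"
```

Element 0 is always the text matched by the whole pattern; the numbered subpatterns start at element 1.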
As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns
(?i:saturday|sunday)
(?:(?i)saturday|sunday)
match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".
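A quick way to see that both spellings behave the same is to run them against a few subjects. In this sketch the ^ and $ anchors and the test subjects are assumptions added for illustration, so that each subject has to match as a whole.

```php
<?php
$patterns = [
    '/^(?i:saturday|sunday)$/',     // option letters between "?" and ":"
    '/^(?:(?i)saturday|sunday)$/',  // option set inside the first branch
];

$results = [];
foreach ($patterns as $pattern) {
    foreach (['SUNDAY', 'Saturday', 'monday'] as $subject) {
        // preg_match() returns 1 on a match and 0 otherwise.
        $results[$pattern][$subject] = preg_match($pattern, $subject);
    }
}

print_r($results);
```

Both patterns should accept SUNDAY and Saturday and reject monday, since the (?i) set in the first branch carries over to the second.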
It is possible to name a subpattern using the syntax (?P<name>pattern). This subpattern will then be indexed in the matches array by its normal numeric position and also by name. PHP 5.2.2 introduced two alternative syntaxes (?<name>pattern) and (?'name'pattern).
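Applied to the date example from the question, named subpatterns might look like the sketch below. The simplified \d-based pattern and the names day, month, and year are assumptions made for illustration; they are not the patterns from the answers above.

```php
<?php
// Two of the naming syntaxes are mixed here on purpose; both are accepted.
$pattern = '/(?P<day>\d{1,2})-(?<month>\d{1,2})-(?<year>\d{2,4})/';

preg_match($pattern, '29-2-2012', $m);

// Each named subpattern is available by name and by numeric position.
echo $m['day'], ' / ', $m['month'], ' / ', $m['year'], "\n";
echo $m[1], ' / ', $m[2], ' / ', $m[3], "\n";
```

Using names means the code does not have to be renumbered if another capturing group is later added in front.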
Sometimes it is necessary to have multiple matching, but alternating subgroups in a regular expression. Normally, each of these would be given their own backreference number even though only one of them would ever possibly match. To overcome this, the (?| syntax allows having duplicate numbers. Consider the following regex matched against the string Sunday:
(?:(Sat)ur|(Sun))day
Here Sun is stored in backreference 2, while backreference 1 is empty. Matching against Saturday instead yields Sat in backreference 1, while backreference 2 does not exist. Changing the pattern to use (?| fixes this problem:
(?|(Sat)ur|(Sun))day
Using this pattern, both Sun and Sat would be stored in backreference 1.
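The difference between the two patterns can be checked from PHP by matching both subjects against each and comparing the resulting arrays. This is a small sketch; the array names $plain and $reset are made up for illustration.

```php
<?php
$plain = [];
$reset = [];
foreach (['Saturday', 'Sunday'] as $subject) {
    // Ordinary non-capturing group: (Sat) is group 1, (Sun) is group 2.
    preg_match('/(?:(Sat)ur|(Sun))day/', $subject, $plain[$subject]);

    // Branch reset group: each alternative starts numbering again at 1.
    preg_match('/(?|(Sat)ur|(Sun))day/', $subject, $reset[$subject]);
}

print_r($plain); // Sunday: element 1 is empty, element 2 is "Sun"
print_r($reset); // Sunday: element 1 is "Sun"
```

With the branch reset form, calling code can always read element 1 regardless of which alternative matched.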
Reference: http://php.net/manual/en/regexp.reference.subpatterns.php