What is the difference between `Greedy` and `Reluctant` regular expression quantifiers in Java?
Source: http://stackoverflow.com/questions/1139171/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors on Stack Overflow.
Asked by jjnguy
From the Pattern javadocs:
Greedy quantifiers:
X? X, once or not at all
X* X, zero or more times
X+ X, one or more times
X{n} X, exactly n times
X{n,} X, at least n times
X{n,m} X, at least n but not more than m times
Reluctant quantifiers:
X?? X, once or not at all
X*? X, zero or more times
X+? X, one or more times
X{n}? X, exactly n times
X{n,}? X, at least n times
X{n,m}? X, at least n but not more than m times
The description of what they do is the same...so, what is the difference?
I would really appreciate some examples.
I am coding in Java, but I hear this concept is the same for most modern regex implementations.
Answered by PatrikAkerstrand
A greedy quantifier always tries to "grab" as much of the input as possible, while a reluctant quantifier matches as little of the input as possible while still producing a match.
Example:
"The red fox jumped over the red fence"
/(.*)red/ => group 1 = "The red fox jumped over the "
/(.*?)red/ => group 1 = "The "
"aaa"
/a?a*/ => a? = "a", a* = "aa"
/a??a*/ => a?? = "", a* = "aaa"
"Mr. Doe, John"
/^(?:Mrs?.)?.*\b(.*)$/ => group 1 = ""
/^(?:Mrs?.)?.*?\b(.*)$/ => group 1 = "Doe, John"
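The first two examples above can be checked directly in Java. The following is a minimal sketch, not part of the original answer: the class name `GreedyVsReluctant` and the helper `firstGroup` are made up for illustration, and parentheses have been added around `a?`, `a??` and `a*` so that each quantifier's share of the input can be read back as a capture group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Hypothetical helper: text captured by `group` on the first match, or null if nothing matches.
    static String firstGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String s = "The red fox jumped over the red fence";
        // Greedy: .* runs to the end, then backs off to the LAST "red".
        System.out.println("[" + firstGroup("(.*)red", s, 1) + "]");  // [The red fox jumped over the ]
        // Reluctant: .*? grows one character at a time and stops at the FIRST "red".
        System.out.println("[" + firstGroup("(.*?)red", s, 1) + "]"); // [The ]

        // Greedy a? takes one "a"; a* takes the remaining two.
        System.out.println("[" + firstGroup("(a?)(a*)", "aaa", 1) + "]");  // [a]
        System.out.println("[" + firstGroup("(a?)(a*)", "aaa", 2) + "]");  // [aa]
        // Reluctant a?? takes nothing, so a* gets all three.
        System.out.println("[" + firstGroup("(a??)(a*)", "aaa", 1) + "]"); // []
        System.out.println("[" + firstGroup("(a??)(a*)", "aaa", 2) + "]"); // [aaa]
    }
}
```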
Answered by akf
From the linked tutorial, where the author acknowledges the spirit of your question:
At first glance it may appear that the quantifiers X?, X?? and X?+ do exactly the same thing, since they all promise to match "X, once or not at all". There are subtle implementation differences which will be explained near the end of this section.
They go on to put together examples and offer the explanation:
Greedy quantifiers are considered "greedy" because they force the matcher to read in, or eat, the entire input string prior to attempting the first match. If the first match attempt (the entire input string) fails, the matcher backs off the input string by one character and tries again, repeating the process until a match is found or there are no more characters left to back off from. Depending on the quantifier used in the expression, the last thing it will try matching against is 1 or 0 characters.
The reluctant quantifiers, however, take the opposite approach: They start at the beginning of the input string, then reluctantly eat one character at a time looking for a match. The last thing they try is the entire input string.
And for extra credit, the possessive explanation:
Finally, the possessive quantifiers always eat the entire input string, trying once (and only once) for a match. Unlike the greedy quantifiers, possessive quantifiers never back off, even if doing so would allow the overall match to succeed.
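The three behaviours described in the quoted tutorial text can be seen side by side with one input. This is a small sketch under stated assumptions: the class name `QuantifierModes`, the helper `findAll`, and the sample input `xfooxxxxxxfoo` are chosen for illustration and do not come from the answer itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifierModes {
    // Hypothetical helper: collects every successive match of `regex` in `input`.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* eats everything, then backs off until "foo" fits at the end.
        System.out.println(findAll(".*foo", input));  // [xfooxxxxxxfoo]
        // Reluctant: .*? eats as little as it can, so "foo" is found twice.
        System.out.println(findAll(".*?foo", input)); // [xfoo, xxxxxxfoo]
        // Possessive: .*+ eats everything and never backs off, so "foo" can never match.
        System.out.println(findAll(".*+foo", input)); // []
    }
}
```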
Answered by David Waters
A greedy quantifier will match as much as possible while still getting a match. A reluctant quantifier will match the smallest amount possible.
For example, given the string
abcdef
the greedy quantifier
ab[a-z]*[a-z] would match abcdef
and the reluctant quantifier
ab[a-z]*?[a-z] would match abc
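A short Java sketch of the abcdef example follows; the class name `AbcdefDemo` and the helper `firstMatch` are illustrative names only. It also shows a point worth keeping in mind: the difference is visible with `find()`, which accepts a partial match, but not with `matches()`, which requires the whole input to match and therefore forces the reluctant quantifier to keep expanding.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AbcdefDemo {
    // Hypothetical helper: the first substring matched by `regex`, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "abcdef";
        System.out.println(firstMatch("ab[a-z]*[a-z]", input));  // abcdef
        System.out.println(firstMatch("ab[a-z]*?[a-z]", input)); // abc

        // matches() must consume the entire input, so both forms succeed on the full string.
        System.out.println(Pattern.matches("ab[a-z]*[a-z]", input));  // true
        System.out.println(Pattern.matches("ab[a-z]*?[a-z]", input)); // true
    }
}
```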
Answered by Jorn
Say you have the regex "a\w*b" and use it on "abab". Greedy matching will match "abab" (it looks for an a, as many occurrences of \w as possible, and a b), while reluctant matching with "a\w*?b" will match just "ab" (as few \w as possible).
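One consequence of this is that the number of matches can change, not just their length. The sketch below counts successive matches on "abab"; the class name `AbabDemo` and the helper `countMatches` are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AbabDemo {
    // Hypothetical helper: how many non-overlapping matches find() reports.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Greedy \w* swallows the middle "ba", giving a single match: "abab".
        System.out.println(countMatches("a\\w*b", "abab"));  // 1
        // Reluctant \w*? stops at the first b, giving two matches: "ab" and "ab".
        System.out.println(countMatches("a\\w*?b", "abab")); // 2
    }
}
```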
Answered by Brad Gilbert
The Perl documentation describes how Perl handles these quantifiers; see perldoc perlre.
By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a "?". Note that the meanings don't change, just the "greediness":

    *?      Match 0 or more times, not greedily
    +?      Match 1 or more times, not greedily
    ??      Match 0 or 1 time, not greedily
    {n}?    Match exactly n times, not greedily
    {n,}?   Match at least n times, not greedily
    {n,m}?  Match at least n but not more than m times, not greedily

By default, when a quantified subpattern does not allow the rest of the overall pattern to match, Perl will backtrack. However, this behaviour is sometimes undesirable. Thus Perl provides the "possessive" quantifier form as well.

    *+      Match 0 or more times and give nothing back
    ++      Match 1 or more times and give nothing back
    ?+      Match 0 or 1 time and give nothing back
    {n}+    Match exactly n times and give nothing back (redundant)
    {n,}+   Match at least n times and give nothing back
    {n,m}+  Match at least n but not more than m times and give nothing back

For instance,

    'aaaa' =~ /a++a/

will never match, as the a++ will gobble up all the a's in the string and won't leave any for the remaining part of the pattern. This feature can be extremely useful to give perl hints about where it shouldn't backtrack. For instance, the typical "match a double-quoted string" problem can be most efficiently performed when written as:

    /"(?:[^"\\]++|\\.)*+"/

as we know that if the final quote does not match, backtracking will not help. See the independent subexpression (?>...) for more details; possessive quantifiers are just syntactic sugar for that construct. For instance the above example could also be written as follows:

    /"(?>(?:(?>[^"\\]+)|\\.)*)"/
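Java's java.util.regex also supports possessive quantifiers and the independent (atomic) group (?>...), so the Perl patterns quoted above can be tried in Java as well. The following is a sketch under stated assumptions: the class name `PossessiveDemo`, the helpers `found` and `firstMatch`, and the sample input are invented for illustration, and the backslashes are doubled only because the patterns are written as Java string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    // The double-quoted-string pattern with possessive quantifiers: "(?:[^"\\]++|\\.)*+"
    static final String QUOTED = "\"(?:[^\"\\\\]++|\\\\.)*+\"";
    // The same pattern written with atomic groups: "(?>(?:(?>[^"\\]+)|\\.)*)"
    static final String QUOTED_ATOMIC = "\"(?>(?:(?>[^\"\\\\]+)|\\\\.)*)\"";

    // Hypothetical helper: true if `regex` matches anywhere in `input`.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Hypothetical helper: the first substring matched by `regex`, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Possessive a++ keeps every "a" it took, so the trailing "a" can never match.
        System.out.println(found("a++a", "aaaa")); // false
        // Greedy a+ gives one "a" back, so the same input matches.
        System.out.println(found("a+a", "aaaa"));  // true

        // Sample input: say "he said \"hi\"" now
        String input = "say \"he said \\\"hi\\\"\" now";
        System.out.println(firstMatch(QUOTED, input));        // "he said \"hi\""
        System.out.println(firstMatch(QUOTED_ATOMIC, input)); // "he said \"hi\""
    }
}
```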

