Original question: http://stackoverflow.com/questions/38705002/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
The meaning of \\1* operator in Java regexes
Asked by Friedman
I am learning about Java regexes, and I noticed the following operator:
\\1*
I'm having a hard time figuring out what it means (searching the web didn't help). For example, what is the difference between these two options:
Pattern p1 = Pattern.compile("(a)\\1*"); // option1
Pattern p2 = Pattern.compile("(a)"); // option2
Matcher m1 = p1.matcher("a");
Matcher m2 = p2.matcher("a");
m1.find(); // a match must be attempted before group() can be called
m2.find();
System.out.println(m1.group(0));
System.out.println(m2.group(0));
Result:
a
a
Thanks!
Accepted answer by Nicolas Filotto
\\1 is a back reference, which in this case refers to the first capturing group, (a).
So (a)\\1* is equivalent to (a)a* in this particular case.
Here is an example that shows the difference:
Pattern p1 = Pattern.compile("(a)\\1*");
Pattern p2 = Pattern.compile("(a)");
Matcher m1 = p1.matcher("aa");
Matcher m2 = p2.matcher("aa");
m1.find();
System.out.println(m1.group());
m2.find();
System.out.println(m2.group());
Output:
aa
a
As you can see, when the input contains several consecutive a characters, the first regular expression matches all of them, while the second one matches only the first.
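The qualifier "in this particular case" matters: a back reference repeats the text that the group actually captured, not the group's pattern. The following is a minimal sketch of that distinction; the class name BackReferenceDemo, the helper firstMatch, and the ([ab]) patterns are illustrative choices that do not appear in the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackReferenceDemo {
    // Hypothetical helper: returns the first match of regex in input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Group 1 captures "a", so \\1* only accepts further "a" characters.
        System.out.println(firstMatch("([ab])\\1*", "aab"));  // aa
        // Repeating the pattern itself accepts either character.
        System.out.println(firstMatch("([ab])[ab]*", "aab")); // aab
    }
}
```

With a group such as (a), which can only ever capture "a", the two forms behave the same, which is why (a)\\1* and (a)a* match the same text.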
Answer by assylias
\\1* looks for a again, 0 or more times. This may be easier to understand with an example using (a)\\1+, which looks for at least 2 consecutive a characters:
Pattern p1 = Pattern.compile("(a)\\1+");
Matcher m1 = p1.matcher("aaaaabbaaabbba");
while (m1.find()) System.out.println(m1.group());
The output will be:
aaaaa
aaa
But the last a won't match because it is not repeated.
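The same idea extends to runs of any repeated character if the group is widened from (a) to (.). The sketch below reuses the input string from this answer; the class name RepeatedRuns, the helper allMatches, and the (.)\\1+ pattern are assumptions added for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedRuns {
    // Hypothetical helper: collects every match of regex in input, in order.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "aaaaabbaaabbba";
        // Runs of at least two "a" characters.
        System.out.println(allMatches("(a)\\1+", input)); // [aaaaa, aaa]
        // Runs of at least two of the same character, whatever it is.
        System.out.println(allMatches("(.)\\1+", input)); // [aaaaa, bb, aaa, bbb]
    }
}
```

In both cases the trailing single a is left out, because + requires at least one repetition of the captured text.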
Answer by Imposter
In Perl, \1 through \9 are always interpreted as back references; a backslash-escaped number greater than 9 is treated as a back reference if at least that many subexpressions exist, otherwise it is interpreted, if possible, as an octal escape. In this class octal escapes must always begin with a zero. In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.
From the Pattern docs.
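The rules quoted above can be checked with a few small patterns. This is only a sketch: the class name EscapeDemo and the specific test strings are illustrative and not taken from the documentation. The last two lines also show why the backslash has to be doubled in Java source: with a single backslash, the Java compiler treats \1 as a string-literal octal escape (the character U+0001) before the regex engine ever sees it.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    public static void main(String[] args) {
        // Regex \1 with one group defined: a back reference to group 1.
        System.out.println(Pattern.matches("(a)\\1", "aa"));    // true
        // Regex \11 with only one group: the parser drops a digit,
        // leaving \1 followed by a literal '1'.
        System.out.println(Pattern.matches("(a)\\11", "aa1"));  // true
        // Regex octal escapes must begin with a zero: \0141 is 'a' (octal 141 = 97).
        System.out.println(Pattern.matches("\\0141", "a"));     // true
        // Double backslash in source: the regex receives \1* and matches "aa".
        System.out.println(Pattern.matches("(a)\\1*", "aa"));   // true
        // Single backslash in source: the regex receives 'a' followed by U+0001*,
        // which cannot match a second 'a'.
        System.out.println(Pattern.matches("(a)\1*", "aa"));    // false
    }
}
```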
So it looks like p2 matches only a single "a", while p1 matches any number of consecutive "a" characters as long as there is at least one. The star is the quantifier that the docs list as "X*: X, zero or more times". It is called a Kleene star.