The meaning of the \\1* operator in Java regexes

Note: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/38705002/

Time: 2020-08-11 20:32:15  Source: igfitidea

java regex

Asked by Friedman

I am learning about Java regexes, and I noticed the following operator:

\\1*

I'm having a hard time figuring out what it means (searching the web didn't help). For example, what is the difference between these two options:

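    // uses java.util.regex.Pattern and java.util.regex.Matcher
    // the backslash is doubled because "\1" in a Java string literal is an octal character escape
    Pattern p1 = Pattern.compile("(a)\\1*"); // option 1
    Pattern p2 = Pattern.compile("(a)");     // option 2

    Matcher m1 = p1.matcher("a");
    Matcher m2 = p2.matcher("a");

    // find() must succeed before group() is called, otherwise group() throws IllegalStateException
    if (m1.find()) System.out.println(m1.group(0));
    if (m2.find()) System.out.println(m2.group(0));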

Result:

a
a

Thanks!

Accepted answer by Nicolas Filotto

\\1 is a back reference, which in this case corresponds to the first capturing group, (a).

So (a)\\1* is equivalent to (a)a* in this particular case.

Here is an example that shows the difference:

    // the backslash is doubled so that the regex engine receives \1
    Pattern p1 = Pattern.compile("(a)\\1*");
    Pattern p2 = Pattern.compile("(a)");

    Matcher m1 = p1.matcher("aa");
    Matcher m2 = p2.matcher("aa");

    m1.find();
    System.out.println(m1.group());
    m2.find();
    System.out.println(m2.group());

Output:

aa
a

As you can see, when the input contains several a's, the first regular expression matches all the successive a's, while the second one matches only the first one.
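
To make the distinction between the whole match and the capturing group concrete, here is a minimal sketch. The class name BackrefGroups and the helper firstGroup are hypothetical names chosen for illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BackrefGroups {
    // Returns the text of the requested group for the first match of regex
    // in input, or null when the regex does not match at all.
    public static String firstGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Group 0 is the whole match: the captured "a" plus its repetitions.
        System.out.println(firstGroup("(a)\\1*", "aaa", 0)); // aaa
        // Group 1 still holds only the single "a" captured by (a).
        System.out.println(firstGroup("(a)\\1*", "aaa", 1)); // a
        // (a)a* yields the same whole match for this input.
        System.out.println(firstGroup("(a)a*", "aaa", 0));   // aaa
    }
}
```

The back reference repeats whatever group 1 captured, so the two patterns only coincide because the group can capture nothing other than a single a.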

Answer by assylias

\\1* looks for a again, 0 or more times. This example may be easier to understand: it uses (a)\\1+, which looks for at least two a's:

    // the backslash is doubled so that the regex engine receives \1
    Pattern p1 = Pattern.compile("(a)\\1+");
    Matcher m1 = p1.matcher("aaaaabbaaabbba");
    while (m1.find()) System.out.println(m1.group());

The output will be:

aaaaa
aaa

But the last a won't match because it is not repeated.
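
To see where each run starts and ends, and that the trailing a at index 13 is skipped, the same loop can record match positions. This is only a sketch: the class name RepeatedRuns and the method runs are hypothetical, and the "start-end:text" format is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class RepeatedRuns {
    // Collects every match of (a)\1+ as "start-end:text"; end is exclusive.
    public static List<String> runs(String input) {
        Matcher m = Pattern.compile("(a)\\1+").matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The single "a" at index 13 has no repetition, so it is not reported.
        System.out.println(runs("aaaaabbaaabbba")); // [0-5:aaaaa, 7-10:aaa]
    }
}
```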

Answer by Imposter

In Perl, \1 through \9 are always interpreted as back references; a backslash-escaped number greater than 9 is treated as a back reference if at least that many subexpressions exist, otherwise it is interpreted, if possible, as an octal escape. In this class octal escapes must always begin with a zero. In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.

From the Pattern docs.
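
The quoted rules can be checked with a small sketch. The class name BackrefParsing and the helper fullyMatches are hypothetical; the patterns are illustrative inputs chosen here, not taken from the docs.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BackrefParsing {
    // True when the regex matches the entire input.
    public static boolean fullyMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Only one group exists, so \11 cannot refer to group 11: the parser
        // drops the last digit and reads \1 followed by a literal '1'.
        System.out.println(fullyMatches("(a)\\11", "aa1")); // true
        System.out.println(fullyMatches("(a)\\11", "a11")); // false
        // An octal escape must begin with a zero: \0101 is octal 101, i.e. 'A'.
        System.out.println(fullyMatches("\\0101", "A"));    // true
    }
}
```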

So it looks like p2 is only good for one "a", while p1 is good for any number of "a"s as long as there is at least one. The Pattern docs describe the star as X*: X, zero or more times. It is called a Kleene star.
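
A minimal sketch of that claim, using matches() so the whole input has to be consumed. The class name StarVersusSingle and the helper fullyMatches are hypothetical names introduced for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class StarVersusSingle {
    // True when the regex matches the entire input, not just a prefix.
    public static boolean fullyMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String p1 = "(a)\\1*"; // one "a", then zero or more repetitions of it
        String p2 = "(a)";     // exactly one "a"

        System.out.println(fullyMatches(p1, "a"));   // true
        System.out.println(fullyMatches(p1, "aaa")); // true
        System.out.println(fullyMatches(p1, ""));    // false: at least one "a" is required
        System.out.println(fullyMatches(p2, "a"));   // true
        System.out.println(fullyMatches(p2, "aaa")); // false: only a single "a" is allowed
    }
}
```

With find() instead of matches(), both patterns would succeed on "aaa"; the difference would then show up only in how much text each match covers.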
