Meaning of the \\1* operator in Java regular expressions

Question

Asked by Friedman

I am learning about Java regexes, and I noticed the following operator:

\\1*

I'm having a hard time figuring out what it means (searching the web didn't help). For example, what is the difference between these two options:

    Pattern p1 = Pattern.compile("(a)\\1*"); // option1
    Pattern p2 = Pattern.compile("(a)"); // option2

    Matcher m1 = p1.matcher("a");
    Matcher m2 = p2.matcher("a");

    // find() must succeed before group() can be called
    m1.find();
    m2.find();
    System.out.println(m1.group(0));
    System.out.println(m2.group(0));

Result:

a
a

Thanks!

Answer 1

Accepted answer by Nicolas Filotto

\\1 is a back reference, which in this case corresponds to the first capturing group, (a).

So (a)\\1* is equivalent to (a)a* in this particular case.

Here is an example that shows the difference:

    Pattern p1 = Pattern.compile("(a)\\1*");
    Pattern p2 = Pattern.compile("(a)");

    Matcher m1 = p1.matcher("aa");
    Matcher m2 = p2.matcher("aa");

    m1.find();
    System.out.println(m1.group());
    m2.find();
    System.out.println(m2.group());

Output:

aa
a

As you can see, when the input contains several a characters, the first regular expression matches all of the consecutive a's, while the second one matches only the first.
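
The equivalence with (a)a* holds only because the group can match nothing but a. A back reference repeats the text the group actually captured, not the group's pattern. Here is a minimal sketch of that distinction; the class name, the helper firstMatch and the (a|b) patterns are made up for illustration and are not part of the original question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefVsRepeat {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \\1 repeats the text captured by group 1 ("a"), so the match stops at "b".
        System.out.println(firstMatch("(a|b)\\1*", "aab"));     // aa
        // Repeating the alternation itself may pick a different branch each time.
        System.out.println(firstMatch("(a|b)(?:a|b)*", "aab")); // aab
    }
}
```

With a group that can capture different texts, the two forms stop being interchangeable.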

Answer 2

Answer by assylias

\\1* looks for a again, 0 or more times. This example may be easier to understand: it uses (a)\\1+, which looks for at least two consecutive a's:

    Pattern p1 = Pattern.compile("(a)\\1+");
    Matcher m1 = p1.matcher("aaaaabbaaabbba");
    // print every run of two or more consecutive a's
    while (m1.find()) System.out.println(m1.group());

The output will be:

aaaaa
aaa

The last a doesn't match because it is not repeated.
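
The same idea extends to any character if the group is widened. The following is only a sketch: the pattern (.)\\1+, the class name and the helper allMatches are assumptions added for illustration. It collects every run of two or more identical characters from the same input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedRuns {
    // Collects every match of regex in input, in order of appearance.
    static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Group 1 captures one character; \\1+ requires that same character again.
        System.out.println(allMatches("(.)\\1+", "aaaaabbaaabbba")); // [aaaaa, bb, aaa, bbb]
        // Restricting the group to a reproduces the output shown above.
        System.out.println(allMatches("(a)\\1+", "aaaaabbaaabbba")); // [aaaaa, aaa]
    }
}
```

In both cases the trailing single a is skipped, because \\1+ needs at least one repetition.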

Answer 3

Answer by Imposter

In Perl, \1 through \9 are always interpreted as back references; a backslash-escaped number greater than 9 is treated as a back reference if at least that many subexpressions exist, otherwise it is interpreted, if possible, as an octal escape. In this class octal escapes must always begin with a zero. In this class, \1 through \9 are always interpreted as back references, and a larger number is accepted as a back reference if at least that many subexpressions exist at that point in the regular expression, otherwise the parser will drop digits until the number is smaller or equal to the existing number of groups or it is one digit.

From the Pattern docs.
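
To make the quoted rules concrete, here is a small sketch. The class name, the helper fullMatch and the test strings are made up for illustration. It also shows why the backslash has to be doubled in Java source: in a Java string literal, "\1" is itself an octal escape for the character U+0001, so the regex engine never sees a back reference.

```java
import java.util.regex.Pattern;

class BackrefDigits {
    // True if the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String ctrl = String.valueOf((char) 1); // the character U+0001

        // Only one group exists, so \11 is read as \1 followed by a literal 1.
        System.out.println(fullMatch("(a)\\11", "aa1")); // true

        // In a regex, an octal escape must start with a zero: \01 is U+0001.
        System.out.println(fullMatch("\\01", ctrl));     // true

        // With a single backslash, the Java compiler turns \1 into U+0001,
        // so this pattern is "(a)" followed by zero or more U+0001 characters.
        System.out.println(fullMatch("(a)\1*", "aa"));   // false
        System.out.println(fullMatch("(a)\\1*", "aa"));  // true
    }
}
```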

So it looks like p2 is only good for one "a", while p1 is good for any number of "a" characters as long as there is at least one. The star is the X* quantifier: X, zero or more times. It is called a Kleene star.
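
One way to check this claim is to require the whole input to match, using matches() instead of find(). This is a rough sketch; the class name, the helper fullMatch and the sample inputs are assumptions added for illustration:

```java
import java.util.regex.Pattern;

class WholeStringMatch {
    // True if the whole input matches regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String p1 = "(a)\\1*"; // one a, then zero or more repetitions of it
        String p2 = "(a)";     // exactly one a

        for (String input : new String[] {"a", "aaa", ""}) {
            System.out.println("\"" + input + "\": p1=" + fullMatch(p1, input)
                    + ", p2=" + fullMatch(p2, input));
        }
        // "a":   p1=true,  p2=true
        // "aaa": p1=true,  p2=false
        // "":    p1=false, p2=false
    }
}
```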
