Backreference Syntax in Java Replacement Strings (Why a Dollar Sign?)

Note: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2890700/

Time: 2020-08-13 14:06:33  Source: igfitidea

Backreferences Syntax in Replacement Strings (Why Dollar Sign?)

java regex syntax replace backreference

Asked by polygenelubricants

In Java, and it seems in a few other languages, backreferences in the pattern are preceded by a backslash (e.g. \1, \2, \3, etc.), but in a replacement string they are preceded by a dollar sign (e.g. $1, $2, $3, and also $0).

Here's a snippet to illustrate:

System.out.println(
    "left-right".replaceAll("(.*)-(.*)", "\\2-\\1") // WRONG!!!
); // prints "2-1"

System.out.println(
    "left-right".replaceAll("(.*)-(.*)", "$2-$1")   // CORRECT!
); // prints "right-left"

System.out.println(
    "You want million dollar?!?".replaceAll("(\\w*) dollar", "US\\$ $1")
); // prints "You want US$ million?!?"

System.out.println(
    "You want million dollar?!?".replaceAll("(\\w*) dollar", "US$ \\1")
); // throws IllegalArgumentException: Illegal group reference
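
To make the contrast concrete, here is a small sketch that uses both kinds of reference side by side: a backslash backreference inside the pattern and dollar references inside the replacement. The class and method names are made up for illustration only.

```java
class BackrefDemo {
    // Pattern side: \1 (written "\\1" in a Java literal) must match the same
    // text that group 1 captured, so this finds an immediately repeated word.
    // Replacement side: $1 inserts the text captured by group 1.
    static String collapseDoubledWords(String input) {
        return input.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    // Replacement side only: $2 and $1 swap the two captured groups.
    static String swapAroundHyphen(String input) {
        return input.replaceAll("(\\w+)-(\\w+)", "$2-$1");
    }

    // $0 refers to the entire match rather than to a numbered group.
    static String bracketNumbers(String input) {
        return input.replaceAll("\\d+", "[$0]");
    }

    public static void main(String[] args) {
        System.out.println(collapseDoubledWords("it is is what it is")); // prints "it is what it is"
        System.out.println(swapAroundHyphen("left-right up-down"));      // prints "right-left down-up"
        System.out.println(bracketNumbers("a1b22"));                     // prints "a[1]b[22]"
    }
}
```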

Questions:

  • Is the use of $ for backreferences in replacement strings unique to Java? If not, what language started it? What flavors use it and what don't?
  • Why is this a good idea? Why not stick to the same pattern syntax? Wouldn't that lead to a more cohesive and easier-to-learn language?
    • Wouldn't the syntax be more streamlined if statements 1 and 4 in the above were the "correct" ones instead of 2 and 3?

Accepted answer by Stephen C

Is the use of $ for backreferences in replacement strings unique to Java?

No. Perl uses it, and Perl certainly predates Java's Pattern class. Java's regex support is explicitly described in terms of Perl regexes.

For example: http://perldoc.perl.org/perlrequick.html#Search-and-replace

Why is this a good idea?

Well, obviously you don't think it is a good idea! But one reason it is a good idea is that it makes Java's search/replace support (more) compatible with Perl's.

There is another possible reason why $ might have been viewed as a better choice than \: a \ has to be written as \\ in a Java String literal.
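
The escaping cost mentioned above can be sketched as follows. This is only an illustration with made-up class and method names; Matcher.quoteReplacement is the standard library method for treating a whole replacement string literally.

```java
import java.util.regex.Matcher;

class EscapingDemo {
    // A literal dollar sign in the replacement must be escaped as \$,
    // which becomes "\\$" inside a Java string literal.
    static String price(String input) {
        return input.replaceAll("(\\w*) dollar", "US\\$ $1");
    }

    // A literal backslash in the output needs \\ in the replacement,
    // which becomes "\\\\" inside a Java string literal.
    static String slashToBackslash(String path) {
        return path.replaceAll("/", "\\\\");
    }

    // quoteReplacement escapes every $ and \ so the text is inserted verbatim.
    static String replaceWithLiteral(String input, String regex, String literal) {
        return input.replaceAll(regex, Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        System.out.println(price("You want million dollar?!?"));             // prints "You want US$ million?!?"
        System.out.println(slashToBackslash("a/b/c"));                       // prints "a\b\c"
        System.out.println(replaceWithLiteral("cost: X", "X", "$1 \\ done")); // prints "cost: $1 \ done"
    }
}
```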

But all of this is pure speculation. None of us were in the room when the design decisions were made. And ultimately it doesn't really matter why they designed the replacement String syntax that way. The decisions have been made and set in concrete, and any further discussion is purely academic ... unless you just happen to be designing a new language or a new regex library for Java.

Answer by polygenelubricants

After doing some research, I now understand the issue: Perl had to use different symbols for pattern backreferences and replacement backreferences, and while java.util.regex.* doesn't have to follow suit, it chooses to, for a traditional rather than a technical reason.



On the Perl side

(Please keep in mind that all I know about Perl at this point comes from reading Wikipedia articles, so feel free to correct any mistakes I may have made)

The reason why it had to be done this way in Perl is the following:

  • Perl uses $ as a sigil (i.e. a symbol attached to a variable name).
  • Perl string literals are variable-interpolated.
  • Perl's regex engine actually captures groups as variables $1, $2, etc.

Thus, because of the way Perl is interpreted and how its regex engine works, a preceding backslash must be used for backreferences in the pattern (e.g. \1), because if the sigil $ were used instead (e.g. $1), it would cause unintended variable interpolation into the pattern.

The replacement string, due to how it works in Perl, is evaluated within the context of every match. It is most natural for Perl to use variable interpolation here, so the regex engine captures groups into variables $1, $2, etc, to make this work seamlessly with the rest of the language.



On the Java side

Java is a very different language from Perl; most importantly here, it has no variable interpolation. Moreover, replaceAll is a method call, and as with all method calls in Java, arguments are evaluated once, before the method is invoked.

Thus, a variable interpolation feature by itself would not be enough, since in essence the replacement string must be re-evaluated on every match, and that's just not the semantics of method calls in Java. A variable-interpolated replacement string that is evaluated before replaceAll is even invoked is practically useless; the interpolation needs to happen during the method, on every match.
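
A rough way to observe this is to count how often the replacement argument is computed. In the sketch below, the counter and the helper buildReplacement are hypothetical and exist only to make the evaluation visible: the argument is built once, yet each of the three matches is expanded with its own groups.

```java
class EagerArgumentDemo {
    // Hypothetical counter: how many times the replacement argument was computed.
    static int evaluations = 0;

    // Hypothetical helper that records each evaluation of the argument.
    static String buildReplacement() {
        evaluations++;
        return "$2-$1";
    }

    static String swapAll(String input) {
        // buildReplacement() runs exactly once, before replaceAll is entered;
        // the $2 and $1 references are then expanded inside replaceAll per match.
        return input.replaceAll("(\\w+)-(\\w+)", buildReplacement());
    }

    public static void main(String[] args) {
        System.out.println(swapAll("a-b c-d e-f")); // prints "b-a d-c f-e"
        System.out.println(evaluations);            // prints "1"
    }
}
```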

Since that is not the semantics of the Java language, replaceAll must do this "just-in-time" interpolation manually. As such, there is absolutely no technical reason why $ has to be the escape symbol for backreferences in replacement strings. It could very well have been \. Conversely, backreferences in the pattern could also have been escaped with $ instead of \, and it would still have worked just as well technically.
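
To illustrate that the choice of symbol is not forced by anything technical, here is a minimal sketch of a hand-written replace-all that expands \0 through \9 in the replacement instead of $0 through $9. This is not how java.util.regex is implemented; the class, the method names, and the single-digit limitation are all assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashReplacementDemo {
    // Hypothetical expander: substitutes \0..\9 in the template with the
    // corresponding group of the current match. Other characters are copied.
    static String expand(String template, Matcher m) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            boolean hasDigit = i + 1 < template.length()
                    && template.charAt(i + 1) >= '0' && template.charAt(i + 1) <= '9';
            if (c == '\\' && hasDigit) {
                // m.group throws IndexOutOfBoundsException for a nonexistent group.
                String value = m.group(template.charAt(i + 1) - '0');
                if (value != null) {
                    out.append(value);
                }
                i++; // skip the digit that was just consumed
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // The "just-in-time" interpolation: the template is expanded anew for every match.
    static String replaceAllBackslashStyle(String input, String regex, String template) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder result = new StringBuilder();
        int last = 0;
        while (m.find()) {
            result.append(input, last, m.start()); // text between matches
            result.append(expand(template, m));    // per-match expansion
            last = m.end();
        }
        result.append(input, last, input.length()); // trailing text
        return result.toString();
    }

    public static void main(String[] args) {
        // Statement 1 from the question now behaves the way the asker hoped.
        System.out.println(replaceAllBackslashStyle("left-right", "(.*)-(.*)", "\\2-\\1")); // prints "right-left"
    }
}
```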

The reason Java does regex the way it does is purely traditional: it's simply following the precedent set by Perl.
