Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36267354/

Date: 2020-08-11 17:38:58  Source: igfitidea

Java String.replaceAll() with back reference

java regex

Asked by Jeffrey

There is a Java regex question: given a string, keep any "*" that is at the start or the end of the string and remove all the others. For example:

  1. * --> *
  2. ** --> **
  3. ******* --> **
  4. *abc**def* --> *abcdef*

The answer is:

str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");

I tried the answer on my machine and it works. But I don't know how it works.

From my understanding, all matched substrings should be replaced with $1$2. However, it works as:

  1. (^\\*) replaced with $1,
  2. (\\*$) replaced with $2,
  3. \\* replaced with empty.

Could someone explain how it works? More specifically, if there is a | between expressions, how does String.replaceAll() work with back references?

Thank you in advance.

Accepted answer by Saleem

I'll try to explain what's happening in regex.

str.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");

$1 represents the first group, which is (^\\*); $2 represents the second group, (\\*$).

When you call str.replaceAll, the regex matches every *, capturing the one at the start and the one at the end into the two groups. Each match is then replaced with whatever the two groups captured for that match.

Example: *abc**def* --> *abcdef*

The regex first finds the * at the start of the string and puts it in group $1. It then keeps looking until it finds the * at the end of the string and stores it in $2. When replacing, it eliminates every * except the one stored in $1 or $2.

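As a quick check of the behavior described above, here is a minimal sketch that runs the four examples from the question through the same call. The class and method names (KeepEdgeStars, keepEdgeStars) are made up for illustration and are not part of the original answer.

```java
class KeepEdgeStars {
    // Group 1 captures a leading *, group 2 captures a trailing *,
    // and the bare \* alternative matches any other * without capturing it.
    static String keepEdgeStars(String input) {
        return input.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    public static void main(String[] args) {
        String[] inputs = {"*", "**", "*******", "*abc**def*"};
        for (String s : inputs) {
            // Prints each input next to its result, e.g. *abc**def* --> *abcdef*
            System.out.println(s + " --> " + keepEdgeStars(s));
        }
    }
}
```

Each inner * is matched on its own by the third alternative, so both groups are unset for that match and the replacement contributes nothing.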

For more information see Capture Groups

Answer by anubhava

You can use lookarounds in your regex:

String repl = str.replaceAll("(?<!^)\\*+(?!$)", "");

RegEx Demo

RegEx Breakup:

(?<!^)   # If previous position is not line start
\*+     # match 1 or more *
(?!$)    # If next position is not line end


OP's regex is:

(^\*)|(\*$)|\*

It uses 2 capture groups, one for the * at the start and another for the * at the end, and uses back-references in the replacement. This works here but will be much slower on larger strings, as is evident from the number of steps taken in this demo: 209 steps versus 48 steps using look-arounds.

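To see that the two approaches agree on the question's examples, here is a small sketch that applies both regexes to the same inputs. The class and method names are hypothetical, and the sketch only compares results; it does not measure the step counts mentioned above.

```java
class LookaroundCompare {
    // OP's approach: capture the edge asterisks and put them back via $1$2.
    static String withGroups(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    // Look-around approach: only match runs of * that neither start at the
    // beginning of the string nor end at the end of it, and delete them.
    static String withLookarounds(String s) {
        return s.replaceAll("(?<!^)\\*+(?!$)", "");
    }

    public static void main(String[] args) {
        String[] inputs = {"*", "**", "*******", "*abc**def*"};
        for (String s : inputs) {
            System.out.println(s + " --> " + withGroups(s) + " | " + withLookarounds(s));
        }
    }
}
```

For "*******" the look-around version relies on backtracking: the greedy \*+ first runs to the end of the string, fails the (?!$) check, and gives back the last asterisk.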

Another small improvement to OP's regex is to use a quantifier:

(^\*)|(\*$)|\*+
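
One caveat seems worth checking before adopting the quantified form. In the sketch below, when the string ends in a run of asterisks, the greedy \*+ alternative also consumes the final *, so the second group never gets a chance to keep it. The guarded variant with (?!$) is my own assumption about one possible fix and is not part of the original answer; all names are hypothetical.

```java
class QuantifierCaveat {
    // OP's original regex: one * per match.
    static String perStar(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    // The quantified form: the third alternative can swallow a trailing run.
    static String quantified(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*+", "$1$2");
    }

    // Assumed fix: forbid the run from ending at the end of the string.
    static String quantifiedGuarded(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*+(?!$)", "$1$2");
    }

    public static void main(String[] args) {
        String s = "*******";
        System.out.println(perStar(s));           // **
        System.out.println(quantified(s));        // *
        System.out.println(quantifiedGuarded(s)); // **
    }
}
```

On inputs such as *abc**def*, where the last * is preceded by a non-asterisk character, all three variants give the same result.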

Answer by Sebastian Proske

Well, let's first take a look at your regex (^\\*)|(\\*$)|\\* - it matches every *. If it is at the start, it is captured into group 1; if it is at the end, it is captured into group 2. Every other * is matched, but not put into any group.

The replacement pattern $1$2 replaces every single match with the content of group 1 and group 2. So for a * at the beginning or the end of the string, the content of one of the groups is that * itself, and it is therefore replaced by itself. For all the other matches, the groups contain only empty strings, so the matched * is replaced with this empty string.

Your problem was probably that $1$2 is not a literal replacement, but a backreference to the captured groups.

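The per-match group contents described above can be inspected directly with a Matcher. This is a minimal sketch with hypothetical names (GroupTrace, trace). Note that in Java, Matcher.group(n) returns null for a group that did not take part in the match, and the replacement step contributes nothing for such a group, which is why it behaves like an empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupTrace {
    // Lists every match as start:[group1,group2]; unset groups show up as null.
    static String trace(String input) {
        Matcher m = Pattern.compile("(^\\*)|(\\*$)|\\*").matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.start()).append(":[")
              .append(m.group(1)).append(",")
              .append(m.group(2)).append("] ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Prints: 0:[*,null] 2:[null,null] 4:[null,*]
        System.out.println(trace("*a*b*"));
    }
}
```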

Answer by Laurel

According to the Javadoc:

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.

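The difference between a backreference and a literal replacement can be sketched as follows. The class and method names are invented for the example, and the regex (\\*) is a simplified stand-in for the one in the question.

```java
import java.util.regex.Matcher;

class LiteralReplacement {
    // "$1" in the replacement is a backreference to capture group 1.
    static String asBackreference(String s) {
        return s.replaceAll("(\\*)", "<$1>");
    }

    // quoteReplacement escapes $ and \ so the replacement is inserted verbatim.
    static String asLiteral(String s) {
        return s.replaceAll("(\\*)", Matcher.quoteReplacement("<$1>"));
    }

    public static void main(String[] args) {
        System.out.println(asBackreference("a*b")); // a<*>b
        System.out.println(asLiteral("a*b"));       // a<$1>b
    }
}
```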

Your regex: "(^\\*)|(\\*$)|\\*"

After removing quotes and String escapes: (^\*)|(\*$)|\*

There are three parts, separated by pipes |. The pipes mean OR, so each match comes from exactly one of the three parts, and replaceAll() replaces that match with its second argument: $1$2. Essentially, the first part >> $1, the second part >> $2, the third part >> "". Note that "the first part" == $1, and so on, so in effect those asterisks are not replaced at all.

1. (^\*) is a capture group (the first). ^ anchors to the string start. \* matches *, but needs the escape \.

2. (\*$) is again a capture group (the second one). The difference here is that it anchors to the end with $.

3. \* like before, matches a literal *.

The thing you need to understand about regexes is it will always take the first path if it matches. While *s at the beginning and end of the string could be matched by the 3rd part, they match the first or second parts instead.

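The effect of alternative order can be illustrated by moving the bare \* to the front. This is a sketch with made-up names. The reordered regex is not from the original thread; it is only there to show that the first alternative that matches wins, so the capture groups never take part and every * disappears.

```java
class AlternationOrder {
    // Original order: the anchored groups are tried before the bare \*.
    static String groupsFirst(String s) {
        return s.replaceAll("(^\\*)|(\\*$)|\\*", "$1$2");
    }

    // Reordered (hypothetical): the bare \* matches first at every position,
    // so groups 1 and 2 stay unset and nothing is put back.
    static String bareStarFirst(String s) {
        return s.replaceAll("\\*|(^\\*)|(\\*$)", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(groupsFirst("*abc**def*"));   // *abcdef*
        System.out.println(bareStarFirst("*abc**def*")); // abcdef
    }
}
```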

Answer by Grayman

Others have given very good answers so I won't repeat them. A suggestion when you are working to understand issues such as this is to temporarily add delimiters to the replacement string so that it is clear what is happening at each stage.

For example, use "<$1|$2>" as the replacement. This gives results of the form <x|y>, where x is $1 and y is $2.

String str = "*ab**c*d*";
String result = str.replaceAll("(^\\*)|(\\*$)|\\*", "<$1|$2>");

The result is: <*|>ab<|><|>c<|>d<|*>

So for the first asterisk, $1 = * and $2 is empty because (^\\*) matches.

For mid-string asterisks, both $1 and $2 are empty because neither capturing group matches.

For the final asterisk, $1 is empty and $2 is * because (^\\*) does not match but (\\*$) does.