How to find all repeating character sequences in a string using Java Regex?

Question

Question by David Urry

Parsing a random string looking for repeating sequences using Java and Regex.

Consider strings:

aaabbaaacccbb

I'd like to find a regular expression that will find all the matches in the above string:

aaabbaaacccbb
^^^  ^^^

aaabbaaacccbb
   ^^      ^^

What regex will check a string for any repeating sequences of characters and return the groups of those repeating characters, such that group 1 = aaa and group 2 = bb? Also note that I've used an example string, but any repeating characters are valid: RonRonJoeJoe ... ... ,, ,,...,,

Answer 1

Answer by Guillaume Polet

This does it:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String s = "aaabbaaacccbb";
        find(s);
        String s1 = "RonRonRonJoeJoe .... ,,,,";
        find(s1);
        System.err.println("---");
        String s2 = "RonBobRonJoe";
        find(s2);
    }

    private static void find(String s) {
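        // (.+) captures a sequence; \1+ requires it to repeat immediately one or more times
        // (the backslash must be doubled inside a Java string literal)
        Matcher m = Pattern.compile("(.+)\\1+").matcher(s);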
        while (m.find()) {
            System.err.println(m.group());
        }
    }
}

OUTPUT:

aaa
bb
aaa
ccc
bb
RonRonRon
JoeJoe
....
,,,,
---
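
Since the question asks for the repeating unit itself (group 1), here is a small sketch that reports group(1) together with the repeat count for each run found by the same (.+)\1+ pattern. The class name RepeatUnits and the method describe are hypothetical. Because (.+) is greedy, the reported unit is the longest one that repeats back to back, so "...." comes out as ".." twice rather than "." four times; a reluctant (.+?) would prefer the shortest unit instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatUnits {
    // (.+) captures a unit and \1+ demands that it repeat immediately.
    // Note the doubled backslash needed inside a Java string literal.
    private static final Pattern RUN = Pattern.compile("(.+)\\1+");

    /** Returns "unit xN" for each run of back-to-back repeats, left to right. */
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(s);
        while (m.find()) {
            String unit = m.group(1);                        // the repeating unit
            int count = m.group().length() / unit.length();  // whole run / unit length
            out.add(unit + " x" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("aaabbaaacccbb"));             // [a x3, b x2, a x3, c x3, b x2]
        System.out.println(describe("RonRonRonJoeJoe .... ,,,,")); // [Ron x3, Joe x2, .. x2, ,, x2]
        System.out.println(describe("RonBobRonJoe"));              // []
    }
}
```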

Answer 2

Answer by Trevor Freeman

The code below should work for all the requirements. It is actually a combination of a couple of the answers here, and it prints every substring that is repeated later in the string.

I set it to only return substrings of at least 2 characters, but it can easily be changed to allow single characters by replacing "{2,}" in the regex with "+".

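import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test
{
  public static void main(String[] args)
  {
    String s = "RonSamJoeJoeSamRon";
    // Capture 2+ non-whitespace chars, then require the same text to appear again later
    Matcher m = Pattern.compile("(\\S{2,})(?=.*?\\1)").matcher(s);
    while (m.find())
    {
      for (int i = 1; i <= m.groupCount(); i++)
      {
        System.out.println(m.group(i));
      }
    }
  }
}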

Output:
Ron
Sam
Joe
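
To see the effect of the "{2,}" vs "+" switch described above, the sketch below makes the minimum length a parameter (minLen and the class name RepeatedLater are hypothetical; \S{1,} is equivalent to \S+). With single characters allowed, stray letters also get reported, such as the "o" from "Joe" that reappears in the final "Ron".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedLater {
    // Hypothetical helper: minLen plugs into \S{minLen,}; minLen = 1 is the "+" variant.
    static List<String> find(String s, int minLen) {
        Pattern p = Pattern.compile("(\\S{" + minLen + ",})(?=.*?\\1)");
        Matcher m = p.matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1)); // group 1 is the substring that occurs again later
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "RonSamJoeJoeSamRon";
        System.out.println(find(s, 2)); // [Ron, Sam, Joe]
        System.out.println(find(s, 1)); // [Ron, Sam, Joe, o]
    }
}
```

Because the lookahead only searches forward, a substring is reported only if it occurs again after the current match, not before it.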

Answer 3

Answer by anubhava

You can use this regex based on a positive lookahead:

((\w)\2+)(?=.*\1)

Code:

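import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String elem = "aaabbaaacccbb";
        // ((\w)\2+): a run of one repeated word char; (?=.*\1): that run occurs again later
        String regex = "((\\w)\\2+)(?=.*\\1)";
        Matcher matcher = Pattern.compile(regex).matcher(elem);
        for (int i = 1; matcher.find(); i++)
            System.out.println("Group # " + i + " got: " + matcher.group(1));
    }
}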

OUTPUT:

Group # 1 got: aaa
Group # 2 got: bb
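
Note that ((\w)\2+) only matches runs of a single repeated word character, so this pattern finds aaa and bb but nothing in an input like RonRonJoeJoe (and \w never matches punctuation such as "," or "."). A small sketch, using the hypothetical class name SingleCharRuns, makes the difference visible:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleCharRuns {
    // ((\w)\2+) matches a run of ONE word character repeated (aaa, bb, ...);
    // (?=.*\1) keeps the run only if the same run occurs again later.
    private static final Pattern P = Pattern.compile("((\\w)\\2+)(?=.*\\1)");

    static List<String> find(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(s);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(find("aaabbaaacccbb")); // [aaa, bb]
        System.out.println(find("RonRonJoeJoe"));  // [] because no character is doubled
    }
}
```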

Answer 4

Answer by Reverend Gonzo

This seems to work, though it gives subsequences as well:

(To be fair, this was built off of Guillaume's code.)

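import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(final String[] args) {
        // final String s = "RonRonJoeJoe";
        // final String s = "RonBobRonJoe";
        final String s = "aaabbaaacccbb";

        // (.+) captures a sequence; .*\1 requires the same text to occur again somewhere later
        final Pattern p = Pattern.compile("(.+).*\\1");

        final Matcher m = p.matcher(s);
        int start = 0;
        // find(int) resets the matcher and searches from 'start',
        // so each pass resumes right after the previously reported sequence
        while (m.find(start)) {
            System.out.println(m.group(1));
            start = m.end(1);
        }
    }
}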
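
The "subsequences" mentioned above come from restarting the search at end(1): the next attempt may begin inside the stretch that .* skipped over, so shorter pieces of earlier matches are reported too. Collecting the results into a list (RepeatedAnywhere and find are hypothetical names) shows this on the question's input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedAnywhere {
    // Same loop as the answer, but collecting group 1 into a list.
    static List<String> find(String s) {
        Matcher m = Pattern.compile("(.+).*\\1").matcher(s);
        List<String> out = new ArrayList<>();
        int start = 0;
        // Resuming at end(1) lets later matches start inside the previous match's ".*" part
        while (m.find(start)) {
            out.add(m.group(1));
            start = m.end(1);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(find("aaabbaaacccbb")); // [aaa, bb, a, a, c, c, b]
        System.out.println(find("RonBobRonJoe"));  // [Ron, o, o]
    }
}
```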

Answer 5

Answer by Reverend Gonzo

You could disregard overlap, so that matches may overlap:

// overlapped 1 or more chars
(?=(\w{1,}).*\1)
// overlapped 2 or more chars
(?=(\w{2,}).*\1)
// overlapped 3 or more chars, etc.
(?=(\w{3,}).*\1)

Or you could consume the matches (non-overlapped):

// 1 or more chars
(?=(\w{1,}).*\1)\1
// 2 or more chars
(?=(\w{2,}).*\1)\1
// 3 or more chars, etc.
(?=(\w{3,}).*\1)\1
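
A sketch contrasting the two styles, assuming the intended patterns check repetition with a \1 backreference as the other answers do (the names OverlapDemo and group1 are hypothetical). The overlapped form is entirely zero-width, so Matcher.find() moves ahead one index after each match and can report overlapping pieces; the consumed form adds a trailing \1 that eats the captured text before the next search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OverlapDemo {
    // Collects group 1 from every find() of the given regex.
    static List<String> group1(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "RonRonJoeJoe";
        // Overlapped: zero-width match, so the search is retried at the next index
        System.out.println(group1("(?=(\\w{2,}).*\\1)", s));    // [Ron, on, Joe, oe]
        // Consumed: the trailing \1 consumes the captured text
        System.out.println(group1("(?=(\\w{2,}).*\\1)\\1", s)); // [Ron, Joe]
    }
}
```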