Java: writing a regex to detect repeated characters

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/17793962/

Time: 2020-11-01 15:00:38  Source: igfitidea

Writing a regex to detect repeat-characters

java regex

Asked by sharon Hwk

I need to write a regex that identifies a word that has a repeating character set at the end. In the following code fragment, the repeating character set is An. I need to write a regex so that this is spotted and displayed.

According to the following code, \\w will match any word character (letters, digits, and the underscore). But I only want to identify English characters.

String stringToMatch = "IranAnAn";
Pattern p = Pattern.compile("(\\w)\\1+");
Matcher m = p.matcher(stringToMatch);
if (m.find())
{
    System.out.println("Word contains duplicate characters " + m.group(1));
}

UPDATE

Word contains duplicate characters a
Word contains duplicate characters a
Word contains duplicate characters An

Accepted answer by assylias

You want to catch as many characters in your set as possible, so instead of (\\w) you should use (\\w+). You also want the sequence to be at the end, so you need to add $. I have also removed the + after \\1, which is not needed to detect repetition: a single repetition is enough.

Pattern p = Pattern.compile("(\\w+)\\1$");

Your program then outputs An as expected.
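
For reference, the pattern above can be dropped into a small self-contained program. This is only a sketch: the class name RepeatedSuffixDemo and the helper repeatedSuffix are hypothetical names introduced for illustration, not part of the original question or answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedSuffixDemo {
    // (\w+) captures a run of word characters, \1 requires the same run
    // again immediately after it, and $ anchors the repetition at the end.
    private static final Pattern REPEATED_SUFFIX = Pattern.compile("(\\w+)\\1$");

    // Hypothetical helper: returns the repeated group at the end of the
    // word, or null when the word does not end with a repetition.
    public static String repeatedSuffix(String word) {
        Matcher m = REPEATED_SUFFIX.matcher(word);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(repeatedSuffix("IranAnAn")); // An
        System.out.println(repeatedSuffix("Iran"));     // null
    }
}
```

Note that find() is used rather than matches(), because the pattern is only meant to cover the tail of the word, not the whole string.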

Finally, if you only want to capture ASCII letters, you can use [a-zA-Z] instead of \\w:

Pattern p = Pattern.compile("([a-zA-Z]+)\\1$");

And if you want the character set to be at least 2 characters:

Pattern p = Pattern.compile("([a-zA-Z]{2,})\\1$");
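
To see how the three variants differ, the sketch below runs each one against a few sample words. The class name VariantComparison, the helper firstGroup, and the inputs "Hello11" and "Tree" are assumptions chosen for illustration; only "IranAnAn" comes from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VariantComparison {
    // Any word characters (letters, digits, underscore).
    static final Pattern WORD_CHARS = Pattern.compile("(\\w+)\\1$");
    // ASCII letters only.
    static final Pattern LETTERS = Pattern.compile("([a-zA-Z]+)\\1$");
    // ASCII letters only, repeated group of at least two characters.
    static final Pattern LETTERS_MIN_2 = Pattern.compile("([a-zA-Z]{2,})\\1$");

    // Hypothetical helper: the captured group, or null when nothing matches.
    public static String firstGroup(Pattern p, String word) {
        Matcher m = p.matcher(word);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"IranAnAn", "Hello11", "Tree"}) {
            System.out.println(word
                    + " -> \\w+: " + firstGroup(WORD_CHARS, word)
                    + ", letters: " + firstGroup(LETTERS, word)
                    + ", letters{2,}: " + firstGroup(LETTERS_MIN_2, word));
        }
    }
}
```

With these inputs, "Hello11" is matched only by the \\w variant (the repeated digit 1), and "Tree" is rejected only by the {2,} variant, since its repeated group is the single letter e.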

Answer by Michael Lang

If by "only English characters" you mean A-Z and a-z, the following regex will work:

".*([A-Za-z]{2,})\\1$"
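
Because this pattern starts with .*, it covers the whole input and can be used with Matcher.matches() instead of find(). The following is a minimal sketch under that assumption; the class name FullMatchDemo and the helper endsWithRepeat are hypothetical.

```java
import java.util.regex.Pattern;

public class FullMatchDemo {
    // The leading .* consumes everything before the repeated group,
    // so the pattern can match the entire string.
    private static final Pattern ENDS_WITH_REPEAT =
            Pattern.compile(".*([A-Za-z]{2,})\\1$");

    // Hypothetical helper: true when the word ends with a group of at
    // least two ASCII letters that is immediately repeated.
    public static boolean endsWithRepeat(String word) {
        return ENDS_WITH_REPEAT.matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(endsWithRepeat("IranAnAn")); // true
        System.out.println(endsWithRepeat("Iran"));     // false
    }
}
```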