Java: writing a regex to detect repeated characters
Original question: http://stackoverflow.com/questions/17793962/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Writing a regex to detect repeat-characters
Asked by sharon Hwk
I need to write a regex that identifies a word with a repeating character set at the end. In the following code fragment, the repeating character set is An. I need a regex that spots this set and displays it.
In the following code, \\w matches any word character (letters, digits, and the underscore). But I only want to identify English letters.
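import java.util.regex.Matcher;
import java.util.regex.Pattern;

String stringToMatch = "IranAnAn";
// (\w) captures one word character; \1+ requires that same character again, one or more times
Pattern p = Pattern.compile("(\\w)\\1+");
Matcher m = p.matcher(stringToMatch);
if (m.find())
{
    System.out.println("Word contains duplicate characters " + m.group(1));
}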
UPDATE
Word contains duplicate characters a
Word contains duplicate characters a
Word contains duplicate characters An
Accepted answer by assylias
You want to catch as many characters in your set as possible, so instead of (\\w) you should use (\\w+). You also want the sequence to be at the end, so you need to add $. I have also removed the + after \\1, which is not needed to detect repetition, since a single repetition is enough:
Pattern p = Pattern.compile("(\\w+)\\1$");
Your program then outputs An as expected.
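For reference, a self-contained version of the corrected program might look like the sketch below. The class name RepeatedSuffixDemo and the helper findRepeatedSuffix are made up for illustration and are not part of the original answer; only Pattern and Matcher come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedSuffixDemo {
    // (\w+) captures a run of word characters, \1 requires the same run again,
    // and $ anchors the second copy to the end of the input.
    private static final Pattern REPEATED_SUFFIX = Pattern.compile("(\\w+)\\1$");

    // Hypothetical helper: returns the repeated set, or null if there is none.
    static String findRepeatedSuffix(String word) {
        Matcher m = REPEATED_SUFFIX.matcher(word);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(findRepeatedSuffix("IranAnAn")); // prints An
        System.out.println(findRepeatedSuffix("Iran"));     // prints null
    }
}
```

Note that the backslashes are doubled inside the Java string literal, so the regex engine receives (\w+)\1$.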
Finally, if you only want to capture ASCII letters, you can use [a-zA-Z] instead of \\w:
Pattern p = Pattern.compile("([a-zA-Z]+)\\1$");
And if you want the character set to be at least 2 characters:
Pattern p = Pattern.compile("([a-zA-Z]{2,})\\1$");
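To see how the three variants differ, here is a small comparison sketch. The inputs abc1212 and Hell, the class name PatternVariantsDemo, and the helper firstGroup are illustrative assumptions chosen to expose the differences; they do not come from the original question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternVariantsDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // \w also accepts digits, so a repeated digit pair is reported.
        System.out.println(firstGroup("(\\w+)\\1$", "abc1212"));          // prints 12
        // Restricting the class to ASCII letters rejects the digits.
        System.out.println(firstGroup("([a-zA-Z]+)\\1$", "abc1212"));     // prints null
        // With "+", a doubled single letter still counts as a repeated set ...
        System.out.println(firstGroup("([a-zA-Z]+)\\1$", "Hell"));        // prints l
        // ... but not once the set must be at least two characters long.
        System.out.println(firstGroup("([a-zA-Z]{2,})\\1$", "Hell"));     // prints null
        System.out.println(firstGroup("([a-zA-Z]{2,})\\1$", "IranAnAn")); // prints An
    }
}
```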
Answer by Michael Lang
If by "only English characters" you mean A-Z and a-z, the following regex will work:
".*([A-Za-z]{2,})\\1$"
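One practical effect of the leading .* is that the pattern can cover the whole input, so it also works with Matcher.matches(), which requires a full match, whereas a pattern without that prefix relies on find(). A minimal sketch, assuming a made-up class WholeWordMatchDemo with hypothetical helpers endsWithRepeatedSet and repeatedSet:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordMatchDemo {
    // .* consumes whatever precedes the repeated set, so the regex spans the whole word.
    private static final Pattern P = Pattern.compile(".*([A-Za-z]{2,})\\1$");

    // Hypothetical helper: true if the word ends with a set of 2+ letters repeated twice.
    static boolean endsWithRepeatedSet(String word) {
        return P.matcher(word).matches();
    }

    // Hypothetical helper: returns the repeated set, or null if the word does not match.
    static String repeatedSet(String word) {
        Matcher m = P.matcher(word);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(endsWithRepeatedSet("IranAnAn")); // prints true
        System.out.println(endsWithRepeatedSet("Iran"));     // prints false
        System.out.println(repeatedSet("IranAnAn"));         // prints An
    }
}
```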