Original question: http://stackoverflow.com/questions/41471321/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Java regex (?i) vs Pattern.CASE_INSENSITIVE
Asked by Paddy
I'm using "\\b(\\w+)(\\W+\\1\\b)+" along with input = input.replaceAll(regex, "$1"); to find duplicate words in a string and remove the duplicates. For example, the string input = "for for for" would become "for".
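As a minimal sketch of the behaviour described above (the class name DuplicateWordDemo and the helper collapse are hypothetical, not from the original post), the regex collapses a run of identical words but leaves words that differ only in case untouched:

```java
class DuplicateWordDemo {
    // Collapses a run of the same word (same case) into one occurrence.
    // \\1 refers back to whatever group 1 captured; "$1" keeps the first copy.
    static String collapse(String input) {
        return input.replaceAll("\\b(\\w+)(\\W+\\1\\b)+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("for for for")); // prints: for
        System.out.println(collapse("Hello hello")); // prints: Hello hello (case differs, no match)
    }
}
```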
However, it fails to turn "Hello hello" into "Hello", even though I have used Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
I can correct it by using "(?i)\\b(\\w+)(\\W+\\1\\b)+", but I want to know why this is necessary. Why do I have to use the (?i) flag when I have already specified Pattern.CASE_INSENSITIVE?
Here's the full code for clarity:
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {
    public static void main(String[] args) {
        String regex = "\\b(\\w+)(\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());
        while (numSentences-- > 0) {
            String input = in.nextLine();
            Matcher m = p.matcher(input);
            // Check for subsequences of input that match the compiled pattern
            while (m.find()) {
                input = input.replaceAll(regex, "$1");
            }
            // Prints the modified sentence.
            System.out.println(input);
        }
        in.close();
    }
}
Accepted answer by anubhava
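Your problem is that you compile the regex with the CASE_INSENSITIVE flag but never use that compiled pattern for the replacement. String.replaceAll(regex, ...) compiles the regex string again without any flags, so the Pattern p you built is only used by m.find().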
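To make the difference concrete, here is a small sketch (the class name ReplaceAllFlagDemo and both helper methods are hypothetical names chosen for illustration). It runs the same regex through String.replaceAll, which compiles the string without flags, and through a Matcher created from a Pattern compiled with CASE_INSENSITIVE:

```java
import java.util.regex.Pattern;

class ReplaceAllFlagDemo {
    static final String REGEX = "\\b(\\w+)(\\W+\\1\\b)+";

    // String.replaceAll compiles REGEX afresh with no flags,
    // so the back-reference \\1 is matched case-sensitively.
    static String viaString(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    // The flag belongs to the compiled Pattern, and Matcher.replaceAll
    // uses that Pattern, so \\1 is matched case-insensitively.
    static String viaMatcher(String input) {
        Pattern p = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);
        return p.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(viaString("Hello hello"));  // prints: Hello hello
        System.out.println(viaMatcher("Hello hello")); // prints: Hello
    }
}
```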
You can also use (?i) in the middle of the regex, so that only the back-reference \1 is matched ignoring case, like this:
String repl = "Hello hello".replaceAll("\\b(\\w+)(\\W+(?i:\\1)\\b)+", "$1");
//=> Hello
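The scoped form (?i:...) switches case-insensitivity on only inside its own group, whereas a leading (?i) applies to everything after it. The sketch below illustrates that scope with a hypothetical pattern containing a literal word; the class name InlineFlagScopeDemo and the helper found are assumptions made for this example, not part of the original answer:

```java
import java.util.regex.Pattern;

class InlineFlagScopeDemo {
    // Returns true if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Only the back-reference ignores case; the literal "the" does not.
        String scoped = "the (\\w+) (?i:\\1)";
        // The whole pattern ignores case.
        String global = "(?i)the (\\w+) \\1";

        System.out.println(found(scoped, "the Cat cat")); // prints: true
        System.out.println(found(scoped, "THE Cat cat")); // prints: false
        System.out.println(found(global, "THE Cat cat")); // prints: true
    }
}
```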
And then use Matcher.replaceAll on the matcher obtained from the compiled pattern, instead of String.replaceAll.
Working Code:
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {
    public static void main(String[] args) {
        String regex = "\\b(\\w+)(\\W+(?i:\\1)\\b)+";
        Pattern p = Pattern.compile(regex);
        // OR this one also works
        // String regex = "\\b(\\w+)(\\W+\\1\\b)+";
        // Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());
        while (numSentences-- > 0) {
            String input = in.nextLine();
            Matcher m = p.matcher(input);
            // Check for subsequences of input that match the compiled pattern
            if (m.find()) {
                // Replace through the Matcher so the compiled pattern (and its flags) is used
                input = m.replaceAll("$1");
            }
            // Prints the modified sentence.
            System.out.println(input);
        }
        in.close();
    }
}