Java regular expressions: (?i) vs. Pattern.CASE_INSENSITIVE

Question

Asked by Paddy

I'm using "\\b(\\w+)(\\W+\\1\\b)+" along with input = input.replaceAll(regex, "$1"); to find duplicate words in a string and remove the duplicates. For example, the string input = "for for for" would become "for".

However, it fails to turn "Hello hello" into "Hello", even though I have used Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

I can correct it by using "(?i)\\b(\\w+)(\\W+\\1\\b)+", but I want to know why this is necessary. Why do I have to use the (?i) flag when I have already specified Pattern.CASE_INSENSITIVE?

Here's the full code for clarity:

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {

    public static void main(String[] args) {

        String regex = "\\b(\\w+)(\\W+\\1\\b)+";
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());

        while (numSentences-- > 0) {
            String input = in.nextLine();

            Matcher m = p.matcher(input);

            // Check for subsequences of input that match the compiled pattern
            while (m.find()) {
                input = input.replaceAll(regex, "$1");
            }

            // Prints the modified sentence.
            System.out.println(input);
        }
        in.close();
    }
}

Answer 1

Accepted answer by anubhava

Your problem is that you compile the regex with the CASE_INSENSITIVE flag but then never use that compiled Pattern for the replacement: input.replaceAll(regex, "$1") is String.replaceAll, which compiles the regex string again without any flags.
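To make the difference concrete, here is a minimal sketch that runs the same regex through both routes. The class and method names (ReplaceAllFlagsDemo, viaString, viaMatcher) are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

class ReplaceAllFlagsDemo {
    static final String REGEX = "\\b(\\w+)(\\W+\\1\\b)+";

    // String.replaceAll compiles REGEX from scratch, with no flags.
    static String viaString(String input) {
        return input.replaceAll(REGEX, "$1");
    }

    // Matcher.replaceAll reuses the Pattern compiled with CASE_INSENSITIVE.
    static String viaMatcher(String input) {
        Pattern p = Pattern.compile(REGEX, Pattern.CASE_INSENSITIVE);
        return p.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(viaString("for for for"));  // for
        System.out.println(viaString("Hello hello"));  // Hello hello (unchanged)
        System.out.println(viaMatcher("Hello hello")); // Hello
    }
}
```

The string-based call only collapses duplicates that match exactly, which is why "for for for" worked in the question while "Hello hello" did not.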

You can also use the inline modifier (?i:...) in the middle of the regex, so that only the back-reference \1 is matched case-insensitively, like this:

String repl = "Hello hello".replaceAll("\\b(\\w+)(\\W+(?i:\\1)\\b)+", "$1");
//=> Hello
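As a rough illustration of how far an embedded flag reaches, the sketch below compares a leading (?i), which applies to the rest of the pattern, with a scoped (?i:...) group, which applies only inside its parentheses. The class name InlineFlagScopeDemo, the helper dedupe, and the toy pattern "a(?i:b)c" are assumptions added for this example.

```java
import java.util.regex.Pattern;

class InlineFlagScopeDemo {
    // Leading (?i): everything after it is case-insensitive.
    static final Pattern WHOLE = Pattern.compile("(?i)\\b(\\w+)(\\W+\\1\\b)+");
    // Scoped (?i:...): only the back-reference is case-insensitive.
    static final Pattern SCOPED = Pattern.compile("\\b(\\w+)(\\W+(?i:\\1)\\b)+");

    // Collapse each run of repeated words to its first occurrence.
    static String dedupe(Pattern p, String input) {
        return p.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(dedupe(WHOLE, "Hello hello HELLO"));  // Hello
        System.out.println(dedupe(SCOPED, "Hello hello HELLO")); // Hello

        // Toy pattern showing the scope: only the 'b' ignores case.
        System.out.println(Pattern.matches("a(?i:b)c", "aBc")); // true
        System.out.println(Pattern.matches("a(?i:b)c", "ABc")); // false
    }
}
```

For the duplicate-word regex both forms give the same result, because \w, \W and \b do not depend on letter case; the scoped form simply makes explicit that the back-reference is the only part that needs the flag.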

Alternatively, keep Pattern.CASE_INSENSITIVE and call Matcher.replaceAll on the matcher created from the compiled pattern, so the flag is actually applied.

Working code:

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateWords {

    public static void main(String[] args) {

        // Only the back-reference is matched case-insensitively
        String regex = "\\b(\\w+)(\\W+(?i:\\1)\\b)+";
        Pattern p = Pattern.compile(regex);

        // OR this one also works
        // String regex = "\\b(\\w+)(\\W+\\1\\b)+";
        // Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);

        Scanner in = new Scanner(System.in);
        int numSentences = Integer.parseInt(in.nextLine());

        while (numSentences-- > 0) {
            String input = in.nextLine();

            Matcher m = p.matcher(input);

            // Check for subsequences of input that match the compiled pattern
            if (m.find()) {
                // Matcher.replaceAll uses the compiled pattern, flags included
                input = m.replaceAll("$1");
            }

            // Prints the modified sentence.
            System.out.println(input);
        }
        in.close();
    }
}
