Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original address: http://stackoverflow.com/questions/18045397/

Date: 2020-08-11 22:02:08  Source: igfitidea

Java Regular Expression to Match Exact Word with Special Characters

Tags: java, regex, string

Asked by Ankur Raiyani

I have a list of keywords entered by the user, and they may contain special characters such as $, #, @, ^, &, etc.

As per my requirement, whenever I receive a list of text messages I need to search every message for all the keywords.

We need to match the exact keyword.

CASE 1: Simple Keyword - Simple Message

I used \b to match the exact keyword and it works fine.

// Needs java.util.ArrayList, java.util.List, java.util.regex.Matcher and java.util.regex.Pattern.
public static void main(String[] args) {
    // In a Java string literal the word boundary \b must be written as \\b.
    String patternStr = "(?i)\\bHello\\b";

    Pattern pattern = Pattern.compile(patternStr);

    List<String> strList = new ArrayList<String>();
    strList.add("HHello Message");
    strList.add("This is Hello Message ");
    strList.add("Now Hellos again.");

    for (String str : strList) {
        Matcher matcher = pattern.matcher(str);
        System.out.println(">> " + matcher.find());
    }
}

Output, as expected:

>> false
>> true
>> false

CASE 2: Simple Keyword - Message with Special Character

Now, if I run the same code on the following messages, it does not work as expected.

List<String> strList = new ArrayList<String>();
strList.add("#Hello Message");
strList.add("This is Hello Message ");
strList.add("Now Hellos again.");

Output:

>> true
>> true
>> false

Expected output:

>> false
>> true
>> false

CASE 3: Keyword & Message with Special Character

Suppose I receive the following messages and the keyword is #Hello. I wrote the following code, but it didn't work.

public static void main(String[] args) {
    // In a Java string literal the word boundary \b must be written as \\b.
    String patternStr = "(?i)\\b#Hello\\b";

    Pattern pattern = Pattern.compile(patternStr);

    List<String> strList = new ArrayList<String>();
    strList.add("HHello Message");
    strList.add("This is #Hello Message ");
    strList.add("Now Hellos again.");

    for (String str : strList) {
        Matcher matcher = pattern.matcher(str);
        System.out.println(">> " + matcher.find());
    }
}

Output:

>> false
>> false
>> false

Expected output:

>> false
>> true
>> false

How can I escape the special characters and resolve Case 2 and Case 3?

Please help.

Accepted answer by Mena

Case 2 seems to be the opposite of case 3, so I don't think you can combine the Patterns.

For case 2, your Pattern could look like:

Pattern pattern = Pattern.compile("(\\s|^)Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

In this case we surround the keyword with whitespace or the beginning/end of input.

For case 3, your Pattern could look like:

Pattern pattern = Pattern.compile("[\\$#@\\^&]Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

In this case, we precede the keyword with any of the special characters of your choice (note the escaped reserved characters $ and ^), then accept whitespace or the end of input as the character following the keyword.
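As a quick check, the two patterns above can be run against the messages from the question. This is only a minimal sketch: the class name AcceptedAnswerDemo and the helper matches are made up for illustration and are not part of the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class AcceptedAnswerDemo {
    // Case 2: keyword surrounded by whitespace or the start/end of input.
    static final Pattern CASE_2 =
            Pattern.compile("(\\s|^)Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

    // Case 3: keyword preceded by one of the chosen special characters.
    static final Pattern CASE_3 =
            Pattern.compile("[\\$#@\\^&]Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern occurs anywhere in the message.
    static boolean matches(Pattern pattern, String message) {
        return pattern.matcher(message).find();
    }

    public static void main(String[] args) {
        List<String> case2 = Arrays.asList(
                "#Hello Message", "This is Hello Message ", "Now Hellos again.");
        for (String msg : case2) {
            System.out.println("case 2 >> " + matches(CASE_2, msg));
        }

        List<String> case3 = Arrays.asList(
                "HHello Message", "This is #Hello Message ", "Now Hellos again.");
        for (String msg : case3) {
            System.out.println("case 3 >> " + matches(CASE_3, msg));
        }
    }
}
```

Both loops print false, true, false, which is the expected output stated in the question.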

Answer by Alex Shesterov

Use (?:^|\s) ("start of text or whitespace") instead of the first \b, and (?:$|\s) ("end of text or whitespace") instead of the second \b in your regex.
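Applied to the Case 3 keyword from the question, that substitution might look like the sketch below. The class name WhitespaceBoundaryDemo and the method containsKeyword are illustrative names, not something the answer defines.

```java
import java.util.regex.Pattern;

public class WhitespaceBoundaryDemo {
    // (?:^|\s) and (?:$|\s) take the place of the two \b anchors.
    // Each backslash is doubled because this is a Java string literal.
    static final Pattern KEYWORD =
            Pattern.compile("(?i)(?:^|\\s)#Hello(?:$|\\s)");

    // Hypothetical helper: true if the keyword occurs as a whitespace-delimited token.
    static boolean containsKeyword(String message) {
        return KEYWORD.matcher(message).find();
    }

    public static void main(String[] args) {
        System.out.println(">> " + containsKeyword("HHello Message"));
        System.out.println(">> " + containsKeyword("This is #Hello Message "));
        System.out.println(">> " + containsKeyword("Now Hellos again."));
    }
}
```

Note that these groups consume the surrounding whitespace, so the matched text includes it; that matters only if you use the match itself rather than a yes/no result.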

Answer by James Robinson

The problem comes from the way "exact word" is defined. Whitespace is not the only thing that can surround a word. For example, in most circumstances one would want an exact-word match for 'Hello' to succeed on all of the following:

"hello there", "That young man just said hello to that other young man" and "I wish people would still answer the telephone by saying ahoy rather than Hello."
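To make the point concrete, here is a small sketch of how \b behaves: it matches between a word character (letter, digit or underscore) and anything else, so punctuation such as '.' or '#' counts as a boundary just like whitespace. The class name WordBoundaryDemo and the method found are illustrative.

```java
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // The pattern from the question, with backslashes doubled for the Java literal.
    static final Pattern WORD = Pattern.compile("(?i)\\bHello\\b");

    // Hypothetical helper: true if the pattern occurs anywhere in the text.
    static boolean found(String text) {
        return WORD.matcher(text).find();
    }

    public static void main(String[] args) {
        // Followed by a space: boundary, so this matches.
        System.out.println(found("hello there"));
        // Followed by '.', a non-word character: boundary, so this matches.
        System.out.println(found("saying ahoy rather than Hello."));
        // Preceded by '#', a non-word character: boundary, so this matches too.
        System.out.println(found("#Hello Message"));
        // Preceded by the letter 'H': no boundary, so no match.
        System.out.println(found("HHello Message"));
    }
}
```

This is why Case 2 in the question reports true for "#Hello Message": to \b, '#' is just another non-word character.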

If you want matches to be delimited only by whitespace, then I believe you have to specify the whitespace condition explicitly. Assuming you also want the keyword to match at the start and end of the input, I would propose something like this:

Pattern pattern = Pattern.compile("(^| )" + escapeSearchString(patternString) + "( |$)");

and then have a couple of methods like this:

// Returns the keyword with a backslash inserted before every escapable character.
public String escapeSearchString(String patternString) {
    StringBuilder stringBuilder = new StringBuilder(patternString.length() * 3);
    for (char c : patternString.toCharArray()) {
        if (isEscapableCharacter(c)) {
            stringBuilder.append('\\');
        }
        stringBuilder.append(c);
    }
    return stringBuilder.toString();
}

public boolean isEscapableCharacter(char c) {
    switch (c) {
        case '#':
        case '$':
        case '@':
        case '^':
        case '&':
            return true;
        default:
            return false;
    }
}

It would probably be better to keep the escapable characters in a char[], iterate over it, and load them from a config file.
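Put together, the approach might look like the following sketch. It assumes the escapable characters live in a constant string (standing in for the config file mentioned above); the class name EscapingDemo, the constant ESCAPABLE and the method keywordPattern are illustrative names.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Assumption: a fixed set of characters to escape, standing in for a config file.
    static final String ESCAPABLE = "#$@^&";

    // Inserts a backslash before every character listed in ESCAPABLE.
    static String escapeSearchString(String keyword) {
        StringBuilder sb = new StringBuilder(keyword.length() * 2);
        for (char c : keyword.toCharArray()) {
            if (ESCAPABLE.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Keyword delimited by a space or by the start/end of the input.
    static Pattern keywordPattern(String keyword) {
        return Pattern.compile("(^| )" + escapeSearchString(keyword) + "( |$)");
    }

    public static void main(String[] args) {
        System.out.println(escapeSearchString("#Hello"));
        System.out.println(keywordPattern("#Hello").matcher("This is #Hello Message ").find());
        System.out.println(keywordPattern("$100").matcher("pay $100 now").find());
        System.out.println(keywordPattern("$100").matcher("pay $1000 now").find());
    }
}
```

A hand-maintained list only covers the characters you remember to include; keywords containing other metacharacters such as + or ( would still need escaping.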

Answer by Pshemo

Maybe try it this way:

String patternStr = "(?i)(?<=\\s|^)" + Pattern.quote(searchedStubstring) + "(?=\\s|$)";

(?<=...) and (?=...) are positive lookbehind and lookahead, so the pattern checks that your searchedStubstring has

  • whitespace \\s or the start of the input ^ before it, and
  • whitespace \\s or the end of the input $ after it.

Also, if you want to search for special characters such as $, + and others, you need to escape them. To do this you can use Pattern.quote(searchedStubstring).
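The lookaround pattern above can be wrapped in a small helper and checked against all three cases from the question. This is a sketch only: the class name LookaroundDemo and the methods exactKeyword, contains and count are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // Builds a case-insensitive pattern for a keyword delimited by whitespace
    // or the input boundaries; Pattern.quote makes the keyword literal.
    static Pattern exactKeyword(String keyword) {
        return Pattern.compile("(?i)(?<=\\s|^)" + Pattern.quote(keyword) + "(?=\\s|$)");
    }

    static boolean contains(String keyword, String message) {
        return exactKeyword(keyword).matcher(message).find();
    }

    // Counts every occurrence of the keyword in the message.
    static int count(String keyword, String message) {
        Matcher matcher = exactKeyword(keyword).matcher(message);
        int n = 0;
        while (matcher.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(contains("Hello", "#Hello Message"));
        System.out.println(contains("Hello", "This is Hello Message "));
        System.out.println(contains("#Hello", "This is #Hello Message "));
        System.out.println(contains("$+", "the code is $+ today"));
        System.out.println(count("Hello", "Hello Hello"));
    }
}
```

Because lookarounds do not consume the surrounding whitespace, both occurrences in "Hello Hello" are counted, and the matched text is the keyword alone.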

Answer by a.s.p.

For example, if you want to allow a special character (here '#') immediately before and after your word, you have to write the following:

Pattern p = Pattern.compile("(\\s|^|#)" + word + "(\\s|\\#|$)", Pattern.CASE_INSENSITIVE);

If you want an exact match:

Pattern p = Pattern.compile("(\\s|^)" + word + "(\\s|$)", Pattern.CASE_INSENSITIVE);

'|' acts like OR, so you can add as many delimiter characters as you want, for example:

Pattern p = Pattern.compile("(\\s|^|#|:|-)" + word + "(\\s|\\#|\\,|\\.|$)", Pattern.CASE_INSENSITIVE);

The character '^' matches at the beginning of a line and '$' matches at the end of a line. See more in "Summary of regular-expression constructs".
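A sketch of the extended-delimiter pattern in use is shown below. One change from the answer is an assumption of mine: the word is passed through Pattern.quote so that keywords containing regex metacharacters are treated literally. The class name DelimiterDemo and the method delimited are illustrative names.

```java
import java.util.regex.Pattern;

public class DelimiterDemo {
    // Keyword preceded by whitespace, start of input, '#', ':' or '-',
    // and followed by whitespace, '#', ',', '.' or end of input.
    // Pattern.quote is an addition here, not part of the answer's pattern.
    static Pattern delimited(String word) {
        return Pattern.compile(
                "(\\s|^|#|:|-)" + Pattern.quote(word) + "(\\s|#|,|\\.|$)",
                Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        Pattern p = delimited("Hello");
        System.out.println(p.matcher("#Hello, world").find());
        System.out.println(p.matcher("Say:hello.").find());
        System.out.println(p.matcher("HHello Message").find());
        System.out.println(p.matcher("Now Hellos again.").find());
    }
}
```

The list of accepted delimiters is a design choice: every character added to either group widens what counts as a word edge, so it should reflect the punctuation you actually expect in the messages.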