Java Regular Expression to Match Exact Word with Special Characters
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/18045397/
Asked by Ankur Raiyani
I have a list of keywords entered by the user, and they may contain special characters like $, #, @, ^, &, etc.
As per my requirement, whenever I receive a list of text messages, I need to search for all the keywords in every message.
We need to match the exact keyword.
CASE 1: Simple Keyword - Simple Message
I used \b to match the exact keyword and it works fine.
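import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // In a Java string literal the backslash must be doubled: "\\b" is the regex \b
        String patternStr = "(?i)\\bHello\\b";
        Pattern pattern = Pattern.compile(patternStr);
        List<String> strList = new ArrayList<String>();
        strList.add("HHello Message");
        strList.add("This is Hello Message ");
        strList.add("Now Hellos again.");
        for (String str : strList) {
            Matcher matcher = pattern.matcher(str);
            System.out.println(">> " + matcher.find());
        }
    }
}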
OUTPUT as Expected
>> false
>> true
>> false
CASE 2: Simple Keyword - Message with Special Character
Now, if I run the same code as above for the following messages, it doesn't work as expected.
List<String> strList = new ArrayList<String>();
strList.add("#Hello Message");
strList.add("This is Hello Message ");
strList.add("Now Hellos again.");
OUTPUT:
true
true
false
Expected OUTPUT
false
true
false
CASE 3: Keyword & Message with Special Character
Suppose I receive the following messages and the keyword is #Hello. I wrote the following code, but it didn't work.
public static void main(String[] args) {
    // "\\b" in the Java string is the regex word boundary \b
    String patternStr = "(?i)\\b#Hello\\b";
    Pattern pattern = Pattern.compile(patternStr);
    List<String> strList = new ArrayList<String>();
    strList.add("HHello Message");
    strList.add("This is #Hello Message ");
    strList.add("Now Hellos again.");
    for (String str : strList) {
        Matcher matcher = pattern.matcher(str);
        System.out.println(">> " + matcher.find());
    }
}
OUTPUT:
>> false
>> false
>> false
Expected OUTPUT:
>> false
>> true
>> false
How can I escape the special characters and resolve CASE 2 and CASE 3?
Please help.
Accepted answer by Mena
Case 2 seems to be the opposite of case 3, so I don't think you can combine the Patterns.
For case 2, your Pattern could look like:
Pattern pattern = Pattern.compile("(\\s|^)Hello(\\s|$)", Pattern.CASE_INSENSITIVE);
In this case we surround the keyword with whitespace or the beginning/end of input.
For case 3, your Pattern could look like:
Pattern pattern = Pattern.compile("[\\$#@\\^&]Hello(\\s|$)", Pattern.CASE_INSENSITIVE);
In this case, we precede the keyword with any of the special characters of your choice (note the escaped reserved characters $ and ^), then we accept whitespace or the end of input as the character following the keyword.
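As a quick check, the two patterns above can be run against the messages from the question. This is only a minimal sketch: the class name MenaPatternsDemo and the helper found are made up for illustration, and the keyword Hello is hard-coded as in the answer.

```java
import java.util.regex.Pattern;

class MenaPatternsDemo {
    // Case 2: the keyword must be surrounded by whitespace or the input boundaries.
    static final Pattern CASE_2 =
            Pattern.compile("(\\s|^)Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

    // Case 3: the keyword must be preceded by one of the listed special characters.
    static final Pattern CASE_3 =
            Pattern.compile("[\\$#@\\^&]Hello(\\s|$)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern occurs anywhere in the message.
    static boolean found(Pattern pattern, String message) {
        return pattern.matcher(message).find();
    }

    public static void main(String[] args) {
        System.out.println(found(CASE_2, "#Hello Message"));          // false
        System.out.println(found(CASE_2, "This is Hello Message "));  // true
        System.out.println(found(CASE_2, "Now Hellos again."));       // false
        System.out.println(found(CASE_3, "HHello Message"));          // false
        System.out.println(found(CASE_3, "This is #Hello Message ")); // true
        System.out.println(found(CASE_3, "Now Hellos again."));       // false
    }
}
```

Note that the case 3 pattern does not check what comes before the special character, so a message such as "abc#Hello" would also match.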
Answer by Alex Shesterov
Use (?:^|\s) ("start of text or whitespace") instead of the first \b, and (?:$|\s) ("end of text or whitespace") instead of the second \b in your regex.
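A minimal sketch of this suggestion, assuming the keyword is additionally passed through Pattern.quote so that characters such as # or $ are treated literally. The class name WhitespaceBoundaryMatcher and the method containsKeyword are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

class WhitespaceBoundaryMatcher {
    // Hypothetical helper: true if the keyword appears in the message
    // delimited by whitespace or by the start/end of the text.
    static boolean containsKeyword(String keyword, String message) {
        String regex = "(?:^|\\s)" + Pattern.quote(keyword) + "(?:$|\\s)";
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(message)
                      .find();
    }

    public static void main(String[] args) {
        // Case 2 from the question: plain keyword, message with a special character
        System.out.println(containsKeyword("Hello", "#Hello Message"));           // false
        System.out.println(containsKeyword("Hello", "This is Hello Message "));   // true
        // Case 3 from the question: keyword with a special character
        System.out.println(containsKeyword("#Hello", "HHello Message"));          // false
        System.out.println(containsKeyword("#Hello", "This is #Hello Message ")); // true
    }
}
```

With this approach the same expression covers all three cases, because the boundary is defined by whitespace rather than by word characters.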
Answer by James Robinson
The problem comes from the way that "exact word" is defined. It is not just whitespace that can surround a word to make it a word. For example, in most circumstances one would want an exact word match for 'Hello' to work with "hello there", "That young man just said hello to that other young man" and "I wish people would still answer the telephone by saying ahoy rather than Hello."
If you want the match to be split only on whitespace, then I believe you will have to specify the whitespace condition. Assuming you also want it to match at the start and end of the input, I would propose something like this:
Pattern pattern = Pattern.compile("(^| )" + escapeSearchString(patternString) + "( |$)");
and then have a couple of methods like this:
public String escapeSearchString(String patternString) {
    StringBuilder stringBuilder = new StringBuilder(patternString.length() * 3);
    for (char c : patternString.toCharArray()) {
        if (isEscapableCharacter(c)) {
            // A single regex backslash is written "\\" in a Java string literal
            stringBuilder.append("\\");
        }
        stringBuilder.append(c);
    }
    return stringBuilder.toString();
}
public boolean isEscapableCharacter(char c) {
    switch (c) {
        case '#':
        case '$':
        case '@':
        case '^':
        case '&':
            return true;
        default:
            return false;
    }
}
It would probably be better to iterate over a char[] of the escapable characters and load them from a config file.
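The char[] variant mentioned above might look roughly like the following sketch. The array ESCAPABLE is hard-coded here as an assumption; reading it from a config file is left out, and the class name ConfigurableEscaper is made up for the example.

```java
import java.util.regex.Pattern;

class ConfigurableEscaper {
    // Assumed list of characters to escape; in practice it could come from a config file.
    static final char[] ESCAPABLE = {'#', '$', '@', '^', '&'};

    static boolean isEscapableCharacter(char c) {
        for (char escapable : ESCAPABLE) {
            if (escapable == c) {
                return true;
            }
        }
        return false;
    }

    // Prefixes every escapable character with a backslash.
    static String escapeSearchString(String patternString) {
        StringBuilder sb = new StringBuilder(patternString.length() * 2);
        for (char c : patternString.toCharArray()) {
            if (isEscapableCharacter(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeSearchString("#Hello$")); // prints \#Hello\$
        Pattern pattern = Pattern.compile("(^| )" + escapeSearchString("#Hello") + "( |$)");
        System.out.println(pattern.matcher("This is #Hello Message ").find()); // true
        System.out.println(pattern.matcher("HHello Message").find());          // false
    }
}
```

Only the characters listed in the array are escaped, so a keyword containing other regex metacharacters (for example + or .) would still need them added to the list.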
Answer by Pshemo
Maybe try it this way:
String patternStr = "(?i)(?<=\\s|^)" + Pattern.quote(searchedStubstring) + "(?=\\s|$)";
(?<=...) and (?=...) are positive look-behind and look-ahead, so the pattern will check that your searchedStubstring has
- whitespace \\s or the start of the input ^ before it, and
- whitespace \\s or the end of the input $ after it.
Also, in case you would like to search for special characters like $, + and others, you need to escape them. To do this you can use Pattern.quote(searchedStubstring).
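One practical difference between look-arounds and plain groups such as (\s|^) is that look-arounds do not consume the surrounding whitespace, which matters when counting occurrences. The sketch below illustrates this; the class name LookaroundKeywordCounter and both method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundKeywordCounter {
    // Counts how many times the regex matches in the text.
    static int count(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int matches = 0;
        while (matcher.find()) {
            matches++;
        }
        return matches;
    }

    // Look-around version: the surrounding whitespace is checked but not consumed.
    static int countWithLookaround(String keyword, String text) {
        return count("(?i)(?<=\\s|^)" + Pattern.quote(keyword) + "(?=\\s|$)", text);
    }

    // Group version: the surrounding whitespace becomes part of the match.
    static int countWithGroups(String keyword, String text) {
        return count("(?i)(?:\\s|^)" + Pattern.quote(keyword) + "(?:\\s|$)", text);
    }

    public static void main(String[] args) {
        System.out.println(countWithLookaround("Hello", "Hello Hello"));     // 2
        System.out.println(countWithGroups("Hello", "Hello Hello"));         // 1
        System.out.println(countWithLookaround("$Hello+", "pay $Hello+ x")); // 1
    }
}
```

In the group version the space after the first Hello is consumed by the first match, so the second Hello no longer has a leading whitespace character to match against.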
Answer by a.s.p.
For example, if you want to allow a special character (here '#') at the beginning and end of your word, you have to write the following:
Pattern p = Pattern.compile("(\\s|^|#)"+word+"(\\s|\\#|$)", Pattern.CASE_INSENSITIVE);
If you want an exact match:
Pattern p = Pattern.compile("(\\s|^)"+word+"(\\s|$)", Pattern.CASE_INSENSITIVE);
'|' works like OR, so you can add as many special characters as you want to match, for example:
Pattern p = Pattern.compile("(\\s|^|#|:|-)"+word+"(\\s|\\#|\\,|\\.|$)", Pattern.CASE_INSENSITIVE);
The character '^' matches at the beginning of the input (or of each line in multiline mode) and '$' matches at the end. See more here: Summary of regular-expression constructs
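Whether '^' and '$' refer to the whole input or to individual lines depends on the Pattern.MULTILINE flag. The following is a small sketch of that difference; the class name AnchorModesDemo and the method isWholeLine are made up for this example.

```java
import java.util.regex.Pattern;

class AnchorModesDemo {
    // Hypothetical helper: looks for the keyword standing alone between ^ and $.
    static boolean isWholeLine(String keyword, String text, boolean multiline) {
        int flags = multiline ? Pattern.MULTILINE : 0;
        Pattern pattern = Pattern.compile("^" + Pattern.quote(keyword) + "$", flags);
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Hi\nHello\nBye";
        // Default mode: ^ and $ anchor to the start and end of the whole input.
        System.out.println(isWholeLine("Hello", text, false)); // false
        // MULTILINE mode: ^ and $ also match around line terminators.
        System.out.println(isWholeLine("Hello", text, true));  // true
    }
}
```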