
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/29077436/

Time: 2020-11-02 14:39:23  Source: igfitidea

Regex Match Word with special character

java regex

Asked by Med BEN

I would like to find every word in a file that contains different kinds of information, such as dates, percentages and some strings.


Input:


21-02-2015 wordA 22 wordB wordC

Result:


wordA wordB wordC

Please help me, as I am new to regex.


Answer by Med BEN

I found the answer. This regex retrieves any word, including a word that contains a special character:


 (([a-zA-Z]+)(\W)?([a-zA-Z]+))
  • ([a-zA-Z]+) looks for one or more letters in the range a-z or A-Z
  • (\W)? looks for one optional special (non-word) character; note that \W also matches whitespace
  • ([a-zA-Z]+) looks for the rest of the word when the special character is in the middle
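A minimal sketch (not part of the original answer) that runs this pattern over the question's sample input with Matcher.find(). Because \W also matches a space, the optional separator can swallow the space between two adjacent words, so wordB and wordC come back as one match; the pattern also needs at least two letters, so one-letter words are skipped. The second pattern, [a-zA-Z]+(?:[^\w\s][a-zA-Z]+)*, is a hypothetical tweak that only allows a non-word, non-space character between letter runs. The class name FindWords and the helper findAll are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindWords {

  // Collects every match of regex in input, in the order find() returns them.
  static List<String> findAll(String regex, String input) {
    List<String> result = new ArrayList<>();
    Matcher matcher = Pattern.compile(regex).matcher(input);
    while (matcher.find()) {
      result.add(matcher.group());
    }
    return result;
  }

  public static void main(String[] args) {
    String input = "21-02-2015 wordA 22 wordB wordC";

    // The pattern from the answer above: \W also matches the space,
    // so adjacent words can be joined into a single match.
    System.out.println(findAll("(([a-zA-Z]+)(\\W)?([a-zA-Z]+))", input));
    // prints [wordA, wordB wordC]

    // Hypothetical tweak (an assumption, not from the answer): only a
    // character that is neither a word character nor whitespace may
    // join two letter runs.
    System.out.println(findAll("[a-zA-Z]+(?:[^\\w\\s][a-zA-Z]+)*", input));
    // prints [wordA, wordB, wordC]
  }
}
```

With the tweak, a word such as e-mail still comes back whole, while words separated by spaces stay separate.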

Answer by Biffen

Java's regex implementation supports character class intersection, and this is a textbook use case for it.


[\w&&[^\d]] will thus match a word character that is not a digit. Together with Pattern.UNICODE_CHARACTER_CLASS, which makes \w Unicode-aware, it should also match 'special characters' such as non-ASCII letters (å, ä, ö).


Thus this code:


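import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test {

  public static void main(String... args) {
    String  input   = "21-02-2015 wordA 22 wordB wordC Förtids";
    // [\w&&[^\d]] is the intersection of "word character" and "not a digit".
    // UNICODE_CHARACTER_CLASS makes \w and \d Unicode-aware, so letters like ö count as word characters.
    Matcher matcher = Pattern.compile("[\\w&&[^\\d]]+",
                                      Pattern.UNICODE_CHARACTER_CLASS)
                      .matcher(input);

    // Print every run of non-digit word characters.
    while (matcher.find()) {
      System.out.println(matcher.group());
    }
  }

}

Produces:

wordA
wordB
wordC
Förtids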
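A hedged sketch showing why the flag matters: the same [\w&&[^\d]]+ pattern compiled with and without Pattern.UNICODE_CHARACTER_CLASS. It assumes the garbled word in the original page is the Swedish Förtids, written here as F\u00f6rtids so the source stays ASCII. The class name FlagComparison and the helper words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagComparison {

  // Returns every run of word characters that are not digits.
  static List<String> words(String input, int flags) {
    List<String> result = new ArrayList<>();
    Matcher matcher = Pattern.compile("[\\w&&[^\\d]]+", flags).matcher(input);
    while (matcher.find()) {
      result.add(matcher.group());
    }
    return result;
  }

  public static void main(String[] args) {
    String input = "21-02-2015 wordA 22 wordB wordC F\u00f6rtids";

    // No flags: \w is ASCII-only ([a-zA-Z_0-9]), so the non-ASCII
    // letter splits the last word into two matches.
    System.out.println(words(input, 0));
    // yields [wordA, wordB, wordC, F, rtids]

    // Unicode-aware \w keeps the whole word together.
    System.out.println(words(input, Pattern.UNICODE_CHARACTER_CLASS));
    // yields [wordA, wordB, wordC, Förtids]
  }
}
```

How the last list is printed depends on the console encoding; the matched strings themselves do not.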