Find exact word in a sentence using Java

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/15779632/

Date: 2020-10-31 20:50:10  Source: igfitidea

java string

Asked by Tazo

I am writing code to spot country names in text. I am using a dictionary of country names, say India, America, Sri Lanka, .... I am currently using text.contains(key) with key taken from the dictionary. However, this returns true even for a string like Indiana. I tried putting the words of the sentence in an array and then calling contains on it; a similar approach can be done with equals, but both are really slow. Is there a faster way you can think of?

Answered by Arun P Johny

Try using the word-boundary class \b:

s.matches(".*\\b" + key + "\\b.*")
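
A minimal sketch of the word-boundary idea follows, assuming the key should be matched literally. The class name WordBoundaryDemo and the helper containsWord are hypothetical. Using Pattern.quote and find() is a variation on the one-liner above, not part of the original answer.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: true if key occurs in text as a whole word.
    static boolean containsWord(String text, String key) {
        // Pattern.quote makes the key literal, so regex metacharacters in a name are harmless.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + "\\b");
        // find() searches anywhere in the text, so no ".*" padding is needed.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("I visited India last year", "India"));        // true
        System.out.println(containsWord("I visited Indiana last year", "India"));      // false
        System.out.println(containsWord("Flights to Sri Lanka are cheap", "Sri Lanka")); // true
    }
}
```

If the same key is checked against many texts, compiling the Pattern once and reusing it would avoid repeated compilation.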

Answered by Navin

Maybe you should be using some text processing library.

Here is a regex solution:

import java.util.regex.*;
import static java.lang.System.*;

public class SO {
    public static void main(String[] args) {
        String[] dict = {"india", "america"};
        // \\b is the regex word boundary; the backslash must be doubled in a Java string literal.
        String patStr = ".*\\b(" + combine(dict, "|") + ")\\b.*";
        out.println("pattern: " + patStr + "\n");
        // Compile once and reuse the pattern for every input.
        Pattern pat = Pattern.compile(patStr);

        String input1 = "hello world india indiana";
        out.println(input1 + "\t" + pat.matcher(input1).matches());

        String input2 = "hello world america americana";
        out.println(input2 + "\t" + pat.matcher(input2).matches());

        String input3 = "hello world indiana amercana";
        out.println(input3 + "\t" + pat.matcher(input3).matches());
    }

    // Joins the elements of s with glue between them; returns null for an empty array.
    static String combine(String[] s, String glue) {
        int k = s.length;
        if (k == 0) return null;
        StringBuilder sb = new StringBuilder();
        sb.append(s[0]);
        for (int x = 1; x < k; ++x)
            sb.append(glue).append(s[x]);
        return sb.toString();
    }
}

Output:

pattern: .*\b(india|america)\b.*

hello world india indiana       true
hello world america americana   true
hello world indiana amercana    false
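
The same alternation idea can also report which names were found instead of only true/false. The sketch below assumes a non-empty dictionary and case-insensitive matching. CountrySpotter and spot are hypothetical names, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountrySpotter {
    private final Pattern pattern;

    // Builds one alternation pattern from all names (assumed non-empty), compiled once.
    CountrySpotter(List<String> names) {
        List<String> quoted = new ArrayList<>();
        for (String n : names) {
            quoted.add(Pattern.quote(n)); // treat each name literally
        }
        pattern = Pattern.compile("\\b(" + String.join("|", quoted) + ")\\b",
                                  Pattern.CASE_INSENSITIVE);
    }

    // Returns every whole-word occurrence of a dictionary name, in order of appearance.
    List<String> spot(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        CountrySpotter spotter =
            new CountrySpotter(Arrays.asList("India", "America", "Sri Lanka"));
        // Prints [India, Sri Lanka]; "Indiana" is skipped by the word boundary.
        System.out.println(spotter.spot("From India to Indiana, then Sri Lanka"));
    }
}
```

Scanning with find() touches each text once per pattern, rather than once per dictionary key as repeated contains calls do.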

Answered by Pradeep Pati

contains() should have worked. You can also try String.indexOf(String). If it returns anything other than -1, the query string exists in the given String; otherwise it does not.
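
For illustration, here is a sketch of how String.indexOf could be used for whole-word checks. Note that indexOf, like contains, matches substrings, so on its own it would also find India inside Indiana. The boundary test with Character.isLetterOrDigit is an added assumption, not part of this answer, and IndexOfDemo and containsWord are hypothetical names.

```java
class IndexOfDemo {
    // Hypothetical helper: true if key occurs in text and is not surrounded by letters or digits.
    static boolean containsWord(String text, String key) {
        if (key.isEmpty()) {
            return false;
        }
        int from = 0;
        while (true) {
            int i = text.indexOf(key, from);
            if (i < 0) {
                return false; // no further occurrences
            }
            int end = i + key.length();
            // Assumption: a "word" is delimited by any non-letter, non-digit character.
            boolean startOk = i == 0 || !Character.isLetterOrDigit(text.charAt(i - 1));
            boolean endOk = end == text.length() || !Character.isLetterOrDigit(text.charAt(end));
            if (startOk && endOk) {
                return true;
            }
            from = i + 1; // keep searching after this partial hit
        }
    }

    public static void main(String[] args) {
        System.out.println("Indiana".indexOf("India"));                // 0, a substring hit
        System.out.println(containsWord("Indiana", "India"));           // false
        System.out.println(containsWord("Indiana and India", "India")); // true
    }
}
```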