Find exact word in a sentence using Java
Source: http://stackoverflow.com/questions/15779632/
Note: this content comes from StackOverFlow and is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors and cite the source address.
Asked by Tazo
I am writing code to spot country names in text. I am using a dictionary of country names (say India, America, Sri Lanka, ...). I am currently using text.contains(key) with key taken from the dictionary. However, this returns true even for a string like Indiana. I tried putting the words of the sentence into an array and then calling contains on it (a similar approach could be taken with equals), but both are really slow. Is there a faster way you can think of?
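The word-array approach described in the question tends to be slow when every dictionary key is compared against every word. A minimal sketch of one possible speed-up is to keep the dictionary in a HashSet, so that each word costs a single hash lookup. The class name TokenLookup and the method findCountries are hypothetical, and the sketch assumes single-word names only, so a multi-word name such as Sri Lanka would need separate handling.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class TokenLookup {
    // Returns the dictionary entries that occur as whole words in text,
    // in order of first appearance. Assumes single-word dictionary entries.
    static Set<String> findCountries(String text, Set<String> dictionary) {
        Set<String> found = new LinkedHashSet<>();
        // Split on any run of non-letter characters (spaces, punctuation, digits).
        for (String token : text.split("[^\\p{L}]+")) {
            if (dictionary.contains(token)) { // hash lookup, exact match only
                found.add(token);
            }
        }
        return found;
    }
}
```

Because the lookup is an exact match on each token, Indiana is never confused with India. The comparison is case-sensitive as written.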
Answered by Arun P Johny
Answered by Navin
Maybe you should be using some text processing library.
Here is a regex solution:
import java.util.regex.*;
import static java.lang.System.*;

public class SO {
    public static void main(String[] args) {
        String[] dict = {"india", "america"};
        // "\\b" is the regex word boundary; a single "\b" in a Java
        // string literal would be a backspace character instead.
        String patStr = ".*\\b(" + combine(dict, "|") + ")\\b.*";
        out.println("pattern: " + patStr + "\n");
        Pattern pat = Pattern.compile(patStr);

        String input1 = "hello world india indiana";
        out.println(input1 + "\t" + pat.matcher(input1).matches());
        String input2 = "hello world america americana";
        out.println(input2 + "\t" + pat.matcher(input2).matches());
        String input3 = "hello world indiana amercana";
        out.println(input3 + "\t" + pat.matcher(input3).matches());
    }

    // Joins the strings in s, placing glue between consecutive elements.
    static String combine(String[] s, String glue) {
        int k = s.length;
        if (k == 0) return null;
        StringBuilder out = new StringBuilder();
        out.append(s[0]);
        for (int x = 1; x < k; ++x)
            out.append(glue).append(s[x]);
        return out.toString();
    }
}
Output:
pattern: .*\b(india|america)\b.*
hello world india indiana true
hello world america americana true
hello world indiana amercana false
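The same word-boundary idea can also report which names occur instead of only true or false. The following is a sketch rather than the answerer's code: the class name CountryFinder, the method names, and the three-entry dictionary are assumptions made for illustration. It compiles the pattern once, escapes each name with Pattern.quote, and scans with Matcher.find() so the leading and trailing .* are not needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountryFinder {
    // Hypothetical dictionary; a real one would likely be loaded from a file.
    private static final String[] DICT = {"India", "America", "Sri Lanka"};

    // Compiled once and reused for every input text.
    private static final Pattern PATTERN = buildPattern(DICT);

    // Builds \b(?:name1|name2|...)\b with each name quoted as a literal.
    static Pattern buildPattern(String[] names) {
        List<String> quoted = new ArrayList<>();
        for (String name : names) {
            quoted.add(Pattern.quote(name)); // escape regex metacharacters
        }
        return Pattern.compile("\\b(?:" + String.join("|", quoted) + ")\\b",
                Pattern.CASE_INSENSITIVE);
    }

    // Returns every whole-word occurrence of a dictionary name, in text order.
    static List<String> findCountries(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = PATTERN.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Prints [India, Sri Lanka]; Indiana is skipped.
        System.out.println(findCountries("From Indiana to India and Sri Lanka"));
    }
}
```

Since the names are plain literals inside one alternation, multi-word entries such as Sri Lanka work without tokenizing the text first.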
Answered by Pradeep Pati
contains() should have worked. You can also try String.indexOf(String). If it returns anything other than -1, that query string exists in the said String, otherwise not.
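Note that indexOf, like contains, matches substrings, so on its own it still finds India inside Indiana. A minimal sketch of how indexOf could be combined with a check of the neighbouring characters is shown here. The class WholeWord and the method containsWord are hypothetical names, and treating any letter or digit as part of a word is an assumption.

```java
class WholeWord {
    // Returns true if word occurs in text with no letter or digit
    // immediately before or after it.
    static boolean containsWord(String text, String word) {
        if (word.isEmpty()) {
            return false;
        }
        int from = 0;
        while (true) {
            int start = text.indexOf(word, from);
            if (start < 0) {
                return false; // no further occurrences
            }
            int end = start + word.length();
            boolean leftOk = start == 0
                    || !Character.isLetterOrDigit(text.charAt(start - 1));
            boolean rightOk = end == text.length()
                    || !Character.isLetterOrDigit(text.charAt(end));
            if (leftOk && rightOk) {
                return true;
            }
            from = start + 1; // keep searching after this partial hit
        }
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Indiana is not India", "India")); // true
        System.out.println(containsWord("Indiana", "India"));              // false
    }
}
```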