Original URL: http://stackoverflow.com/questions/5922956/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Java Dictionary Searcher
Question by Brendan Lesniak
I am trying to implement a program that takes a user's input, splits that string into tokens, and then searches a dictionary for the words in that string. My goal is for every token in the parsed string to be an English word.
For example:
Input:
aman
Split Method:
a man
a m an
a m a n
am an
am a n
ama n
Desired Output:
a man
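The split list above is not exhaustive. A string of n characters can be split in 2^(n-1) ways, so "aman" has 8 segmentations; the list leaves out "aman" itself and "a ma n". Below is a small sketch that enumerates all of them with a bitmask that has one bit per gap between characters. The class and method names are hypothetical, and it assumes inputs of at most 31 characters so the int mask cannot overflow:

```java
import java.util.ArrayList;
import java.util.List;

public class Segmentations {
    // Returns every way to split s into non-empty pieces, space-separated.
    // A string of length n has n - 1 gaps; each gap either gets a space or not,
    // so the result has 2^(n-1) entries.
    public static List<String> all(String s) {
        if (s.length() > 31) {
            // assumption of this sketch: keep 1 << gaps inside an int
            throw new IllegalArgumentException("input too long for an int mask");
        }
        List<String> out = new ArrayList<>();
        if (s.isEmpty()) {
            return out;
        }
        int gaps = s.length() - 1;
        for (int mask = 0; mask < (1 << gaps); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < s.length(); i++) {
                sb.append(s.charAt(i));
                // bit i set means "put a space after character i"
                if (i < gaps && (mask & (1 << i)) != 0) {
                    sb.append(' ');
                }
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String seg : all("aman")) {
            System.out.println(seg);
        }
    }
}
```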
I currently have the following code, which does everything except produce the desired output:
import java.util.Scanner;
import java.io.*;
public class Words {
    // fixed-size array holding the dictionary; must be at least as large as the word list
    public static String[] dic = new String[80368];
    public static void split(String head, String in) {
        // head + " " + in is a segmentation
        String segment = head + " " + in;
        // count number of dictionary words
        int count = 0;
        Scanner phraseScan = new Scanner(segment);
        while (phraseScan.hasNext()) {
            String word = phraseScan.next();
            for (int i = 0; i < dic.length; i++) {
                if (word.equalsIgnoreCase(dic[i])) count++;
            }
        }
        System.out.println(segment + "\t" + count + " English words");
        // recursive calls: move 1..n-1 leading characters of "in" into "head"
        for (int i = 1; i < in.length(); i++) {
            split(head + " " + in.substring(0, i), in.substring(i, in.length()));
        }
    }
    public static void main(String[] args) throws IOException {
        Scanner scan = new Scanner(System.in);
        System.out.print("Enter a string: ");
        String input = scan.next();
        System.out.println();
        // dictionary file, one word per line; path is relative to the working directory
        Scanner filescan = new Scanner(new File("src/dictionary.txt"));
        int wc = 0;
        // stop at the array's capacity to avoid ArrayIndexOutOfBoundsException
        while (filescan.hasNextLine() && wc < dic.length) {
            dic[wc] = filescan.nextLine();
            wc++;
        }
        System.out.println(wc + " words stored");
        split("", input);
    }
}
I know there are better ways to store the dictionary (such as a binary search tree or a hash table), but I don't know how to implement those anyway.
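Java's standard library already provides both structures, so there is nothing to implement by hand. java.util.HashSet is backed by a hash table, and java.util.TreeSet is built on TreeMap, which is a red-black tree (a self-balancing binary search tree). The following minimal sketch loads words into either one. The class and helper names are hypothetical, and lowercasing on insert is an assumption that mirrors the equalsIgnoreCase comparison in the code above:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DictionaryStores {
    // Hash table: contains() runs in constant time on average; iteration order is unspecified.
    public static Set<String> hashDictionary(List<String> words) {
        Set<String> dict = new HashSet<>();
        for (String w : words) {
            dict.add(w.trim().toLowerCase()); // normalise so lookups can ignore case
        }
        return dict;
    }

    // Balanced binary search tree: contains() is O(log n) and iteration is sorted.
    public static Set<String> treeDictionary(List<String> words) {
        Set<String> dict = new TreeSet<>();
        for (String w : words) {
            dict.add(w.trim().toLowerCase());
        }
        return dict;
    }

    public static void main(String[] args) {
        // Hypothetical sample words; the question reads them from dictionary.txt instead.
        List<String> words = Arrays.asList("a", "Am", "an", "man");
        Set<String> hash = hashDictionary(words);
        Set<String> tree = treeDictionary(words);
        System.out.println(hash.contains("am") + " " + tree.contains("am")); // true true
        System.out.println(tree); // [a, am, an, man]
    }
}
```

Either set can replace the String[] array. A lookup then becomes a single contains(word.toLowerCase()) call instead of a loop over all 80,368 entries.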
I am stuck on how to implement a method that checks each split string to see whether every segment is a word in the dictionary.
Any help would be great. Thank you!
Answer by WhiteFang34
Splitting the input string in every possible way will not finish in a reasonable amount of time if you want to support inputs of 20 or more characters. Here's a more efficient approach, with comments inline:
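To put numbers on this: a string of n characters has n - 1 gaps, and an exhaustive split decides independently for each gap whether to insert a space. The number of segmentations S(n) therefore doubles with every extra character. The 36-character example further down would need over 34 billion segmentations, and the question's code also scans the whole 80,368-entry array for every token of each one:

```latex
\begin{aligned}
S(n)  &= 2^{\,n-1} \\
S(20) &= 2^{19} = 524{,}288 \\
S(36) &= 2^{35} = 34{,}359{,}738{,}368
\end{aligned}
```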
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;
import java.util.Stack;
public class Words {
    public static void main(String[] args) throws IOException {
        // load the dictionary into a set for fast lookups
        Set<String> dictionary = new HashSet<String>();
        Scanner filescan = new Scanner(new File("dictionary.txt"));
        while (filescan.hasNextLine()) {
            dictionary.add(filescan.nextLine().trim().toLowerCase());
        }
        // scan for input
        Scanner scan = new Scanner(System.in);
        System.out.print("Enter a string: ");
        String input = scan.next().toLowerCase();
        System.out.println();
        // place to store list of results, each result is a list of strings
        List<List<String>> results = new ArrayList<List<String>>();
        long time = System.currentTimeMillis();
        // start the search, pass empty stack to represent words found so far
        search(input, dictionary, new Stack<String>(), results);
        time = System.currentTimeMillis() - time;
        // list the results found
        for (List<String> result : results) {
            for (String word : result) {
                System.out.print(word + " ");
            }
            System.out.println("(" + result.size() + " words)");
        }
        System.out.println();
        System.out.println("Took " + time + "ms");
    }
    public static void search(String input, Set<String> dictionary,
            Stack<String> words, List<List<String>> results) {
        for (int i = 0; i < input.length(); i++) {
            // take the first i + 1 characters of the input and see if it is a word
            String substring = input.substring(0, i + 1);
            if (dictionary.contains(substring)) {
                // the beginning of the input matches a word, store on stack
                words.push(substring);
                if (i == input.length() - 1) {
                    // there's no input left, copy the words stack to results
                    results.add(new ArrayList<String>(words));
                } else {
                    // there's more input left, search the remaining part
                    search(input.substring(i + 1), dictionary, words, results);
                }
                // pop the matched word back off so we can move onto the next i
                words.pop();
            }
        }
    }
}
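The search above still visits every valid segmentation. With a dictionary that contains both "a" and "aa", for example, a long run of a's produces exponentially many results. If you only need to know whether at least one segmentation exists, the standard dynamic-programming "word break" formulation (a swapped-in technique, not part of this answer) avoids that blow-up. Below is a minimal sketch that assumes the same lowercase Set<String> dictionary; the class name and sample words are hypothetical:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WordBreak {
    // Returns true if s can be split into words that all appear in dictionary.
    // canSplit[i] records whether the prefix s[0, i) can be segmented, so each
    // (start, end) pair is examined at most once: O(n^2) lookups in total.
    public static boolean canSegment(String s, Set<String> dictionary) {
        boolean[] canSplit = new boolean[s.length() + 1];
        canSplit[0] = true; // the empty prefix is trivially segmentable
        for (int end = 1; end <= s.length(); end++) {
            for (int start = 0; start < end; start++) {
                if (canSplit[start] && dictionary.contains(s.substring(start, end))) {
                    canSplit[end] = true;
                    break; // one way to reach this prefix is enough
                }
            }
        }
        return canSplit[s.length()];
    }

    public static void main(String[] args) {
        // Hypothetical tiny dictionary for illustration
        Set<String> dict = new HashSet<>(Arrays.asList("a", "am", "an", "man"));
        System.out.println(canSegment("aman", dict)); // true
        System.out.println(canSegment("amx", dict));  // false
    }
}
```

Recording which start index set each canSplit[end] lets you recover one segmentation. Listing all of them remains exponential whenever there are exponentially many.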
Example output:
Enter a string: aman
a man (2 words)
am an (2 words)
Took 0ms
Here's a much longer input:
Enter a string: thequickbrownfoxjumpedoverthelazydog
the quick brown fox jump ed over the lazy dog (10 words)
the quick brown fox jump ed overt he lazy dog (10 words)
the quick brown fox jumped over the lazy dog (9 words)
the quick brown fox jumped overt he lazy dog (9 words)
Took 1ms
Answer by dfb
If my answer seems silly, it's because you're really close and I'm not sure where you're stuck.
Given your code above, the simplest approach is to add a counter for the total number of words and compare it with the number of matched words:
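int count = 0; int total = 0;
Scanner phraseScan = new Scanner(segment);
while (phraseScan.hasNext()) {
    total++;
    String word = phraseScan.next();
    for (int i = 0; i < dic.length; i++) {
        // stop at the first match so duplicate dictionary entries are not counted twice
        if (word.equalsIgnoreCase(dic[i])) { count++; break; }
    }
}
// print only segmentations in which every token matched a dictionary word
if (total == count) System.out.println(segment);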
Using a hash table instead would be better (lookups are much faster than scanning the whole array), and it is easy to do:
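HashSet<String> dict = new HashSet<String>();
dict.add("foo"); // add your data (in lowercase)
int count = 0; int total = 0;
Scanner phraseScan = new Scanner(segment);
while (phraseScan.hasNext()) {
    total++;
    String word = phraseScan.next();
    // HashSet lookups are case-sensitive, so normalise the token first
    if (dict.contains(word.toLowerCase())) count++;
}
if (total == count) System.out.println(segment);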
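Here is a self-contained sketch that puts the pieces of this answer together. It keeps the question's exhaustive split() recursion but collects only the segmentations in which every token is in a HashSet. The class name, the allWords helper and the four-word dictionary are hypothetical stand-ins (the real program would load dictionary.txt). The enumeration is still exponential in the input length:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

public class SplitFilter {
    // True if segment has at least one token and every token is in dict.
    public static boolean allWords(String segment, Set<String> dict) {
        int count = 0;
        int total = 0;
        Scanner phraseScan = new Scanner(segment);
        while (phraseScan.hasNext()) {
            total++;
            if (dict.contains(phraseScan.next().toLowerCase())) {
                count++;
            }
        }
        phraseScan.close();
        return total > 0 && total == count;
    }

    // Same recursion as the question's split(), but instead of printing every
    // segmentation it collects only those made entirely of dictionary words.
    public static void split(String head, String in, Set<String> dict, List<String> out) {
        String segment = (head + " " + in).trim(); // drop the leading space from head
        if (allWords(segment, dict)) {
            out.add(segment);
        }
        for (int i = 1; i < in.length(); i++) {
            split(head + " " + in.substring(0, i), in.substring(i), dict, out);
        }
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("a", "am", "an", "man"));
        List<String> out = new ArrayList<>();
        split("", "aman", dict, out);
        System.out.println(out); // [a man, am an]
    }
}
```

For "aman" this collects the same two segmentations that the first answer reports.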
There are other, better ways to do this. One is a trie (http://en.wikipedia.org/wiki/Trie), which is a bit slower for lookup but stores data more efficiently. If you have a large dictionary, you might not be able to fit it in memory, so you could use a database or a key-value store such as BDB (http://en.wikipedia.org/wiki/Berkeley_DB).
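Java has no built-in trie, so what follows is a minimal sketch of one with hypothetical class and method names. Besides contains(), a trie can report every dictionary word that is a prefix of the remaining input in a single walk, and those words are exactly the candidate first words in the search above. Giving every node its own HashMap, as this sketch does, is simple but memory-hungry. The space savings mentioned above come from sharing common prefixes and are better realised with more compact node representations:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Trie {
    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isWord; // true if the path from the root to here spells a dictionary word
    }

    private final Node root = new Node();

    public void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isWord = true;
    }

    public boolean contains(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return false;
        }
        return node.isWord;
    }

    // Lengths of every dictionary word that starts at index from in s,
    // found with a single walk down the trie.
    public List<Integer> prefixLengths(String s, int from) {
        List<Integer> lengths = new ArrayList<>();
        Node node = root;
        for (int i = from; i < s.length(); i++) {
            node = node.children.get(s.charAt(i));
            if (node == null) break;
            if (node.isWord) lengths.add(i - from + 1);
        }
        return lengths;
    }

    public static void main(String[] args) {
        Trie trie = new Trie();
        for (String w : new String[] {"a", "am", "an", "man"}) trie.add(w);
        System.out.println(trie.contains("man"));          // true
        System.out.println(trie.prefixLengths("aman", 0)); // [1, 2]
    }
}
```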
Answer by Justin Jose
package LinkedList;
import java.util.LinkedHashSet;
public class dictionaryCheck {
    private static LinkedHashSet<String> set;
    // Returns true if str, from index start onward, can be split entirely into
    // words from the set. Passing start as a parameter (instead of mutating a
    // static field) lets the recursion backtrack correctly when a word choice
    // leads to a dead end.
    public boolean checkDictionary(String str, int start) {
        if (start >= str.length()) {
            // the whole string has been consumed by dictionary words
            return true;
        }
        for (String word : set) {
            // try each word that matches at position start, then check the rest
            if (!word.isEmpty() && str.startsWith(word, start)
                    && checkDictionary(str, start + word.length())) {
                return true;
            }
        }
        return false;
    }
    public static void main(String[] args) {
        set = new LinkedHashSet<String>();
        set.add("Jose");
        set.add("Nithin");
        set.add("Joy");
        set.add("Justine");
        set.add("Jomin");
        set.add("Thomas");
        String str = "JoyJustine";
        dictionaryCheck obj = new dictionaryCheck();
        boolean c = obj.checkDictionary(str, 0);
        if (c) {
            System.out.println("String can be found out from those words in the Dictionary");
        } else {
            System.out.println("Not Possible");
        }
    }
}