Java: getting a list of words from a Trie
Original question: http://stackoverflow.com/questions/2794381/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Getting a list of words from a Trie
Asked by user330572
I'd like to change the following code so that, instead of checking whether a matching word exists in the Trie, it returns a list of all words beginning with the prefix entered by the user. Can someone point me in the right direction? I can't get it working at all.
public boolean search(String s)
{
    Node current = root;
    System.out.println("\nSearching for string: "+s);
    while(current != null)
    {
        for(int i=0;i<s.length();i++)
        {
            if(current.child[(int)(s.charAt(i)-'a')] == null)
            {
                System.out.println("Cannot find string: "+s);
                return false;
            }
            else
            {
                current = current.child[(int)(s.charAt(i)-'a')];
                System.out.println("Found character: "+ current.content);
            }
        }
        // If we are here, the string exists.
        // But to ensure unwanted substrings are not found:
        if (current.marker == true)
        {
            System.out.println("Found string: "+s);
            return true;
        }
        else
        {
            System.out.println("Cannot find string: "+s +"(only present as a substring)");
            return false;
        }
    }
    return false;
}
}
Answered by Imroze Aslam
I faced this problem while trying to build a text auto-complete module. I solved it with a Trie in which each node holds a reference to its parent as well as its children. First, I search for the node where the input prefix ends. Then I traverse the sub-tree rooted at that prefix node. Whenever a node flagged as a leaf (the end of a word) is encountered, a word starting with the input prefix has been found. From that node I walk up through the parent links until I reach the root of the sub-tree, pushing each node's key onto a stack along the way. Finally, I take the prefix and append the keys to it by popping the stack, and save the resulting word in an ArrayList. At the end of the traversal I have all the words that start with the input prefix. Here is the code with a usage example:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

class TrieNode
{
    char c;
    TrieNode parent;
    HashMap<Character, TrieNode> children = new HashMap<Character, TrieNode>();
    boolean isLeaf; // true when a word ends at this node (the node may still have children)

    public TrieNode() {}
    public TrieNode(char c) { this.c = c; }
}

public class Trie
{
    private TrieNode root;
    ArrayList<String> words; // words collected by the last traversal
    TrieNode prefixRoot;     // node where the last searched prefix ends
    String curPrefix;        // the last searched prefix

    public Trie()
    {
        root = new TrieNode();
        words = new ArrayList<String>();
    }

    // Inserts a word into the trie.
    public void insert(String word)
    {
        HashMap<Character, TrieNode> children = root.children;
        TrieNode crntparent = root;
        for (int i = 0; i < word.length(); i++)
        {
            char c = word.charAt(i);
            TrieNode t;
            if (children.containsKey(c))
            {
                t = children.get(c);
            }
            else
            {
                t = new TrieNode(c);
                t.parent = crntparent;
                children.put(c, t);
            }
            children = t.children;
            crntparent = t;
            // mark the node of the last character as the end of a word
            if (i == word.length() - 1)
                t.isLeaf = true;
        }
    }

    // Returns true if the word is in the trie.
    public boolean search(String word)
    {
        TrieNode t = searchNode(word);
        return t != null && t.isLeaf;
    }

    // Returns true if there is any word in the trie
    // that starts with the given prefix.
    public boolean startsWith(String prefix)
    {
        return searchNode(prefix) != null;
    }

    // Returns the node where str ends, or null if str is not in the trie.
    // Side effect: on success it remembers the prefix and clears the word list.
    public TrieNode searchNode(String str)
    {
        Map<Character, TrieNode> children = root.children;
        TrieNode t = null;
        for (int i = 0; i < str.length(); i++)
        {
            char c = str.charAt(i);
            if (children.containsKey(c))
            {
                t = children.get(c);
                children = t.children;
            }
            else
            {
                return null;
            }
        }
        prefixRoot = t;
        curPrefix = str;
        words.clear();
        return t;
    }

    // Depth-first traversal of the sub-tree under node; every complete word
    // found is added to words. The offset parameter is unused.
    void wordsFinderTraversal(TrieNode node, int offset)
    {
        if (node.isLeaf)
        {
            // Walk up to the prefix node, stacking the characters on the way.
            TrieNode altair = node;
            Stack<String> hstack = new Stack<String>();
            while (altair != prefixRoot)
            {
                hstack.push(Character.toString(altair.c));
                altair = altair.parent;
            }
            // Popping the stack yields the characters in top-down order.
            String wrd = curPrefix;
            while (!hstack.empty())
            {
                wrd = wrd + hstack.pop();
            }
            words.add(wrd);
        }
        // Copy the child keys into a list;
        // here you can play with the order of the children.
        ArrayList<Character> aloc = new ArrayList<Character>(node.children.keySet());
        for (int i = 0; i < aloc.size(); i++)
        {
            wordsFinderTraversal(node.children.get(aloc.get(i)), offset + 2);
        }
    }

    void displayFoundWords()
    {
        System.out.println("_______________");
        for (int i = 0; i < words.size(); i++)
        {
            System.out.println(words.get(i));
        }
        System.out.println("________________");
    }
}
Example
Trie prefixTree;
prefixTree = new Trie();
prefixTree.insert("GOING");
prefixTree.insert("GONG");
prefixTree.insert("PAKISTAN");
prefixTree.insert("SHANGHAI");
prefixTree.insert("GONDAL");
prefixTree.insert("GODAY");
prefixTree.insert("GODZILLA");
if (prefixTree.startsWith("GO"))
{
    TrieNode tn = prefixTree.searchNode("GO");
    prefixTree.wordsFinderTraversal(tn, 0);
    prefixTree.displayFoundWords();
}
if (prefixTree.startsWith("GOD"))
{
    TrieNode tn = prefixTree.searchNode("GOD");
    prefixTree.wordsFinderTraversal(tn, 0);
    prefixTree.displayFoundWords();
}
Answered by polygenelubricants
The simplest solution is to use a depth-first search.
You go down the trie, matching letter by letter from the input. Then, once you have no more letters to match, everything under that node is a string that you want. Recursively explore that whole subtrie, building the string as you go down to its nodes.
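The two steps described above could be sketched as follows. This is only an illustration, assuming lowercase a-z words and a node layout similar to the one in the question (a `child` array and a `marker` flag); the names `PrefixTrie`, `insert`, `wordsWithPrefix` and `collect` are made up for this example and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;

class PrefixTrie {
    // Assumed node layout: 26 children (a-z) and an end-of-word marker.
    static class Node {
        Node[] child = new Node[26];
        boolean marker;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node current = root;
        for (int i = 0; i < word.length(); i++) {
            int idx = word.charAt(i) - 'a'; // assumes characters in a-z only
            if (current.child[idx] == null) {
                current.child[idx] = new Node();
            }
            current = current.child[idx];
        }
        current.marker = true;
    }

    List<String> wordsWithPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        Node current = root;
        // Step 1: walk down the trie, matching the prefix letter by letter.
        for (int i = 0; i < prefix.length(); i++) {
            current = current.child[prefix.charAt(i) - 'a'];
            if (current == null) {
                return result; // no word starts with this prefix
            }
        }
        // Step 2: every marked node below this point is a matching word.
        collect(current, new StringBuilder(prefix), result);
        return result;
    }

    // Depth-first search that builds the word while descending.
    private void collect(Node node, StringBuilder path, List<String> result) {
        if (node.marker) {
            result.add(path.toString());
        }
        for (int i = 0; i < 26; i++) {
            if (node.child[i] != null) {
                path.append((char) ('a' + i));
                collect(node.child[i], path, result);
                path.deleteCharAt(path.length() - 1); // backtrack
            }
        }
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String w : new String[] {"go", "going", "gong", "shanghai"}) {
            trie.insert(w);
        }
        System.out.println(trie.wordsWithPrefix("go")); // prints [go, going, gong]
    }
}
```

Note that the search does not stop at a marked node, so a word that is itself a prefix of a longer word ("go" and "going") is reported together with the longer one.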
Answered by code muncher
After building the Trie, you could do a DFS starting from the node where you found the prefix.
Here node is a Trie node, word is the word built so far, and res is the list of words found.
END = "*"  # assumption: key that marks the end of a word; the original only says "EndofWord"

def dfs(self, node, word, res):
    # When a word ends at this node, add the current word to our list.
    # Do not return here: longer words may continue below this node.
    if END in node:
        res.append(word)
    # For each level, go deep down, DFS fashion,
    # adding the current char to our current word.
    for ch, child in node.items():
        if ch != END:
            self.dfs(child, word + ch, res)
Answered by IVlad
This is easier to solve recursively in my opinion. It would go something like this:
- Write a recursive function Print that prints all the nodes in the trie rooted at the node you give as a parameter. Wiki tells you how to do this (look at sorting).
- Find the last character of your prefix, and the node that is labeled with that character, going down from the root of your trie. Call the Print function with this node as the parameter. Then just make sure you also output the prefix before each word, since this will give you all the words without their prefix.
If you don't really care about efficiency, you can just run Print with the main root node and only print those words that start with the prefix you're interested in. This is easier to implement but slower.
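As a rough illustration of the simpler but slower variant, the sketch below walks the whole trie from the root and keeps only the words that start with the prefix. It collects into a list instead of printing, and it assumes map-based nodes; `FilteringTrie`, `collectAll` and `wordsWithPrefixSlow` are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FilteringTrie {
    static class Node {
        // A TreeMap keeps children sorted by character, so a pre-order
        // walk produces the words in sorted order.
        Map<Character, Node> children = new TreeMap<>();
        boolean endOfWord;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
        }
        current.endOfWord = true;
    }

    // The "Print" idea from the answer, collecting into a list instead of printing.
    private void collectAll(Node node, String path, List<String> out) {
        if (node.endOfWord) {
            out.add(path);
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            collectAll(e.getValue(), path + e.getKey(), out);
        }
    }

    // Slow variant: visit every word in the trie, then filter by prefix.
    List<String> wordsWithPrefixSlow(String prefix) {
        List<String> all = new ArrayList<>();
        collectAll(root, "", all);
        List<String> matches = new ArrayList<>();
        for (String w : all) {
            if (w.startsWith(prefix)) {
                matches.add(w);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        FilteringTrie trie = new FilteringTrie();
        for (String w : new String[] {"GONG", "GOING", "PAKISTAN", "GODZILLA"}) {
            trie.insert(w);
        }
        System.out.println(trie.wordsWithPrefixSlow("GO")); // prints [GODZILLA, GOING, GONG]
    }
}
```

The cost is proportional to the size of the whole trie rather than to the sub-tree under the prefix, which is the inefficiency the answer mentions.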
Answered by Oak
You need to traverse the sub-tree starting at the node you found for the prefix.
Start in the same way, i.e. by finding the correct node. Then, instead of checking its marker, traverse that tree (i.e. go over all its descendants; a DFS is a good way to do it), saving the substring used to reach the "current" node from the first node.
If the current node is marked as a word, output* the prefix + substring reached.
* or add it to a list or something.
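The same traversal can also be written without recursion. The sketch below is one possible reading of this answer, using an explicit stack where each entry stores a node together with the substring used to reach it from the prefix node. The `Frame` helper and the names `IterativeTrie` and `wordsWithPrefix` are assumptions made for this example; the `marker` field name is borrowed from the question.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IterativeTrie {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean marker; // true when a word ends at this node
    }

    // Pairs a node with the substring used to reach it from the prefix node.
    static class Frame {
        final Node node;
        final String suffix;

        Frame(Node node, String suffix) {
            this.node = node;
            this.suffix = suffix;
        }
    }

    private final Node root = new Node();

    void insert(String word) {
        Node current = root;
        for (char c : word.toCharArray()) {
            current = current.children.computeIfAbsent(c, k -> new Node());
        }
        current.marker = true;
    }

    List<String> wordsWithPrefix(String prefix) {
        List<String> result = new ArrayList<>();
        // Find the node for the prefix, exactly as a plain search would.
        Node start = root;
        for (char c : prefix.toCharArray()) {
            start = start.children.get(c);
            if (start == null) {
                return result;
            }
        }
        // Depth-first traversal of the sub-tree with an explicit stack.
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(start, ""));
        while (!stack.isEmpty()) {
            Frame f = stack.pop();
            if (f.node.marker) {
                result.add(prefix + f.suffix); // prefix + substring reached
            }
            for (Map.Entry<Character, Node> e : f.node.children.entrySet()) {
                stack.push(new Frame(e.getValue(), f.suffix + e.getKey()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        IterativeTrie trie = new IterativeTrie();
        for (String w : new String[] {"world", "work", "wolf", "life"}) {
            trie.insert(w);
        }
        // The order of the results depends on HashMap iteration order.
        System.out.println(trie.wordsWithPrefix("wor"));
    }
}
```

Because the children are kept in a HashMap, the order of the returned words is not specified; a sorted map could be used instead if a stable order matters.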
Answered by serega
I built a trie once for one of the ITA puzzles:
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class WordTree {

    class Node {
        private final char ch;
        /**
         * Flag indicates that this node is the end of the string.
         */
        private boolean end;
        private LinkedList<Node> children;

        public Node(char ch) {
            this.ch = ch;
        }

        public void addChild(Node node) {
            if (children == null) {
                children = new LinkedList<Node>();
            }
            children.add(node);
        }

        public Node getNode(char ch) {
            if (children == null) {
                return null;
            }
            for (Node child : children) {
                if (child.getChar() == ch) {
                    return child;
                }
            }
            return null;
        }

        public char getChar() {
            return ch;
        }

        public List<Node> getChildren() {
            if (this.children == null) {
                return Collections.emptyList();
            }
            return children;
        }

        public boolean isEnd() {
            return end;
        }

        public void setEnd(boolean end) {
            this.end = end;
        }
    }

    Node root = new Node(' ');

    public WordTree() {
    }

    /**
     * Searches for strings that match the prefix.
     *
     * @param prefix - prefix
     * @return - list of strings that match the prefix, or empty list if no matches are found.
     */
    public List<String> getWordsForPrefix(String prefix) {
        if (prefix.length() == 0) {
            return Collections.emptyList();
        }
        Node node = getNodeForPrefix(root, prefix);
        if (node == null) {
            return Collections.emptyList();
        }
        List<LinkedList<Character>> chars = collectChars(node);
        List<String> words = new ArrayList<String>(chars.size());
        for (LinkedList<Character> charList : chars) {
            // Each charList already starts with the last prefix character,
            // so that character is dropped from the prefix here.
            words.add(combine(prefix.substring(0, prefix.length() - 1), charList));
        }
        return words;
    }

    private String combine(String prefix, List<Character> charList) {
        StringBuilder sb = new StringBuilder(prefix);
        for (Character character : charList) {
            sb.append(character);
        }
        return sb.toString();
    }

    private Node getNodeForPrefix(Node node, String prefix) {
        if (prefix.length() == 0) {
            return node;
        }
        Node next = node.getNode(prefix.charAt(0));
        if (next == null) {
            return null;
        }
        return getNodeForPrefix(next, prefix.substring(1, prefix.length()));
    }

    // Returns one character list per word below node; each list starts with node's own character.
    private List<LinkedList<Character>> collectChars(Node node) {
        List<LinkedList<Character>> chars = new ArrayList<LinkedList<Character>>();
        if (node.getChildren().size() == 0) {
            chars.add(new LinkedList<Character>(Collections.singletonList(node.getChar())));
        } else {
            if (node.isEnd()) {
                chars.add(new LinkedList<Character>(Collections.singletonList(node.getChar())));
            }
            List<Node> children = node.getChildren();
            for (Node child : children) {
                List<LinkedList<Character>> childList = collectChars(child);
                for (LinkedList<Character> characters : childList) {
                    characters.push(node.getChar());
                    chars.add(characters);
                }
            }
        }
        return chars;
    }

    public void addWord(String word) {
        addWord(root, word);
    }

    private void addWord(Node parent, String word) {
        if (word.trim().length() == 0) {
            return;
        }
        Node child = parent.getNode(word.charAt(0));
        if (child == null) {
            child = new Node(word.charAt(0));
            parent.addChild(child);
        }
        if (word.length() == 1) {
            child.setEnd(true);
        } else {
            addWord(child, word.substring(1, word.length()));
        }
    }

    public static void main(String[] args) {
        WordTree tree = new WordTree();
        tree.addWord("world");
        tree.addWord("work");
        tree.addWord("wolf");
        tree.addWord("life");
        tree.addWord("love");
        System.out.println(tree.getWordsForPrefix("wo"));
    }
}
Answered by David Chavez
Here is an implementation in C++:
https://github.com/dchavezlive/Basic-Trie
In your search function, you can have it return the node where the prefix ends. If you make sure each node has a field that stores every child (a vector?), then you can list all the children from the node where your prefix ends.
Answered by Woot4Moo
You would need to use a List:
List<String> myList = new ArrayList<String>();
if (matchingStringFound)
    myList.add(stringToAdd);
Answered by Keith Randall
After your for loop, add a call to printAllStringsInTrie(current, s);
void printAllStringsInTrie(Node t, String prefix) {
    // marker is the end-of-word flag from the question's Node
    if (t.marker) System.out.println(prefix);
    for (int i = 0; i < t.child.length; i++) {
        if (t.child[i] != null) {
            // The cast is needed: 'a' + i is an int, and without it the number would be appended.
            printAllStringsInTrie(t.child[i], prefix + (char) ('a' + i));
        }
    }
}
Answered by SHIFA KHAN
The recursive code below can be used when your TrieNode looks like this. This code works fine.
class TrieNode
{
    // Field types are inferred from the constructor below.
    char con;
    boolean isEnd;
    ArrayList<TrieNode> list;
    int count;

    TrieNode(char c)
    {
        this.con = c;
        this.isEnd = false;
        list = new ArrayList<TrieNode>();
        count = 0;
    }
}
//--------------------------------------------------
public void Print(TrieNode root1, ArrayList<Character> path)
{
    if (root1 == null)
        return;
    if (root1.isEnd)
    {
        // print the entire path
        for (char ch : path)
        {
            System.out.print(ch);
        }
        System.out.println();
        // no return here: a longer word may continue below this node
    }
    for (TrieNode child : root1.list)
    {
        path.add(child.con);
        Print(child, path);
        path.remove(path.size() - 1); // backtrack
    }
}

