Best data structure for implementing a dictionary?
Note: this page reproduces a popular Stack Overflow question and its answer under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/10017808/
Asked by Jatin
What would be the best data structure to store all the words of a dictionary? The best I could think of was to use a HashMap, which will map to a HashTable. Basically, depending upon the first character, we will get the associated HashTable, and then using this, we can add the words starting with that character. We'll then pick a good hash function based on the string.
Is there a better approach?
Answered by templatetypedef
Depending on what you want to do, there are many good data structures.
If you just want to store the words and ask "is this word here or not?", a standard hash table with no other fancy machinery is a reasonable approach. If the word list is fixed in advance, consider using a perfect hash table to get excellent performance and space usage.
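As a minimal sketch of the plain hash table approach, Python's built-in set (a hash table of keys) is enough for membership queries. The word list and the `contains` helper below are made up for illustration; a real dictionary would be loaded from a file.

```python
# Hypothetical word list; a real dictionary would be loaded from a file.
words = {"apple", "apply", "banana", "band"}

def contains(word: str) -> bool:
    """Return True if the word is stored; average cost is one hash of the word."""
    return word in words

print(contains("apple"))  # True
print(contains("app"))    # False: "app" is only a prefix, not a stored word
```

Note that this answers exact-match queries only. A hash table cannot tell you cheaply whether any stored word starts with "app".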
If you want to be able to check whether a given prefix exists while still supporting fast lookups, a trie is a good option, though it can be a bit space-inefficient. It also supports fast insertions and deletions, and it allows iteration in alphabetical order, which hashing doesn't offer. This is essentially the structure you've described in your question, but depending on the use case other representations of tries might be better.
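Here is a rough sketch of a trie that supports the operations mentioned above: insertion, exact lookup, prefix checks, and alphabetical iteration. The class and method names are my own choices for illustration, each node stores its children in a dict (one of several possible representations), and deletion is left out to keep it short.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the child TrieNode
        self.is_word = False  # True if the path from the root spells a stored word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None if it falls off."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, word):
        node = self._find(word)
        return node is not None and node.is_word

    def has_prefix(self, prefix):
        return self._find(prefix) is not None

    def words(self, prefix=""):
        """Yield stored words that start with prefix, in alphabetical order."""
        node = self._find(prefix)
        if node is not None:
            yield from self._walk(node, prefix)

    def _walk(self, node, path):
        if node.is_word:
            yield path
        for ch in sorted(node.children):  # sorted by code point
            yield from self._walk(node.children[ch], path + ch)


trie = Trie()
for w in ["band", "apple", "apply", "banana"]:
    trie.insert(w)

print(trie.contains("app"))     # False: not a stored word
print(trie.has_prefix("app"))   # True: "apple" and "apply" start with it
print(list(trie.words("ban")))  # ['banana', 'band']
```

Each lookup costs time proportional to the length of the query, independent of how many words are stored. The space cost comes from allocating one node (and one dict) per distinct prefix.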
If, in addition to the above, you know for a fact that the word list is fixed, consider using a DAWG (directed acyclic word graph), which is essentially a minimum-state DFA for the language. It's substantially more compact than the trie, but supports many of the same operations.
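To give a feel for where the space saving comes from, here is a small sketch that builds an ordinary trie and then merges equivalent subtrees bottom-up, which yields the minimal acyclic automaton for a fixed word list. This is a simplification I chose for illustration: it materializes the whole trie first, whereas practical DAWG builders usually construct the graph incrementally. All names below are hypothetical.

```python
class Node:
    def __init__(self):
        self.children = {}    # maps a character to the next Node
        self.is_word = False  # True if a stored word ends here


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_word = True
    return root


def minimize(node, registry):
    """Merge equivalent subtrees bottom-up and return the canonical node."""
    node.children = {ch: minimize(child, registry)
                     for ch, child in node.children.items()}
    # Two nodes are equivalent if they agree on is_word and reach the same
    # canonical children on the same characters. Canonical nodes stay alive
    # in the registry, so using id() as their identity is safe here.
    signature = (node.is_word,
                 tuple(sorted((ch, id(child))
                              for ch, child in node.children.items())))
    return registry.setdefault(signature, node)


def count_nodes(root):
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.children.values())
    return len(seen)


def accepts(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_word


words = ["tap", "taps", "top", "tops"]
print(count_nodes(build_trie(words)))  # 8 nodes in the plain trie
dawg = minimize(build_trie(words), {})
print(count_nodes(dawg))               # 5 nodes after merging shared suffixes
print(accepts(dawg, "tops"))           # True
```

The trie already shares prefixes; the merge step additionally shares suffixes ("p", "ps"), which is why the node count drops. Because nodes are shared, inserting a new word afterwards is no longer a simple operation, which is why this fits a fixed word list.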
If you want trie-like behavior but don't want to pay a huge space penalty, the ternary search tree is another viable option, as is the radix tree. These are very different structures, but each can be much better than the trie in different circumstances.
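A minimal sketch of a ternary search tree follows, with only insertion and exact lookup. Each node holds one character and three links: lower characters, higher characters, and the next character of the word. The names are my own, and this version does not store the empty string or rebalance the tree.

```python
class TSTNode:
    def __init__(self, ch):
        self.ch = ch
        self.lo = None        # subtree for characters smaller than ch
        self.eq = None        # subtree for the next character of the word
        self.hi = None        # subtree for characters larger than ch
        self.is_word = False  # True if a stored word ends at this node


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, word):
        if word:  # the empty string is not stored in this sketch
            self.root = self._insert(self.root, word, 0)

    def _insert(self, node, word, i):
        ch = word[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, word, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, word, i)
        elif i + 1 < len(word):
            node.eq = self._insert(node.eq, word, i + 1)
        else:
            node.is_word = True
        return node

    def contains(self, word):
        node, i = self.root, 0
        while node is not None and word:
            ch = word[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(word):
                node, i = node.eq, i + 1
            else:
                return node.is_word
        return False


tst = TernarySearchTree()
for w in ["cat", "cap", "car", "dog"]:
    tst.insert(w)

print(tst.contains("cap"))  # True
print(tst.contains("ca"))   # False: a prefix, not a stored word
```

Compared with a trie node that keeps a map or array of children, each node here has a fixed three pointers, which is where the space saving comes from. The price is extra comparisons per character, and the shape depends on insertion order.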
If space is a concern but you want a trie, look into the succinct trie representation, which has slower lookups but close to theoretically optimal space usage. It has been used in JavaScript as an easy way to transmit a huge amount of data. An alternative compact representation is the double-array trie, though admittedly I know very little about it.
If you want to use the dictionary for operations like spell-checking, where you need to find words similar to other words, the BK-tree is an excellent data structure to consider.
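For a rough idea of how a BK-tree answers "which words are within k edits of this one?", here is a small sketch using Levenshtein distance as the metric. The names and the tuple-based node layout are my own choices for illustration, not a reference implementation.

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


class BKTree:
    def __init__(self):
        self.root = None  # each node is a pair: (word, {distance: child node})

    def add(self, word):
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = edit_distance(word, node[0])
            if d == 0:
                return  # word is already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def search(self, word, max_dist):
        """Return the stored words within max_dist edits of word, sorted."""
        results = []
        stack = [self.root] if self.root is not None else []
        while stack:
            node_word, children = stack.pop()
            d = edit_distance(word, node_word)
            if d <= max_dist:
                results.append(node_word)
            # Triangle inequality: matches can only sit under edges whose
            # label lies in [d - max_dist, d + max_dist].
            for edge, child in children.items():
                if d - max_dist <= edge <= d + max_dist:
                    stack.append(child)
        return sorted(results)


tree = BKTree()
for w in ["book", "books", "cake", "boo", "cape", "cart", "boon", "cook"]:
    tree.add(w)

print(tree.search("bool", 1))  # ['boo', 'book', 'boon']
```

The pruning step is what makes this better than comparing the query against every word: whole subtrees are skipped whenever their edge label is too far from the distance just computed.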
Hope this helps!