Original question: http://stackoverflow.com/questions/12310243/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Storing a dictionary in a hashtable
Asked by user1154644
I have an assignment that I am working on, and I can't get a hold of the professor to get clarity on something. The idea is that we are writing an anagram solver, using a given set of words, that we store in 3 different dictionary classes: Linear, Binary, and Hash.
So we read in the words from a text file, and for the first 2 dictionary objects (linear and binary), we store the words as an ArrayList... easy enough.
But for the HashDictionary, he wants us to store the words in a HashTable. I'm just not sure what the values are going to be for the HashTable, or why you would do that. The instructions say we store the words in a Hashtable for quick retrieval, but I just don't get what the point of that is. It makes sense to store words in an ArrayList, but I'm just not sure how key/value pairing helps with a dictionary.
Maybe I'm not giving enough details, but I figured maybe someone would have seen something like this and it's obvious to them.
Each of our classes has a contains method, that returns a boolean representing whether or not a word passed in is in the dictionary, so the linear does a linear search of the arraylist, the binary does a binary search of the arraylist, and I'm not sure about the hash....
Answered by cheeken
The difference is speed. Both approaches work, but looking up a word in a hash table is much faster than searching a list for it.
When you use an ArrayList, or any sort of List, to find an element, you must inspect each list item, one by one, until you find the desired word. If the word isn't there, you've looped through the entire list.
When you use a HashTable, you perform some "magic" on the word you are looking up known as calculating the word's hash. Using that hash value, instead of looping through a list of values, you can immediately deduce where to find your word - or, if your word doesn't exist in the hash, that your word isn't there.
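To make the "magic" a little more concrete, here is a minimal sketch of turning a word into a position in a table. The class name BucketDemo, the helper bucketFor, and the table size of 16 are made up for illustration; String.hashCode() and Math.floorMod are standard Java. A real HashMap mixes the hash bits further and resizes its table, so treat this only as the general idea.

```java
public class BucketDemo {
    // Hypothetical helper: maps a word to a slot index in a table of the given size.
    static int bucketFor(String word, int tableSize) {
        // hashCode() may be negative, so floorMod keeps the index in [0, tableSize).
        return Math.floorMod(word.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 16; // assumed size, chosen only for this example
        for (String word : new String[] {"hello", "chicken", "jigglypuff"}) {
            System.out.println(word + " -> slot " + bucketFor(word, tableSize));
        }
    }
}
```

The same word always lands in the same slot, which is why a lookup can go straight to one position instead of scanning every entry.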
I've oversimplified here, but that's the general idea. You can find another question here with a variety of explanations on how a hash table works.
Here is a small code snippet utilizing a HashMap.
import java.util.HashMap;
import java.util.Map;

// We will map our words to their definitions; word is the key, definition is the value
Map<String, String> dictionary = new HashMap<String, String>();
dictionary.put("hello", "A common salutation");
dictionary.put("chicken", "A delightful vessel for protein");
// Later ...
dictionary.get("chicken"); // Returns "A delightful vessel for protein"
The problem you describe asks that you use a HashMap as the basis for a dictionary that fulfills three requirements:
- Adding a word to the dictionary
- Removing a word from the dictionary
- Checking if a word is in the dictionary
It seems counter-intuitive to use a map, which stores a key and a value, since all you really want to do is store just a key (or just a value). However, as I described above, a HashMap makes it extremely quick to find the value associated with a key. Similarly, it makes it extremely quick to see if the HashMap knows about a key at all. We can leverage this quality by storing each of the dictionary words as a key in the HashMap, and associating it with a garbage value (since we don't care about it), such as null.
You can see how to fulfill the three requirements, as follows.
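import java.util.HashMap;
import java.util.Map;

Map<String, Object> map = new HashMap<String, Object>();
// Add a word (the null value is a placeholder that is never read)
map.put("word", null);
// Remove a word
map.remove("word");
// Check for the presence of a word
map.containsKey("word");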
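Putting those three operations together, a dictionary class for the assignment might look roughly like the sketch below. The class name HashDictionary and the contains method come from the question; the add and remove method names are assumptions. It stores Boolean.TRUE rather than null as the throwaway value, because java.util.Hashtable (the class the assignment names) throws a NullPointerException on null values, so this placeholder works with either map type.

```java
import java.util.HashMap;
import java.util.Map;

public class HashDictionary {
    // Words are the keys; the value is a placeholder that is never read.
    private final Map<String, Boolean> words = new HashMap<>();

    // Hypothetical method name: adds a word to the dictionary.
    public void add(String word) {
        words.put(word, Boolean.TRUE);
    }

    // Hypothetical method name: removes a word if it is present.
    public void remove(String word) {
        words.remove(word);
    }

    // Hashes the word and checks a single bucket instead of scanning every word.
    public boolean contains(String word) {
        return words.containsKey(word);
    }

    public static void main(String[] args) {
        HashDictionary dictionary = new HashDictionary();
        dictionary.add("listen");
        dictionary.add("silent");
        dictionary.remove("silent");
        System.out.println(dictionary.contains("listen")); // true
        System.out.println(dictionary.contains("silent")); // false
    }
}
```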
I don't want to overload you with information, but the requirements we have here align with a data structure known as a Set. In Java, a commonly used Set is the HashSet, which is almost exactly what you are implementing with this bit of your homework assignment. (In fact, if this weren't a homework assignment explicitly instructing you to use a HashMap, I'd recommend you instead use a HashSet.)
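For comparison, here is roughly what the same three operations look like with a HashSet, where there is no value to invent. The class name SetDictionaryDemo and the helper buildDictionary are made up for this sketch; add, remove, and contains are the standard java.util.Set methods.

```java
import java.util.HashSet;
import java.util.Set;

public class SetDictionaryDemo {
    // Hypothetical helper: builds a word set from the given words.
    static Set<String> buildDictionary(String... words) {
        Set<String> dictionary = new HashSet<>();
        for (String word : words) {
            dictionary.add(word); // add a word
        }
        return dictionary;
    }

    public static void main(String[] args) {
        Set<String> dictionary = buildDictionary("cat", "dog");
        System.out.println(dictionary.contains("cat")); // true
        dictionary.remove("cat");                       // remove a word
        System.out.println(dictionary.contains("cat")); // false
    }
}
```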
Answered by djechlin
Arrays are hard to find stuff in. If I gave you array[0] = "cat"; array[1] = "dog"; array[2] = "pikachu"; you'd have to check each element just to know if jigglypuff is a word. If I gave you hash["cat"] = 1; hash["dog"] = 1; hash["pikachu"] = 1; the check is instant: you just look it up directly. The value 1 doesn't matter in this particular case, although you can put useful information there, such as how many times you've looked up a word, or maybe 1 will mean a real word and 2 will mean the name of a Pokemon, or for a real dictionary it could contain a sentence-long definition. That's less relevant here.
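As one illustration of a value that does carry information, the sketch below counts how many times each word has been looked up. The class name LookupCounter and the method recordLookup are invented for this example; Map.merge is the standard Java method.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupCounter {
    // Key: the word. Value: how many times it has been looked up so far.
    private final Map<String, Integer> lookups = new HashMap<>();

    // Records one lookup of the word and returns the updated count.
    public int recordLookup(String word) {
        // merge stores 1 for a new key, otherwise adds 1 to the existing count.
        return lookups.merge(word, 1, Integer::sum);
    }

    public static void main(String[] args) {
        LookupCounter counter = new LookupCounter();
        System.out.println(counter.recordLookup("pikachu")); // 1
        System.out.println(counter.recordLookup("pikachu")); // 2
        System.out.println(counter.recordLookup("cat"));     // 1
    }
}
```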
Answered by paddy
It sounds like you don't really understand hash tables then. Even Wikipedia has a good explanation of this data structure.
Your hash table is just going to be a large array of strings (initially all empty). You compute a hash value using the characters in your word, and then insert the word at that position in the table.
There are issues when the hash value for two words is the same. And there are a few solutions. One is to store a list at each array position and just shove the word onto that list. Another is to step through the table by a known amount until you find a free position. Another is to compute a secondary hash using a different algorithm.
The point of this is that hash lookup is fast. It's very quick to compute a hash value, and then all you have to do is check that the word at that array position exists (and matches the search word). You follow the same rules for hash value collisions (in this case, mismatches) that you used for the insertion.
You want your table size to be a prime number that is larger than the number of elements you intend to store. You also need a hash function that diverges quickly so that your data is more likely to be dispersed widely through your hash table (rather than being clustered heavily in one region).
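A minimal sketch of such a hand-rolled table, using the first collision strategy described above (a list at each array position), might look like this. The class name ChainedWordTable, its method names, and the size 101 are assumptions made for illustration, and String.hashCode() stands in for whatever hash function the assignment expects.

```java
import java.util.LinkedList;
import java.util.List;

public class ChainedWordTable {
    // One list ("chain") per array position; colliding words share a chain.
    private final List<String>[] slots;

    @SuppressWarnings("unchecked")
    public ChainedWordTable(int size) {
        slots = new List[size];
        for (int i = 0; i < size; i++) {
            slots[i] = new LinkedList<>();
        }
    }

    // Maps a word to an array position; floorMod handles negative hash codes.
    private int indexOf(String word) {
        return Math.floorMod(word.hashCode(), slots.length);
    }

    public void insert(String word) {
        List<String> chain = slots[indexOf(word)];
        if (!chain.contains(word)) {
            chain.add(word);
        }
    }

    // Only the one chain at the word's position is searched.
    public boolean contains(String word) {
        return slots[indexOf(word)].contains(word);
    }

    public static void main(String[] args) {
        ChainedWordTable table = new ChainedWordTable(101); // 101 is prime
        table.insert("cat");
        table.insert("dog");
        System.out.println(table.contains("cat"));        // true
        System.out.println(table.contains("jigglypuff")); // false
    }
}
```

With a well-spread hash and a table larger than the word count, each chain stays short, so a lookup touches only a handful of words.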
Hope this is a help and points you in the right direction.