Java 散列如何具有 o(1) 搜索时间?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4363539/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How does hashing have an o(1) search time?
提问by algo-geeks
When we use a HashTable
for storing data, it is said that searching takes o(1) time. I am confused, can anybody explain?
当我们使用 aHashTable
来存储数据时,据说搜索需要 o(1) 时间。我很困惑,谁能解释一下?
采纳答案by mgiuca
Well it's a littlebit of a lie -- it can take longer than that, but it usually doesn't.
嗯,这有点撒谎——它可能需要更长的时间,但通常不会。
Basically, a hash table is an array containing all of the keys to search on. The position of each key in the array is determined by the hash function, which can be any function which always maps the same input to the same output. We shall assume that the hash function is O(1).
基本上,哈希表是一个包含所有要搜索的键的数组。数组中每个键的位置由哈希函数确定,哈希函数可以是始终将相同输入映射到相同输出的任何函数。我们假设散列函数是 O(1)。
So when we insert something into the hash table, we use the hash function (let's call it h) to find the location where to put it, and put it there. Now we insert another thing, hashing and storing. And another. Each time we insert data, it takes O(1) time to insert it (since the hash function is O(1).
所以当我们向哈希表中插入一些东西时,我们使用哈希函数(我们称之为h)来找到放置它的位置,然后把它放在那里。现在我们插入另一件事,散列和存储。还有一个。每次我们插入数据时,插入它需要 O(1) 时间(因为哈希函数是 O(1))。
Looking up data is the same. If we want to find a value, x, we have only to find out h(x), which tells us where xis located in the hash table. So we can look up any hash value in O(1) as well.
查资料也是一样。如果我们想找到一个值x,我们只需要找出h(x),它告诉我们x在哈希表中的位置。所以我们也可以在 O(1) 中查找任何哈希值。
Now to the lie: The above is a very nice theory with one problem: what if we insert data and there is already something in that position of the array? There is nothing which guarantees that the hash function won't produce the same output for two different inputs (unless you have a perfect hash function, but those are tricky to produce). Therefore, when we insert we need to take one of two strategies:
现在撒谎:上面是一个很好的理论,有一个问题:如果我们插入数据并且数组的那个位置已经有东西怎么办?没有什么可以保证散列函数不会为两个不同的输入产生相同的输出(除非你有一个完美的散列函数,但这些函数很难产生)。因此,当我们插入时,我们需要采取以下两种策略之一:
- Store multiple values at each spot in the array (say, each array slot has a linked list). Now when you do a lookup, it is still O(1) to arrive at the correct place in the array, but potentially a linear search down a (hopefully short) linked list. This is called "separate chaining".
- If you find something is already there, hash again and find another location. Repeat until you find an empty spot, and put it there. The lookup procedure can follow the same rules to find the data. Now it's still O(1) to get to the first location, but there is a potentially (hopefully short) linear search to bounce around the table till you find the data you are after. This is called "open addressing".
- 在数组中的每个位置存储多个值(例如,每个数组槽都有一个链表)。现在,当您进行查找时,到达数组中的正确位置仍然是 O(1),但可能会沿(希望是短的)链表进行线性搜索。这称为“单独链接”。
- 如果您发现某些东西已经存在,请再次散列并找到另一个位置。重复直到你找到一个空位,然后把它放在那里。查找过程可以遵循相同的规则来查找数据。现在到达第一个位置仍然是 O(1),但是有一个潜在的(希望是短的)线性搜索可以在桌子上跳来跳去,直到找到你想要的数据。这称为“开放寻址”。
Basically, both approaches are still mostlyO(1) but with a hopefully-short linear sequence. We can assume for most purposes that it is O(1). If the hash table is getting too full, those linear searches can become longer and longer, and then it is time to "re-hash" which means to create a new hash table of a much bigger size and insert all the data back into it.
基本上,这两种方法仍然主要是O(1),但具有希望短的线性序列。对于大多数目的,我们可以假设它是 O(1)。如果哈希表变得太满,那些线性搜索会变得越来越长,然后是“重新哈希”的时候了,这意味着创建一个更大的新哈希表并将所有数据插入其中.
回答by TalentTuner
Searching takes O(1) time if there is no collisons in the hashtable , so it is incorrect to sya that searching in a hashtable takes O(1) or constant time.
如果哈希表中没有冲突,搜索需要 O(1) 时间,因此 sya 认为在哈希表中搜索需要 O(1) 或恒定时间是不正确的。
See how Hashtable works on MSDN?
EDIT
编辑
mgiuca explains it beautifully and i am just adding one more Collosion Avoidance technique which is called Chaining.
mgiuca 很好地解释了它,我只是添加了另一种称为链接的碰撞避免技术。
IN this technique , We maintain a linklist of values at each location so when you have a collosion , your value will be added to the Linklist at that position so when you are searching for a value there may be scenario that you need to search the value in whole link list so in that case searching will not be O(1) operation.
在这种技术中,我们在每个位置维护一个值的链接列表,因此当您发生碰撞时,您的值将被添加到该位置的链接列表中,因此当您搜索一个值时,可能会出现您需要搜索该值的情况。在整个链接列表中,因此在这种情况下搜索将不是 O(1) 操作。