java java中的哈希函数是什么?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 
原文地址: http://stackoverflow.com/questions/3069709/
Warning: these are provided under cc-by-sa 4.0 license.  You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
What is a hash function in java?
提问by Mohit Deshpande
回答by polygenelubricants
The Wikipedia article will have a lot of technical information, but a simplistic view of hashing is something like the following.
维基百科文章将包含大量技术信息,但哈希的简单视图如下所示。
Imagine that there's a magical function that can give a number to any object. Given the same object, it always return the same number.
想象一下,有一个神奇的函数可以给任何对象赋值。给定相同的对象,它总是返回相同的数字。
Immediately now you have a quick way to test if two objects are the same: ask this function for their numbers and compare. If they're different, then they're not the same.
现在你有一个快速的方法来测试两个对象是否相同:向这个函数询问它们的数字并进行比较。如果它们不同,那么它们就不相同。
But what if they have the same number? Could two different objects have the same number?
但是如果他们有相同的数字呢?两个不同的物体可以有相同的数字吗?
Yes, this is possible in most scenario. Let's say that the function can only give numbers between 1..10, for example, and there are 100 different objects. Then of course some different objects must have the same number. This is what is called a "collision". A "collision" makes our quick equality test not as useful, so as much as possible we want to minimize its happening. A good magical function is one that would try to minimize the number of "collision".
是的,这在大多数情况下都是可能的。例如,假设该函数只能给出 1..10 之间的数字,并且有 100 个不同的对象。那么当然一些不同的对象必须具有相同的编号。这就是所谓的“碰撞”。“碰撞”使我们的快速相等测试没有用,因此我们希望尽可能减少它的发生。一个好的魔法函数是试图最小化“碰撞”的次数。
So what else can you do with this number? Well, you can use it to index an array. Given an object, you can put it at the index given by the number from this magical function. This array is essentially what a hashtable is; this magical function is a hash function.
那么你还能用这个数字做什么呢?好吧,你可以用它来索引一个数组。给定一个对象,你可以把它放在这个神奇函数的数字给定的索引处。这个数组本质上就是一个哈希表;这个神奇的函数是一个哈希函数。
回答by M. Jessup
A hash function is a way to create a compact representation of an arbitrarily large amount of data. In java with the hashcode method this means somehow describing the state of your object (no matter how large) in an int (4 bytes). And is usually written to be a fairly fast as explained below.
散列函数是一种创建任意大量数据的紧凑表示的方法。在带有 hashcode 方法的 java 中,这意味着以某种方式以 int(4 字节)描述对象(无论多大)的状态。并且通常写得相当快,如下所述。
To simplify in hashtables/hashmaps the hashcode serves as sort of a cheap equals. Take two objects a and b of type Foo lets says to figure out if a.equals(b) takes 500 ms where as calculating a (efficient) hashcode only take 10ms. So if we want to know if a.equals(b) instead of doing that directly first we will look at the hashcodes and ask does a.hashCode() == b.hashCode(). Note that this will take only 20ms in our example.
为了简化哈希表/哈希映射,哈希码充当一种廉价的等于。拿两个 Foo 类型的对象 a 和 b 来说,计算 a.equals(b) 是否需要 500 毫秒,而计算一个(有效的)哈希码只需要 10 毫秒。因此,如果我们想知道是否 a.equals(b) 而不是直接这样做,我们将查看哈希码并询问是否 a.hashCode() == b.hashCode()。请注意,在我们的示例中,这将只需要 20 毫秒。
Because of the API definition of hashcode we know that if the hashcode of a is not equal to b then a.equals(b) should never be true.So in our above test if we see the hashcodes are unequal then we never need to do the longer .equals() test, this is why you should always override hashCode and equals together.
由于哈希码的 API 定义,我们知道如果 a的哈希码不等于 b,则 a.equals(b) 永远不应该为真。所以在我们上面的测试中,如果我们看到哈希码不相等,那么我们永远不需要做更长的 .equals() 测试,这就是为什么你应该总是覆盖 hashCode 和 equals 在一起。
You may also see references about writing "good" or "well distributed" hashcodes. This has to do with the fact that the inverse of the previous statements about hashcode and equals is not true. More specifically a.hashCode() == b.hashCode() does not necessarily imply a.equals(b)So the idea of a good hashcode is you reduce the likelyhood of a.hashCode() == b.hashCode() when a.equals(b) is false. You may have seen this referred to as a collision of a hash function.
您可能还会看到有关编写“良好”或“分布良好”哈希码的参考资料。这与之前关于 hashcode 和 equals 的语句的逆不正确有关。更具体地说a.hashCode() == b.hashCode() 并不一定意味着 a.equals(b)所以一个好的哈希码的想法是你减少 a.hashCode() == b.hashCode() 当a.equals(b) 是假的。您可能已经看到这称为散列函数的冲突。
Back to hashmaps/tables. These are based on key/value pairs. So when you add or retrieve a value you will supply a key. So the first thing the map has to do is look for the key, which means finding something that .equals() the key you provide. But as we discussed above .equals() can be incredibly slow which means comparisons can be greatly sped up by checking hashcodes first. Since when the hashcodes are well distributed you should know quickly when x is definitely != y.
回到哈希映射/表。这些基于键/值对。因此,当您添加或检索一个值时,您将提供一个键。所以地图必须做的第一件事就是寻找键,这意味着找到 .equals() 你提供的键。但是正如我们上面讨论的, .equals() 可能非常慢,这意味着通过首先检查哈希码可以大大加快比较速度。因为当哈希码分布良好时,您应该很快知道 x 肯定是 != y。
Now in addition to the comparison hashmaps/tables actually use the hashcodes to organize their internal storage of the data, however I think that is beyond the scope of what you are looking to understand at this point.
现在除了比较哈希图/表之外,实际上还使用哈希码来组织数据的内部存储,但是我认为这超出了您此时想要了解的范围。
回答by Aashish meena
HASH FUNCTION:- A hash function takes a group of characters (called a key) and maps it to a value of a certain length (called a hash value or hash). The hash value is representative of the original string of characters, but is normally smaller than the original. Hashing is done for indexing and locating items in databases because it is easier to find the shorter hash value than the longer string. Hashing is also used in encryption.This term is also known as a hashing algorithm or message digest function.
散列函数:- 散列函数采用一组字符(称为键)并将其映射到特定长度的值(称为散列值或散列)。哈希值代表原始字符串,但通常小于原始字符串。散列用于索引和定位数据库中的项目,因为找到较短的散列值比较长的字符串更容易。散列也用于加密。该术语也称为散列算法或消息摘要函数。
HASH MAP:- HashMap is a collection class that is designed to store elements as key-value pairs. Maps provide a way of looking up one thing based on the value of another.
HASH MAP:- HashMap 是一个集合类,旨在将元素存储为键值对。地图提供了一种根据另一事物的价值查找事物的方法。


A lookup table that is designed to efficiently store non-contiguous keys (account numbers, part numbers, etc.) that may have wide gaps in their alphabetic or numeric sequences.
一个查找表,旨在有效地存储在字母或数字序列中可能有很大差距的非连续键(帐号、零件号等)。
HASH TABLE:- Hash tables are created with an algorithm that stores the keys into hash buckets, which contain key-value pairs. Since different keys may hash to the same bucket, the goal of hash table design is to spread out the key-value pairs evenly with each bucket containing as few key-value pairs as possible. When an item is looked up, its key is hashed to find the appropriate bucket, and the bucket is then compared to find the right key-value pair.
哈希表:- 哈希表是使用一种算法创建的,该算法将键存储到包含键值对的哈希桶中。由于不同的key可能会散列到同一个bucket中,hash表设计的目标是将key-value对均匀分布,每个bucket包含的key-value对越少越好。当一个项目被查找时,它的键被散列以找到合适的桶,然后比较桶以找到正确的键值对。


回答by folone
Thisbook (and supporting video lectures) provide some great explanation of algorithms and data structures. There are some lectures about hash functions (1, 2). I'd recommend that.
这本书(和支持视频讲座)对算法和数据结构提供了一些很好的解释。有一些关于散列函数的讲座 ( 1, 2)。我建议这样做。

(source: mit.edu)

(来源:mit.edu)
Also, just FYI, Not really true, as pointed by polygenelubricantsin comments.hashCode(), called on an instance of Objectclass returns an address of this particular instance in memory.
此外,仅供参考,正如评论中的polygenelubricants所指出的那样,事实并非如此。hashCode()在Object类的实例上调用会返回此特定实例在内存中的地址。
回答by Michael Borgwardt
A hashtable is basically a way to store anything in an array and retrieve it almost as fast as looking up something in an array via an index, without wasting too much space.
哈希表基本上是一种将任何内容存储在数组中并检索它的方法,其速度几乎与通过索引在数组中查找某些内容一样快,而且不会浪费太多空间。
The job of a hash function is (in this context) to compute the array index at which an object will be stored, based on the contents of the object. That means, it must always return the same result for the same object, and should return different results for different objects as much as possible. When two different objects have the same hash, it's called a "collision", and you have to treat those cases specially, which makes the whole thing slower.
散列函数的工作是(在此上下文中)根据对象的内容计算存储对象的数组索引。也就是说,它必须始终为同一个对象返回相同的结果,并且应该尽可能为不同的对象返回不同的结果。当两个不同的对象具有相同的哈希值时,称为“冲突”,您必须特别对待这些情况,这会使整个过程变慢。
回答by Jatinder Pal
The mapping of keys to indices of a hash table is called hash function. Hash function contains two parts
键到哈希表索引的映射称为哈希函数。哈希函数包含两部分
Hash Code Map: It converts keys to integer of any range.
哈希码映射:将键转换为任意范围的整数。
Compression Map: It converts(brings) these integers to the range of keys hashtable has.
压缩映射:它将这些整数转换(带来)到哈希表具有的键范围。
Taken from http://coder2design.com/hashing/
回答by ininprsr
Hash function: If you pass the same object to this function any number of times, be it text, binary or number, you always get the same output. For the hash table purposes an integer returning hash function is used.
哈希函数:如果您多次将同一个对象传递给该函数,无论是文本、二进制还是数字,您总是会得到相同的输出。出于哈希表的目的,使用了返回整数的哈希函数。
Above functionality is calling hashing.
以上功能是调用散列。
Hash table: Miracle data structure of computer science that returns search result in constant time or O(1). It is based on the above concept of hashing. So, it has better access time than linkedlist, binary search trees etc.
哈希表:计算机科学的奇迹数据结构,可以在恒定时间或 O(1) 内返回搜索结果。它基于上述哈希概念。因此,它比链表、二叉搜索树等具有更好的访问时间。
Why nearly O(1): It uses an array as its base structure internally to store the objects and as arrays have constant access time hence, the Hash table does too.
为什么接近 O(1):它在内部使用数组作为其基本结构来存储对象,并且由于数组具有恒定的访问时间,因此哈希表也是如此。
[Basic internal]: So, it uses an array of fixed size internally and when you insert an (Key, Value) pair, it calculates key's hash and uses this hash value as index to store the (Key,Value) pair in the array. Next, when you search for the object using the same key, it uses the hash of the key again as index to search for the key in the array. Now, two objects can have same hash value and hence, while inserting these objects in the hash table there will be collision. There are two ways for collision resolution. You can refer this linkfor a sufficiently detailed discussion on this topic.
[Basic internal]:所以,它在内部使用一个固定大小的数组,当你插入一个(Key, Value)对时,它计算key的hash,并使用这个hash值作为索引将(Key,Value)对存储在数组中. 接下来,当您使用相同的键搜索对象时,它会再次使用键的哈希作为索引来搜索数组中的键。现在,两个对象可以具有相同的哈希值,因此在将这些对象插入哈希表时会发生冲突。有两种解决冲突的方法。您可以参考此链接以获取有关此主题的足够详细的讨论。
回答by Dheeraj Kumar Singh
HashCode()function, which returns an integer value, is used by HashMapto find a correct bucket.
HashCode()函数返回一个整数值,用于HashMap查找正确的存储桶。

