What is the reason behind Enum.hashCode()?
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/4885095/
Asked by maaartinus
The method hashCode() in class Enum is final and defined as super.hashCode(), which means it returns a number based on the address of the instance, which is a random number from the programmer's point of view.
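As a quick check of the behavior described above, the following minimal sketch compares an enum constant's hashCode() with System.identityHashCode(). The enum Color and the class EnumIdentityHashDemo are made-up names for illustration only.

```java
// Hypothetical enum, used only for illustration.
enum Color { RED, GREEN, BLUE }

final class EnumIdentityHashDemo {
    // True if the constant's hashCode() is the identity hash code,
    // i.e. the value Object.hashCode() would return for it.
    static boolean usesIdentityHash(Enum<?> constant) {
        return constant.hashCode() == System.identityHashCode(constant);
    }

    public static void main(String[] args) {
        for (Color c : Color.values()) {
            // The printed numbers typically differ from one JVM run to the next.
            System.out.println(c + " -> " + c.hashCode()
                    + " (identity hash: " + usesIdentityHash(c) + ")");
        }
    }
}
```

Running main twice in separate JVMs is the simplest way to see whether the values change on a given setup.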
Defining it e.g. as ordinal() ^ getClass().getName().hashCode() would be deterministic across different JVMs. It would even work a bit better, since the least significant bits would "change as much as possible", e.g., for an enum containing up to 16 elements and a HashMap of size 16, there'd be for sure no collisions (sure, using an EnumMap is better, but sometimes not possible, e.g. there's no ConcurrentEnumMap). With the current definition you have no such guarantee, have you?
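The proposal can be sketched as a static helper, since Enum.hashCode() is final and cannot be overridden. Everything here is an assumption for illustration: the enum Hex, the class DeterministicEnumHash and its methods are made up, and getDeclaringClass() is swapped in for getClass() so that constants with their own bodies share one class name. The bucket computation imitates the spreading step of OpenJDK's HashMap (Java 8 and later); other implementations may differ.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical 16-constant enum, used only for illustration.
enum Hex { H0, H1, H2, H3, H4, H5, H6, H7, H8, H9, HA, HB, HC, HD, HE, HF }

final class DeterministicEnumHash {
    // The formula from the question, with getDeclaringClass() instead of getClass().
    static int hash(Enum<?> e) {
        return e.ordinal() ^ e.getDeclaringClass().getName().hashCode();
    }

    // Counts how many distinct buckets the constants would occupy in a
    // power-of-two table, assuming the h ^ (h >>> 16) spreading step.
    static int distinctBuckets(Enum<?>[] constants, int tableSize) {
        Set<Integer> buckets = new HashSet<>();
        for (Enum<?> e : constants) {
            int h = hash(e);
            h ^= (h >>> 16);
            buckets.add(h & (tableSize - 1));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctBuckets(Hex.values(), 16) + " distinct buckets for 16 constants");
    }
}
```

XOR-ing the ordinal with a per-class constant only permutes the low bits, which is why 16 constants land in 16 different buckets of a 16-slot table.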
Summary of the answers
Using Object.hashCode() compares to a nicer hashCode like the one above as follows:
- PROS
  - simplicity
- CONS
  - speed
  - more collisions (for any size of a HashMap)
  - non-determinism, which propagates to other objects, making them unusable for
    - deterministic simulations
    - ETag computation
    - hunting down bugs depending e.g. on a HashSet iteration order
I'd personally prefer the nicer hashCode, but IMHO no reason weighs much, maybe except for the speed.
UPDATE
I was curious about the speed and wrote a benchmark with surprising results. For a price of a single field per class you can get a deterministic hash code which is nearly four times faster. Storing the hash code in each field would be even faster, although negligibly.
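The benchmark itself is not reproduced here. As a rough sketch of what storing a precomputed deterministic hash can look like, the hypothetical enum below keeps the value in a final field per constant. The names Planet and stableHash are made up, and the value has to be exposed under a separate method name because Enum.hashCode() is final.

```java
// Hypothetical enum, used only for illustration.
enum Planet {
    MERCURY, VENUS, EARTH;

    // Computed once per constant when the enum class is initialized.
    private final int stableHash;

    Planet() {
        // name() and ordinal() are already set when the constructor body runs.
        this.stableHash = ordinal() ^ Planet.class.getName().hashCode();
    }

    // Deterministic across JVM runs as long as the class name and
    // the declaration order of the constants stay the same.
    int stableHash() {
        return stableHash;
    }
}
```

A caller that needs reproducible hashing would use Planet.EARTH.stableHash() in its own hashCode() instead of Planet.EARTH.hashCode().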
The explanation why the standard hash code is not much faster is that it can't be the object's address, as objects get moved by the GC.
UPDATE 2
There are some strange things going on with the hashCode performance in general. Even once I understand them, there's still the open question of why System.identityHashCode (reading from the object header) is way slower than accessing a normal object field.
Accepted answer by JB Nizet
I think that the reason they made it final is to avoid developers shooting themselves in the foot by writing a suboptimal (or even incorrect) hashCode.
Regarding the chosen implementation: it's not stable across JVMs, but it's very fast, avoids collisions, and doesn't need an additional field in the enum. Given the normally small number of instances of an enum class, and the speed of the equals method, I wouldn't be surprised if the HashMap lookup time was bigger with your algorithm than with the current one, due to its additional complexity.
Answer by aioobe
The only reason for using Object's hashCode() and for making it final I can imagine, is to make me ask this question.
First of all, you should not rely on such mechanisms for sharing objects between JVMs. That's simply not a supported use case. When you serialize / deserialize you should rely on your own comparison mechanisms or only "compare" the results against objects within your own JVM.
The reason for letting an enum's hashCode be implemented as Object's hash code (based on identity) is that, within one JVM, there will only be one instance of each enum object. This is enough to ensure that such an implementation makes sense and is correct.
You could argue like "Hey, String and the wrappers for the primitives (Long, Integer, ...) all have well-defined, deterministic specifications of hashCode! Why don't enums have it?" Well, to begin with, you can have several distinct string references representing the same string, which means that using super.hashCode would be an error, so these classes necessarily need their own hashCode implementations. For these core classes it made sense to let them have well-defined deterministic hashCodes.
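The point about strings can be seen in a few lines. This is only an illustrative sketch with a made-up class name, StringHashDemo: two distinct String instances with the same content must be equal and must share a hash code, which an identity-based hashCode could not guarantee.

```java
final class StringHashDemo {
    // Returns { sameReference, equalContent, sameHashCode } for two
    // separately allocated strings with identical content.
    static boolean[] compare() {
        String a = new String("enum");
        String b = new String("enum");
        return new boolean[] { a == b, a.equals(b), a.hashCode() == b.hashCode() };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("same reference: " + r[0]);  // false
        System.out.println("equal content:  " + r[1]);  // true
        System.out.println("same hashCode:  " + r[2]);  // true
    }
}
```

For an enum the first line would print true for any two references to the same constant, which is why identity is sufficient there.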
Why did they choose to solve it like this?
Well, look at the requirements of the hashCode implementation. The main concern is to make sure that each object should return a distinct hash code (unless it is equal to another object). The identity-based approach is super efficient and guarantees this, while your suggestion does not. This requirement is apparently stronger than any "convenience bonus" about easing up on serialization etc.
Answer by mavarazy
I asked the same question because I did not see this one. Why does hashCode() in Enum refer to the Object hashCode() implementation instead of the ordinal() function?
I ran into this as a problem when defining my own hash function for an Object that relies on the enum hashCode as one of its components. When checking the values in a Set of Objects returned by a function, I checked them in an order that I expected to stay the same: since I define the hashCode myself, I expected the elements to fall at the same nodes of the tree. But since the hashCode returned by the enum changes from start to start, this assumption was wrong, and the test could fail once in a while.
So, when I figured out the problem, I started using ordinal instead. I am not sure everyone writing hashCode for their Object realizes this.
So basically, you can't define your own deterministic hashCode while relying on the enum hashCode; you need to use ordinal instead.
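A minimal sketch of that workaround, assuming a made-up value class Card with an enum field Suit (neither name comes from the original answer): the composite hashCode() mixes in suit.ordinal() rather than suit.hashCode(), so the result is the same on every run.

```java
import java.util.Objects;

// Hypothetical enum and value class, used only for illustration.
enum Suit { CLUBS, DIAMONDS, HEARTS, SPADES }

final class Card {
    private final Suit suit;
    private final int rank;

    Card(Suit suit, int rank) {
        this.suit = Objects.requireNonNull(suit);
        this.rank = rank;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Card)) return false;
        Card other = (Card) o;
        return suit == other.suit && rank == other.rank;
    }

    @Override
    public int hashCode() {
        // suit.hashCode() may vary between JVM runs; suit.ordinal() does not.
        return 31 * suit.ordinal() + rank;
    }
}
```

One trade-off of this approach is that the hash changes if the constants of Suit are ever reordered, so it is deterministic only for a fixed declaration order.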
P.S. This was too big for a comment :)
Answer by pnt
The JVM enforces that for an enum constant, only one object will exist in memory. There is no way that you could end up with two different instance objects of the same enum constant within a single VM, not with reflection, not across the network via serialization/deserialization.
That being said, since it is the only object to represent this constant, it doesn't matter that its hash code is its address, since no other object can occupy the same address space at the same time. It is guaranteed to be unique and "deterministic" (in the sense that, within the same VM, all references to the constant point to the same object in memory, whatever its address is).
Answer by Mirko Klemm
One more reason I could imagine for implementing it like this is the requirement for hashCode() and equals() to be consistent, together with the design goal that enums should be simple to use and be compile-time constants (so that they can be used as "case" constants). This also makes it legal to compare enum instances with "==", and you simply wouldn't want "equals" to behave differently from "==" for enums. This again ties hashCode to the default, reference-based Object.hashCode() behavior. As said before, I also don't expect equals() and hashCode() to consider two enum constants from different JVMs as being equal.
When talking about serialization: for instance fields typed as enums, the default binary serializer in Java has a special behaviour that serializes only the name of the constant, and on deserialization the reference to the corresponding enum value in the deserializing JVM is re-created. JAXB and other XML-based serialization mechanisms work in a similar way. So: just don't worry.
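The serialization behavior described above can be checked with a round trip through Java's built-in object streams. The enum Level and the helper class EnumSerializationDemo are made-up names for this sketch; only standard java.io classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical enum, used only for illustration.
enum Level { LOW, HIGH }

final class EnumSerializationDemo {
    // Serializes the value to a byte array and reads it back.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(Level.HIGH);
        // The deserialized reference is the existing constant, not a new object.
        System.out.println(copy == Level.HIGH);  // true
    }
}
```

Because the same instance comes back, its identity-based hashCode() is unchanged within that JVM.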
Answer by OrangeDog
There is no requirement for hash codes to be deterministic between JVMs and no advantage gained if they were. If you are relying on this fact you are using them wrong.
As only one instance of each enum value exists, Object.hashCode() is guaranteed never to collide, is good code reuse and is very fast.
If equality is defined by identity, then Object.hashCode() will always give the best performance.
The determinism of other hash codes is just a side effect of their implementation. As their equality is usually defined by field values, mixing in non-deterministic values would be a waste of time.
Answer by Andreas Dolk
As long as we can't send an enum object [1] to a different JVM, I see no reason for putting such a requirement on enums (and objects in general).
[1] I thought it was clear enough: an object is an instance of a class. A serialized object is a sequence of bytes, usually stored in a byte array. I was talking about an object.