Java 使用哈希码作为唯一 ID

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/21368492/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 08:19:57  来源:igfitidea点击:

Using hashcode for a unique ID

javauniquehashcode

提问by IcedDante

I am working in a java-based system where I need to set an id for certain elements in the visual display. One category of elements is Strings, so I decided to use the String.hashCode() method to get a unique identifier for these elements.

我在一个基于 java 的系统中工作,我需要为视觉显示中的某些元素设置一个 id。一类元素是字符串,因此我决定使用 String.hashCode() 方法来获取这些元素的唯一标识符。

The problem I ran into, however, is that the system I am working in borks if the id is negative and String.hashCodeoften returns negative values. One quick solution is to just use Math.abs() around the hashcode call to guarantee a positive result. What I was wondering about this approach is what are the chances of two distinct elements having the same hashcode?

然而,我遇到的问题是,如果 id 为负并且String.hashCode经常返回负值,那么我在 borks 中工作的系统。一种快速的解决方案是只在哈希码调用周围使用 Math.abs() 来保证肯定的结果。我想知道这种方法是两个不同元素具有相同哈希码的机会是多少?

For example, if one string returns a hashcode of -10 and another string returns a hashcode of 10 an error would occur. In my system we're talking about collections of objects that aren't more than 30 elements large typically so I don't think this would really be an issue, but I am curious as to what the math says.

例如,如果一个字符串返回的哈希码为 -10,而另一个字符串返回的哈希码为 10,则会发生错误。在我的系统中,我们讨论的是通常不超过 30 个元素的对象集合,所以我认为这不是一个真正的问题,但我对数学所说的内容感到好奇。

采纳答案by Bohemian

Hash codes can be thought of as pseudo-random numbers. Statistically, with a positive inthash code the chance of a collision between any two elements reaches 50% when the population size is about 54K (and 77K for anyint). See Birthday Problem Probability Tablefor collision probabilities of various hash code sizes.

哈希码可以被认为是伪随机数。从统计int上讲,使用正哈希码时,当总体大小约为 54K(任何77K int)时,任何两个元素之间发生冲突的机会达到 50% 。有关各种哈希码大小的冲突概率,请参阅生日问题概率表

Also, your idea to use Math.abs()alone is flawed: It does not always return a positive number! In 2's compliment arithmetic, the absolute value of Integer.MIN_VALUEis itself! Famously, the hash code of "polygenelubricants"is this value.

此外,您Math.abs()单独使用的想法是有缺陷的:它并不总是返回正数!在 2 的补码运算中, 的绝对值Integer.MIN_VALUE就是它本身!众所周知,的哈希码"polygenelubricants"就是这个值。

回答by Denys Séguret

You already can get two strings with the same hashcode. This should be obvious if you think that you have an infinite number of strings and only 2^32 possible hashcodes.

您已经可以获得具有相同哈希码的两个字符串。如果您认为您有无限数量的字符串并且只有 2^32 个可能的哈希码,那么这应该是显而易见的。

You just make it a little more probable when taking the absolute value. The risk is small but if you needan unique id, this isn't the right approach.

在取绝对值时,你只是让它更有可能。风险很小,但如果您需要一个唯一的 id,这不是正确的方法。

回答by jb.

Hashes are not unique, hence they are not apropriate for uniqueId.

哈希不是唯一的,因此它们不适合uniqueId

As to probability of hash collision, you could read about birthday paradox. Actually (from what I recall) when drawing from an uniform distribution of N values, you should expect collision after drawing $\sqrt(N)$ (you could get collision much earlier). The problem is that Java's implementation of hashCode(and especially when hashing short strings) doesnt provide uniform distribution, so you'll get collision much earlier.

至于哈希冲突的概率,你可以阅读生日悖论。实际上(根据我的回忆)当从 N 值的均匀分布中绘制时,您应该在绘制 $\sqrt(N)$ 后期待碰撞(您可能会更早地发生碰撞)。问题是 Java 的实现hashCode(尤其是在散列短字符串时)不提供均匀分布,因此您会更早地发生冲突。

回答by Dakkaron

What you can do when you only have 30-50 values as you said is register each String you get into an HashMap together with a running counter as value:

当您只有 30-50 个值时,您可以做的是将您进入 HashMap 的每个字符串与正在运行的计数器一起注册为值:

HashMap StringMap = new HashMap<String,Integer>();

StringMap.add("Test",1);
StringMap.add("AnotherTest",2);

You can then get your unique ID by calling this:

然后,您可以通过调用以下方法获取您的唯一 ID:

StringMap.get("Test"); //returns 1