Consistency of hashCode() on a Java string

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/785091/

Time: 2020-08-11 19:26:49  Source: igfitidea

Tags: java, string, hashcode

Asked by knorv

The hashCode value of a Java String is computed as (String.hashCode()):


s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Are there any circumstances (say JVM version, vendor, etc.) under which the following expression will evaluate to false?


boolean expression = "This is a Java string".hashCode() == 586653468;

Update #1: If you claim that the answer is "yes, there are such circumstances", then please give a concrete example of when "This is a Java string".hashCode() != 586653468. Try to be as specific as possible.

Update #2: We all know that relying on the implementation details of hashCode() is bad in general. However, I'm talking specifically about String.hashCode(), so please keep the answer focused on String.hashCode(). Object.hashCode() is totally irrelevant in the context of this question.

Accepted answer by Jon Skeet

I can see that documentation as far back as Java 1.2.


While it's true that in general you shouldn't rely on a hash code implementation remaining the same, it's now documented behaviour for java.lang.String, so changing it would count as breaking existing contracts.

Wherever possible, you shouldn't rely on hash codes staying the same across versions etc. - but in my mind java.lang.String is a special case simply because the algorithm has been specified... so long as you're willing to abandon compatibility with releases before the algorithm was specified, of course.

Answer by Martin OConnor

You should not rely on a hash code being equal to a specific value, only on it returning consistent results within the same execution. The API docs say the following:

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.


EDIT: Since the javadoc for String.hashCode() specifies how a String's hash code is computed, any violation of this would violate the public API specification.

Answer by Brian Agnew

Another (!) issue to worry about is the possible change of implementation between early and late versions of Java. I don't believe the implementation details are set in stone, and so potentially an upgrade to a future Java version could cause problems.

Bottom line is, I wouldn't rely on the implementation of hashCode().


Perhaps you can highlight what problem you're actually trying to solve by using this mechanism, and that will highlight a more suitable approach.


Answer by sleske

As said above, in general you should not rely on the hash code of a class remaining the same. Note that even subsequent runs of the same application on the same VM may produce different hash values. AFAIK the Sun JVM's hash function calculates the same hash on every run, but that's not guaranteed.

Note that this is not theoretical. The hash function for java.lang.String was changed in JDK 1.2 (the old hash had problems with hierarchical strings like URLs or file names, as it tended to produce the same hash for strings which only differed at the end).

java.lang.String is a special case, as the algorithm of its hashCode() is (now) documented, so you can probably rely on that. I'd still consider it bad practice. If you need a hash algorithm with special, documented properties, just write one :-).


Answer by ReneS

Just to answer your question, and not to continue any discussions: the Apache Harmony JDK implementation seems to use a different algorithm, or at least it looks totally different:

Sun JDK


public int hashCode() {
    int h = hash;
    if (h == 0) {
        int off = offset;
        char val[] = value;
        int len = count;

        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}

Apache Harmony


public int hashCode() {
    if (hashCode == 0) {
        int hash = 0, multiplier = 1;
        for (int i = offset + count - 1; i >= offset; i--) {
            hash += value[i] * multiplier;
            int shifted = multiplier << 5;
            multiplier = shifted - multiplier;
        }
        hashCode = hash;
    }
    return hashCode;
}

Feel free to check it yourself...

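One way to check the two listings above is to re-create both loops outside of String and compare them. The sketch below does that; the class and method names are made up for illustration, and the loops are adapted from the listings (operating on a String through charAt rather than on internal fields), not copied from either JDK. Under those assumptions, both loops appear to evaluate the same polynomial s[0]*31^(n-1) + ... + s[n-1] with wrapping int arithmetic, one front to back and one back to front.

```java
// Illustrative sketch only: HashLoopComparison and its methods are hypothetical names.
final class HashLoopComparison {

    // Forward loop in the style of the Sun JDK listing: h = 31*h + c.
    static int forwardHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Backward loop in the style of the Apache Harmony listing:
    // the multiplier runs through 1, 31, 31^2, ... (with int overflow wrapping).
    static int backwardHash(String s) {
        int hash = 0;
        int multiplier = 1;
        for (int i = s.length() - 1; i >= 0; i--) {
            hash += s.charAt(i) * multiplier;
            multiplier = (multiplier << 5) - multiplier; // 32*m - m, i.e. 31*m
        }
        return hash;
    }

    public static void main(String[] args) {
        String[] samples = {"", "a", "Ea", "FB", "This is a Java string"};
        for (String s : samples) {
            // Prints the three values side by side for manual comparison.
            System.out.println("\"" + s + "\": " + forwardHash(s)
                    + " / " + backwardHash(s) + " / " + s.hashCode());
        }
    }
}
```

If the three numbers agree for every sample, the two listings differ in shape but not in result, at least for the inputs tried.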

Answer by ReneS

I found something about JDK 1.0 and 1.1 and >= 1.2:


In JDK 1.0.x and 1.1.x the hashCode function for long Strings worked by sampling every nth character. This pretty well guaranteed you would have many Strings hashing to the same value, thus slowing down Hashtable lookup. In JDK 1.2 the function has been improved to multiply the result so far by 31 then add the next character in sequence. This is a little slower, but is much better at avoiding collisions. Source: http://mindprod.com/jgloss/hashcode.html


Something different, because you seem to need a number: how about using CRC32 or MD5 instead of hashCode()? Then you are good to go, with no discussions and no worries at all...
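A minimal sketch of the CRC32 / MD5 suggestion, using only standard library classes (java.util.zip.CRC32 and java.security.MessageDigest). The class name ChecksumExample and the choice of UTF-8 as the byte encoding are assumptions made for this example; whatever encoding is chosen has to be fixed explicitly, otherwise the result depends on the platform default charset.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.CRC32;

// Illustrative sketch only: ChecksumExample is a hypothetical helper class.
final class ChecksumExample {

    // CRC-32 of the string's UTF-8 bytes, as an unsigned 32-bit value in a long.
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // MD5 of the string's UTF-8 bytes, as 32 lowercase hex digits.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Java platform implementations are required to provide MD5.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        String s = "This is a Java string";
        System.out.println("CRC32: " + Long.toHexString(crc32(s)));
        System.out.println("MD5:   " + md5Hex(s));
    }
}
```

Both algorithms are defined independently of the JVM, so the values should not change between Java versions or vendors. Neither is suitable for security purposes; they are used here only as stable fingerprints.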

Answer by Sam Barnum

If you're worried about changes and possibly incompatible VMs, just copy the existing hashCode() implementation into your own utility class, and use that to generate your hash codes.
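A utility class along those lines could look roughly like this. The name StableStringHash is hypothetical, and the loop is written from the documented formula s[0]*31^(n-1) + ... + s[n-1] rather than copied from any particular JDK source.

```java
// Illustrative sketch only: StableStringHash is a hypothetical utility class.
final class StableStringHash {

    private StableStringHash() {
        // Static helper, not meant to be instantiated.
    }

    // Computes the 31-based polynomial hash over the char values of s,
    // using wrapping int arithmetic. The empty sequence hashes to 0.
    static int hash(CharSequence s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "This is a Java string";
        System.out.println(hash(s) + " vs " + s.hashCode());
    }
}
```

Because the code lives in the application, its result depends only on Java's int and char semantics, not on what a given runtime's String.hashCode() happens to do. Taking a CharSequence is a design choice of this sketch so that StringBuilder contents can be hashed without copying.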

Answer by Lourdes

The hash code is calculated from the char values of the characters in the String (for ASCII text these are the ASCII codes).

The implementation in the String class is as follows:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}

Collisions in hash codes are unavoidable. For example, the strings "Ea" and "FB" both have the hash code 2236.