Java hashcode() 字符串冲突

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9946671/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-30 23:00:27  来源:igfitidea点击:

Java hashcode() strings collision

javahashcodecollision

提问by ckd

I do not know much about hashcodes. I found this code which prints the collisions.

我对哈希码了解不多。我找到了这个打印碰撞的代码。

Can you please tell me what are collisions and how to reduce it? Why should we use hashcodes?

你能告诉我什么是碰撞以及如何减少碰撞吗?为什么要使用哈希码?

public static int getHash(String str, int limit)
{
    int hashCode = Math.abs(str.hashCode()%(limit));
    return hashCode;
}

/**
 * @param args
 */
public static void main(String[] args)
{
    int hashLimit = 10000;
    int stringsLimit = 10000;
    String[] arr = new String[hashLimit];
    List<String> test = new ArrayList<String>();
    Random r = new Random(2);
    for ( int i = 0 ; i < stringsLimit ; i++ )
    {
        StringBuffer buf = new StringBuffer("");
        for ( int j = 0 ; j < 10 ; j++ )
        {
            char c = (char)(35+60*r.nextDouble());
            buf.append(c);
        }
        test.add(buf.toString());
        //System.out.println(buf.toString());
    }
    int collisions = 0;
    for ( String curStr : test )
    {
        int hashCode = getHash(curStr,hashLimit);
        if ( arr[hashCode] != null && !arr[hashCode].equals(curStr) )
        {
            System.out.println("collision of ["+arr[hashCode]+"] ("+arr[hashCode].hashCode()+" = "+hashCode+") with ["+curStr+"] ("+curStr.hashCode()+" = "+hashCode+")");
            collisions++;
        }
        else
        {
            arr[hashCode] = curStr;
        }
    }
    System.out.println("Collisions: "+collisions);
}

回答by Jon Skeet

Can you please tell me what are collisions and how to reduce it?

你能告诉我什么是碰撞以及如何减少碰撞吗?

Collisions are when two non-equal objects have the same hash code. They're a fact of life - you need to deal with it.

冲突是指两个不相等的对象具有相同的哈希码。它们是生活中的事实——你需要处理它。

Why should we use hashcodes?

为什么要使用哈希码?

Because they make it quick to look up values by key, basically. A hash table can use a hash code to very quickly get the set of possible key matches down to a very small set(often just one), at which point you need to check for actualkey equality.

因为它们基本上可以快速按键查找值。哈希表可以使用哈希码非常快速地将可能的键匹配集缩小到一个非常小的集合(通常只有一个),此时您需要检查实际的键是否相等。

You should neverassume that two hash codes being equal means the objects they were derived from are equal. Only the reverse is true: assuming a correct implementation, if two objects give differenthash codes, then they are notequal.

永远不应该假设两个哈希码相等意味着它们派生自的对象是相等的。只有相反的情况才是正确的:假设一个正确的实现,如果两个对象给出不同的哈希码,那么它们就不相等。

回答by Adamski

To answer the other part of your question: To reduce the chance of collisions you should implement a hashing algorithm that provides an even distribution of hash codes over the set of possible inputs.

回答问题的另一部分:为了减少冲突的机会,您应该实施一种散列算法,该算法在可能的输入集上提供均匀分布的散列码。

For example, supposing you implemented a naive hashCode()method for hashing MyStringinstances:

例如,假设您hashCode()为散列MyString实例实现了一个简单的方法:

public class MyString {
  private final char[] arr;

  // Constructor and other methods.

  public int hashCode() {
    return arr.length == 0 ? 0 : (int) arr[0];
  }
}

In this example only the first characteris used to create the hash code. Therefore, if you were to hash the strings: "apple", "anaconda", "anecdote" they would all produce the same hash value. A more efficient hash code would inspect all the letters in the character array to determine a hash code value, which would hopefully reduce the chance of a collision.

在此示例中,仅使用第一个字符来创建哈希码。因此,如果您要散列字符串:“apple”、“anaconda”、“anecdote”,它们都会产生相同的散列值。更有效的哈希码将检查字符数组中的所有字母以确定哈希码值,这有望减少冲突的机会。

回答by Andreas Dolk

We have a "collision" if two differentnon-equalobjects have the same hashcode. This maybe a problem, for example when try to use both objects as keys in a Hashmap.

如果两个不同的不相等对象具有相同的哈希码,我们就会发生“冲突” 。这可能是一个问题,例如当尝试将两个对象用作 Hashmap 中的键时。