Disclaimer: this page is a bilingual translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14568631/

Date: 2020-10-31 16:43:29  Source: igfitidea

Questions about implementing my own HashMap in Java

Tags: java, hashmap

Asked by Per John

I am working on an assignment where I have to implement my own HashMap. The assignment text describes it as an Array of Lists, and whenever you add an element, its position in the Array is determined by its hashCode. In my case the keys are positions in a spreadsheet, so I have taken columnNumber + rowNumber, converted that to a String and then to an int, and used the result as the hashCode; I then insert the element at that position in the Array. It is of course inserted as a Node(key, value), where the key is the position of the cell and the value is the value of the cell.

But I must say I do not understand why we need an Array of Lists: if we end up with a list containing more than one element, will that not increase the lookup time considerably? So should it not rather be an Array of Nodes?

Also I have found this implementation of a HashMap in Java:

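// One key/value pair stored in a slot of the table.
class HashEntry {
      private final int key;
      private final int value;

      HashEntry(int key, int value) {
            this.key = key;
            this.value = value;
      }

      public int getKey() {
            return key;
      }

      public int getValue() {
            return value;
      }
}

// Open addressing with linear probing: on a collision, try the next slot.
public class HashMap {
  private static final int TABLE_SIZE = 128;

  // Java initializes object arrays to null, so every slot starts empty.
  private final HashEntry[] table = new HashEntry[TABLE_SIZE];

  // Math.floorMod keeps the start index non-negative even for negative keys.
  private static int indexFor(int key) {
        return Math.floorMod(key, TABLE_SIZE);
  }

  // Returns the value stored for key, or -1 if the key is absent.
  public int get(int key) {
        int hash = indexFor(key);
        // Probe at most TABLE_SIZE slots so a full table cannot loop forever.
        for (int probes = 0; probes < TABLE_SIZE && table[hash] != null; probes++) {
              if (table[hash].getKey() == key)
                    return table[hash].getValue();
              hash = (hash + 1) % TABLE_SIZE;
        }
        return -1;
  }

  // Stores key/value in the first empty slot, or overwrites an existing key.
  public void put(int key, int value) {
        int hash = indexFor(key);
        for (int probes = 0; probes < TABLE_SIZE; probes++) {
              if (table[hash] == null || table[hash].getKey() == key) {
                    table[hash] = new HashEntry(key, value);
                    return;
              }
              hash = (hash + 1) % TABLE_SIZE;
        }
        throw new IllegalStateException("table is full");
  }
}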

So is it correct that the put method first looks at table[hash], and if that slot is not empty and the entry there does not have the key passed to put, it moves on to table[(hash + 1) % TABLE_SIZE]? But if it is the same key, it simply overwrites the value. Have I understood that correctly? And is it because the get and put methods use the same way of finding the position in the Array that, given the same key, they end up at the same place in the Array?
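
The reading above can be checked with a small trace. The sketch below mirrors the quoted linear-probing code for int keys, with `slotsVisited` as a hypothetical helper (not part of the original) that records which slots a key passes through. Because put and get both call it, the same key always follows the same path, and keys 1 and 129 (both 1 mod 128) end up in slots 1 and 2.

```java
import java.util.ArrayList;
import java.util.List;

// Linear probing with int keys, mirroring the quoted HashMap.
// slotsVisited is a hypothetical helper added only to expose the probe path.
class ProbeDemo {
    static final int TABLE_SIZE = 128;
    private final Integer[] keys = new Integer[TABLE_SIZE]; // null = empty slot
    private final int[] values = new int[TABLE_SIZE];

    // Shared by put and get, so a given key always follows the same slots.
    List<Integer> slotsVisited(int key) {
        List<Integer> path = new ArrayList<>();
        int slot = Math.floorMod(key, TABLE_SIZE);
        for (int i = 0; i < TABLE_SIZE; i++) {
            path.add(slot);
            // Stop at an empty slot or at the slot that already holds this key.
            if (keys[slot] == null || keys[slot] == key) break;
            slot = (slot + 1) % TABLE_SIZE; // collision: try the next slot
        }
        return path;
    }

    void put(int key, int value) {
        List<Integer> path = slotsVisited(key);
        int slot = path.get(path.size() - 1);
        if (keys[slot] != null && keys[slot] != key)
            throw new IllegalStateException("table is full");
        keys[slot] = key;     // empty slot: new entry; same key: overwrite
        values[slot] = value;
    }

    int get(int key) {
        List<Integer> path = slotsVisited(key);
        int slot = path.get(path.size() - 1);
        return (keys[slot] != null && keys[slot] == key) ? values[slot] : -1;
    }
}
```

After put(1, 10) and put(129, 20), slotsVisited(129) is [1, 2]; a later put(1, 99) stops at slot 1 and overwrites, while get(257) walks slots 1, 2, 3 and returns -1 at the first empty slot.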

I know these questions might be a bit basic, but I have spent quite some time trying to sort this out, so any help would be much appreciated!

Edit

So now I have tried implementing the HashMap myself using a Node class, which just constructs a node with a key and a corresponding value. It also has a getHashCode method, which simply concatenates the two values.

I have also constructed a SinglyLinkedList (part of a previous assignment), which I use as the bucket.

My hash function is simply hashCode % hashMap.length.
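
One detail worth checking with `hashCode % hashMap.length`: Java's `%` takes the sign of the left operand, so a negative hash code produces a negative array index. A minimal comparison with `Math.floorMod` (available since Java 8); the class and method names are illustrative only:

```java
// Compares two ways of turning a hash code into a bucket index.
class BucketIndex {
    // Java's % keeps the sign of the left operand: -7 % 64 == -7,
    // which would throw ArrayIndexOutOfBoundsException as an index.
    static int naiveIndex(int hashCode, int length) {
        return hashCode % length;
    }

    // Math.floorMod returns a value in [0, length) for any positive length.
    static int safeIndex(int hashCode, int length) {
        return Math.floorMod(hashCode, length);
    }

    public static void main(String[] args) {
        System.out.println(naiveIndex(-7, 64)); // -7
        System.out.println(safeIndex(-7, 64));  // 57
    }
}
```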

Here is my own implementation. What do you think of it?

package spreadsheet;

// Separate chaining: each array slot holds a SinglyLinkedListMap bucket.
public class HashTableMap {

  private SinglyLinkedListMap[] hashArray;
  private int size; // number of entries inserted, not number of buckets

  public HashTableMap() {
    hashArray = new SinglyLinkedListMap[64];
    size = 0;
  }

  public void insert(final Position key, final Expression value) {
      Node node = new Node(key, value);
      // floorMod keeps the index in range even if getHashCode() is negative.
      int hashNumber = Math.floorMod(node.getHashCode(), hashArray.length);
      // Create a bucket only for an empty slot, then insert the entry once.
      if (hashArray[hashNumber] == null) {
        hashArray[hashNumber] = new SinglyLinkedListMap();
      }
      hashArray[hashNumber].insert(key, value);
      size++;
      // Growing hashArray requires reinserting every entry with the new
      // length, since each index depends on hashArray.length; copying the
      // buckets to the same indices would make later lookups miss them.
  }

  public Expression lookUp(final Position key) {
      Node node = new Node(key, null);
      int hashNumber = Math.floorMod(node.getHashCode(), hashArray.length);
      SinglyLinkedListMap foundBucket = hashArray[hashNumber];
      // An empty slot means the key was never inserted.
      return foundBucket == null ? null : foundBucket.lookUp(key);
  }
}


The lookup time should be around O(1), so I would like to know whether that is the case. If not, how can I improve it?

Answer by Patricia Shanahan

You have to have some plan to deal with hash collisions, in which two distinct keys fall in the same bucket, the same element of your array.

One of the simplest solutions is to keep a list of entries for each bucket.

If you have a good hashing algorithm, and make sure the number of buckets is bigger than the number of elements, you should end up with most buckets having zero or one items, so the list search should not take long. If the lists are getting too long it is time to rehash with more buckets to spread the data out.
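
A minimal sketch of the list-per-bucket approach, including the rehash step described above. The initial 16 buckets and the 0.75 load-factor threshold are assumptions chosen for illustration (they match java.util.HashMap's documented defaults), and keys are assumed to implement equals and hashCode consistently:

```java
import java.util.LinkedList;

// Separate chaining: each array slot holds a list of entries (a bucket).
class ChainedMap<K, V> {
    private static final double MAX_LOAD = 0.75; // resize threshold (assumed)

    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private LinkedList<Entry<K, V>>[] buckets = newTable(16);
    private int size; // number of entries, not number of buckets

    @SuppressWarnings("unchecked")
    private LinkedList<Entry<K, V>>[] newTable(int n) {
        return (LinkedList<Entry<K, V>>[]) new LinkedList[n];
    }

    private static int indexFor(Object key, int length) {
        return Math.floorMod(key.hashCode(), length);
    }

    public V get(K key) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key, buckets.length)];
        if (bucket == null) return null;         // empty slot: key absent
        for (Entry<K, V> e : bucket)             // short if hashes spread well
            if (e.key.equals(key)) return e.value;
        return null;
    }

    public void put(K key, V value) {
        int i = indexFor(key, buckets.length);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Entry<K, V> e : buckets[i])
            if (e.key.equals(key)) { e.value = value; return; } // overwrite
        buckets[i].add(new Entry<>(key, value));
        size++;
        if (size > MAX_LOAD * buckets.length) resize();
    }

    // Every entry is rehashed, because its index depends on the array length.
    private void resize() {
        LinkedList<Entry<K, V>>[] old = buckets;
        buckets = newTable(old.length * 2);
        for (LinkedList<Entry<K, V>> bucket : old) {
            if (bucket == null) continue;
            for (Entry<K, V> e : bucket) {
                int i = indexFor(e.key, buckets.length);
                if (buckets[i] == null) buckets[i] = new LinkedList<>();
                buckets[i].add(e);
            }
        }
    }

    public int size() { return size; }
    public int bucketCount() { return buckets.length; }
}
```

Keeping size / bucketCount() at or below the load factor keeps the expected bucket length bounded, which is why lookups are O(1) on average; a poor hash function can still pile entries into one bucket.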

Answer by Daniel Kaplan

It really depends on how good your hashcode method is. Let's say you tried to make it as bad as possible: you made hashcode return 1 every time. In that case you'd have an array of lists, but only one element of the array would contain any data, and that element would grow into one huge list.

If you did that, you'd have a really inefficient hashmap. But, if your hashcode were a little better, it'd distribute the objects into many different array elements and as a result it'd be much more efficient.
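
To make the difference concrete, here is a small sketch that measures the longest bucket (the worst-case list to search) for 1000 integer keys in 128 buckets, first with the constant hash described above and then with the identity hash. The identity hash on consecutive integers is a best case, not a typical one:

```java
import java.util.function.IntUnaryOperator;

class BucketSpread {
    // Places keys 0..n-1 at floorMod(hash(key), buckets) and returns the
    // length of the longest bucket, i.e. the longest list a lookup may scan.
    static int longestBucket(int n, int buckets, IntUnaryOperator hash) {
        int[] counts = new int[buckets];
        int longest = 0;
        for (int key = 0; key < n; key++) {
            int i = Math.floorMod(hash.applyAsInt(key), buckets);
            longest = Math.max(longest, ++counts[i]);
        }
        return longest;
    }

    public static void main(String[] args) {
        // Worst possible hash: every key lands in the same bucket.
        System.out.println(longestBucket(1000, 128, k -> 1)); // 1000
        // Identity hash on consecutive ints spreads keys evenly.
        System.out.println(longestBucket(1000, 128, k -> k)); // 8
    }
}
```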

The ideal case (which often isn't achievable) is a hashcode method that returns a unique number for every object you give it. If you could do that, and the array were large enough that no two of those numbers map to the same index, you would never need an array of lists; a plain array would do. But since your hashcode isn't "perfect", two different objects can have the same hashcode, and you need to handle that scenario by putting them in a list at the same array element.
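
For spreadsheet cells, such a collision-free hash is achievable when the grid size is bounded: row * cols + col gives every in-range cell its own index, so a plain array replaces the array of lists. This is a sketch assuming fixed row and column limits, with String values standing in for the asker's Expression type:

```java
// Direct addressing: with a bounded grid, row * cols + col is unique for
// every in-range cell, so no bucket lists are needed. The grid bounds and
// String values are assumptions for illustration.
class CellGrid {
    private final int rows, cols;
    private final String[] cells;

    CellGrid(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
        this.cells = new String[rows * cols];
    }

    private int index(int row, int col) {
        if (row < 0 || row >= rows || col < 0 || col >= cols)
            throw new IndexOutOfBoundsException("cell (" + row + "," + col + ")");
        return row * cols + col; // unique for every in-range (row, col)
    }

    void put(int row, int col, String value) { cells[index(row, col)] = value; }

    String get(int row, int col) { return cells[index(row, col)]; }
}
```

The trade-off is memory: the array holds a slot for every possible cell, so this only pays off when the grid is small or densely filled.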

But, if your hashcode method was "pretty good" and rarely had collisions, you rarely would have more than 1 element in the list.

Answer by Joop Eggen

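import java.util.List;

class SpreadSheetPosition {
    int column;
    int row;

    @Override
    public int hashCode() {
        return column + row;
    }

    // A key type must also override equals, consistently with hashCode.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SpreadSheetPosition)) return false;
        SpreadSheetPosition other = (SpreadSheetPosition) o;
        return column == other.column && row == other.row;
    }
}

class HashMap {
    private static final int N = 64; // number of buckets (example value)
    private List<?>[] buckets = new List<?>[N]; // one list per bucket

    public void put(Object key, Object value) {
        int keyHashCode = key.hashCode();
        // floorMod keeps the index in [0, N) even for negative hash codes.
        int bucketIndex = Math.floorMod(keyHashCode, N);
        // ...
    }
}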

Compare having N lists with having just one list or array. To search a single list, you may have to traverse the entire list. With an array of lists, each individual list is shorter, possibly containing only one element or none (null).

If hashCode() is as unique as possible, the chance of finding the entry immediately is high.

Answer by Mel Nicholson

The lists are often referred to as buckets and are a way of dealing with collisions. When two data elements have the same hash code mod TABLE_SIZE, they collide, but both must be stored.

A worse kind of collision is two different data points having the same key; this is not allowed in hash tables, and one will overwrite the other. If you just add row to column, then (2,1) and (1,2) will both have a key of 3, which means they cannot both be stored in the same hash table. If you concatenate the strings without a separator, the problem is (12,1) versus (1,21): both have key "121". With a separator (such as a comma), all the keys will be distinct.
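
These collisions are easy to reproduce. In this sketch, `KeyCollisions` shows the three key schemes, and `CellKey` is a hypothetical key class (not the asker's Position) that keeps row and column as separate fields. Its equals compares both fields, so two distinct cells can at worst share a bucket but can never overwrite each other:

```java
import java.util.Objects;

class KeyCollisions {
    // Adding row and column: (2,1) and (1,2) both give 3.
    static int sumKey(int row, int col) { return row + col; }

    // Concatenating without a separator: (12,1) and (1,21) both give "121".
    static String concatKey(int row, int col) { return "" + row + col; }

    // With a separator the strings differ: "12,1" versus "1,21".
    static String separatedKey(int row, int col) { return row + "," + col; }
}

// Equality uses both fields; hashCode only needs to agree with equals.
final class CellKey {
    final int row, col;

    CellKey(int row, int col) { this.row = row; this.col = col; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CellKey)) return false;
        CellKey other = (CellKey) o;
        return row == other.row && col == other.col;
    }

    // Objects.hash combines the fields as 31 * (31 + row) + col.
    @Override
    public int hashCode() { return Objects.hash(row, col); }
}
```

With this scheme, new CellKey(2, 1) hashes to 1024 and new CellKey(1, 2) to 994, so the two cells from the example no longer collide.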

Distinct keys can land in the same bucket if their hash codes are equal mod TABLE_SIZE. The lists are one way to store both values in the same bucket.

如果哈希码是相同的 mod TABLE_SIZE,则不同的键可以落在同一个 buck 中。这些列表是将两个值存储在同一个存储桶中的一种方式。