java 为什么 ArrayList 以 1.5 的速度增长,而 Hashmap 却是 2?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5040753/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-30 09:09:10  来源:igfitidea点击:

Why ArrayList grows at a rate of 1.5, but for Hashmap it's 2?

javaarraylisthashmap

提问by Arnab Biswas

As per Sun Java Implementation, during expansion, ArrayList grows to 3/2 it's initial capacity whereas for HashMap the expansion rate is double. What is reason behind this?

根据 Sun Java 实现,在扩展期间,ArrayList 增长到初始容量的 3/2,而对于 HashMap,扩展率是原来的两倍。这背后的原因是什么?

As per the implementation, for HashMap, the capacity should always be in the power of two. That may be a reason for HashMap's behavior. But in that case the question is, for HashMap why the capacity should always be in power of two?

根据实现,对于 HashMap,容量应始终为 2 的幂。这可能是 HashMap 行为的原因。但在那种情况下,问题是,对于 HashMap 为什么容量应该总是 2 的幂?

采纳答案by Andreas Dolk

The expensive part at increasing the capacity of an ArrayList is copying the content of the backing array a new (larger) one.

增加 ArrayList 容量的昂贵部分是将后备数组的内容复制一个新的(更大的)。

For the HashMap, it is creating a new backing array and puttingall map entries in the new array. And, the higher the capacity, the lower the risk of collisions. This is more expensive and explains, why the expansion factor is higher. The reason for 1.5 vs. 2.0? I consider this as "best practise" or "good tradeoff".

对于 HashMap,它正在创建一个新的支持数组并将所有映射条目放入新数组中。而且,容量越高,发生碰撞的风险就越低。这更昂贵并解释了为什么膨胀系数更高。1.5 与 2.0 的原因?我认为这是“最佳实践”或“良好的权衡”。

回答by Ishtar

for HashMap why the capacity should always be in power of two?

对于 HashMap 为什么容量应该总是 2 的幂?

I can think of two reasons.

我能想到两个原因。

  1. You can quickly determine the bucket a hashcode goes in to. You only need a bitwise AND and no expensive modulo. int bucket = hashcode & (size-1);

  2. Let's say we have a grow factor of 1.7. If we start with a size 11, the next size would be 18, then 31. No problem. Right? But the hashcodes of Strings in Java, are calculated with a prime factor of 31. The bucket a string goes into,hashcode%31, is then determined only by the last character of the String. Bye bye O(1)if you store folders that all end in /. If you use a size of, for example, 3^n, the distribution will not get worse if you increase n. Going from size 3to 9, every element in bucket 2, will now go to bucket 2,5or 7, depending on the higher digit. It's like splitting each bucket in three pieces. So a size of integer growth factor would be preferred. (Off course this all depends on how you calculate hashcodes, but a arbitrary growth factor doesn't feel 'stable'.)

  1. 您可以快速确定哈希码进入的存储桶。您只需要按位 AND 而不需要昂贵的模。int bucket = hashcode & (size-1);

  2. 假设我们的增长因子为 1.7。如果我们从 11 码开始,下一个码是 18 码,然后是 31 码。没问题。对?但是 Java 中字符串的哈希码是用质因数 31 计算的。字符串进入的桶hashcode%31,然后仅由字符串的最后一个字符确定。再见,O(1)如果您存储的文件夹都以/. 例如,如果您使用 的大小,则如果增加3^n分布不会变得更糟n。从 size39,bucket 中的每个元素2现在都将转到 bucket 25或者7, 取决于较高的数字。这就像将每个桶分成三块。因此,整数增长因子的大小将是首选。(当然,这一切都取决于您如何计算哈希码,但任意增长因子感觉并不“稳定”。)

回答by Peter Lawrey

The way HashMap is designed/implemented its underlying number of buckets must be a power of 2 (even if you give it a different size, it makes it a power of 2), thus it grows by a factor of two each time. An ArrayList can be any size and it can be more conservative in how it grows.

HashMap 的设计/实现方式,其底层的桶数必须是 2 的幂(即使你给它一个不同的大小,它也会使它成为 2 的幂),因此它每次增长两倍。ArrayList 可以是任意大小,并且它的增长方式可以更加保守。

回答by samsonbek

The accepted answer is not actually giving exact response to the question, but comment from @user837703 to that answer is clearly explaining why HashMap grows by power of two.

接受的答案实际上并没有给出对该问题的确切答复,但是@user837703 对该答案的评论清楚地解释了 HashMap 以 2 的幂增长的原因。

I found this article, which explains it in detail http://coding-geek.com/how-does-a-hashmap-work-in-java/

我找到了这篇文章,它详细解释了http://coding-geek.com/how-does-a-hashmap-work-in-java/

Let me post fragment of it, which gives detailed answer to the question:

让我贴出它的片段,它给出了问题的详细答案:

// the function that returns the index of the bucket from the rehashed hash
static int indexFor(int h, int length) {
    return h & (length-1);
}

In order to work efficiently, the size of the inner array needs to be a power of 2, let's see why.

Imagine the array size is 17, the mask value is going to be 16 (size -1). The binary representation of 16 is 0…010000, so for any hash value H the index generated with the bitwise formula “H AND 16” is going to be either 16 or 0. This means that the array of size 17 will only be used for 2 buckets: the one at index 0 and the one at index 16, not very efficient…

But, if you now take a size that is a power of 2 like 16, the bitwise index formula is “H AND 15”. The binary representation of 15 is 0…001111 so the index formula can output values from 0 to 15 and the array of size 16 is fully used. For example:

  • if H = 952 , its binary representation is 0..01110111000, the associated index is 0…01000 = 8
  • if H = 1576 its binary representation is 0..011000101000, the associated index is 0…01000 = 8
  • if H = 12356146, its binary representation is 0..0101111001000101000110010, the associated index is 0…00010 = 2
  • if H = 59843, its binary representation is 0..01110100111000011, the associated index is 0…00011 = 3

This is why the array size is a power of two. This mechanism is transparent for the developer: if he chooses a HashMap with a size of 37, the Map will automatically choose the next power of 2 after 37 (64) for the size of its inner array.

为了高效工作,内部数组的大小需要是 2 的幂,让我们看看为什么。

假设数组大小为 17,掩码值为 16(大小为 -1)。16 的二进制表示是 0…010000,因此对于任何哈希值 H,使用按位公式“H AND 16”生成的索引将是 16 或 0。这意味着大小为 17 的数组将仅用于2 个桶:索引 0 的一个和索引 16 的一个,效率不高……

但是,如果您现在采用 2 的幂的大小,例如 16,则按位索引公式是“H AND 15”。15 的二进制表示为 0…001111,因此索引公式可以输出 0 到 15 的值,充分利用了大小为 16 的数组。例如:

  • 如果 H = 952 ,其二进制表示为 0..01110111000,则相关索引为 0…01000 = 8
  • 如果 H = 1576,其二进制表示为 0..011000101000,则相关索引为 0…01000 = 8
  • 如果 H = 12356146,则其二进制表示为 0..0101111001000101000110010,相关联的索引为 0...00010 = 2
  • 如果 H = 59843,则其二进制表示为 0..01110100111000011,相关联的索引为 0…00011 = 3

这就是数组大小是 2 的幂的原因。这个机制对开发者来说是透明的:如果他选择了一个大小为37的HashMap,Map会自动选择37(64)之后的下一个2的幂作为其内部数组的大小。

回答by fmucar

A general rule to avoid collisions on Maps is to keep to load factor max at around 0.75 To decrease possibility of collisions and avoid expensive copying process HashMap grows at a larger rate.

避免 Maps 冲突的一般规则是将最大加载因子保持在 0.75 左右,以减少冲突的可能性并避免昂贵的复制过程 HashMap 以更大的速度增长。

Also as @Peter says, it must be a power of 2.

同样正如@Peter 所说,它必须是 2 的幂。

回答by Heiko Rupp

Hashing takes advantage of distributing data evenly into buckets. The algorithm tries to prevent multiple entries in the buckets ("hash collisions"), as they will decrease performance.

散列利用将数据均匀分布到桶中。该算法试图防止存储桶中的多个条目(“哈希冲突”),因为它们会降低性能。

Now when the capacity of a HashMap is reached, size is extended and existing data is re-distributed with the new buckets. If the size-increas would be too small, this re-allocation of space and re-dsitribution would happen too often.

现在,当达到 HashMap 的容量时,会扩展大小并使用新存储桶重新分配现有数据。如果尺寸增加太小,这种空间的重新分配和重新分配会经常发生。

回答by Peter Knego

I can't give you a reason why this is so (you'd have to ask Sun developers), but to see how this happens take a look at source:

我无法告诉您为什么会这样(您必须询问 Sun 开发人员),但要了解这是如何发生的,请查看源代码:

  1. HashMap: Take a look at how HashMap resizes to new size (sourceline 799)

         resize(2 * table.length);
    
  2. ArrayList: source, line 183:

    int newCapacity = (oldCapacity * 3)/2 + 1;
    
  1. HashMap:看看 HashMap 如何调整到新的大小(源代码行 799)

         resize(2 * table.length);
    
  2. ArrayList: source,第 183 行:

    int newCapacity = (oldCapacity * 3)/2 + 1;
    

Update:I mistakenly linked to sources of Apache Harmony JDK - changed it to Sun's JDK.

更新:我错误地链接到 Apache Harmony JDK 的来源 - 将其更改为 Sun 的 JDK。