Number of hash buckets

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license, cite the original address, and attribute it to the original authors (not me): StackOverflow

Original URL: http://stackoverflow.com/questions/10379433/
Asked by Learner
In the HashMap documentation, it is mentioned that:
- the initial capacity is simply the capacity at the time the hash table is created
- the capacity is the number of buckets in the hash table.
Now suppose we have an initial capacity of 16 (the default), and if we keep adding elements until there are 100 of them, the capacity of the hashmap is 100 * loadfactor.
Will the number of hash buckets be 100 or 16?
Edit:
From the solutions I read: there are more buckets than elements added. Taking this as the viewpoint: if we add Strings as keys, we will get one element per bucket, resulting in a lot of space consumption/complexity. Is my understanding right?
Answer by Adam Mihalcin
Neither 100 nor 16 buckets. Most likely there will be 256 buckets, but this isn't guaranteed by the documentation.
From the updated documentation link:
The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.
(emphasis mine)
So, if we ignore the word "approximately" above, we determine that whenever the hash table becomes 75% full (or whichever load factor you specify in the constructor), the number of hash buckets doubles. That means that the number of buckets doubles whenever you insert the 12th, 24th, 48th, and 96th elements, leaving a total of 256 buckets.
However, as I emphasized in the documentation snippet, the number is approximately twice the previous size, so it may not be exactly 256. In fact, if the second-to-last doubling is replaced with a slightly larger increase, the last doubling may never happen, so the final hash table may be as small as 134 buckets, or may be larger than 256 buckets.
N.B. I arrived at the number 134 because it is the smallest integer N such that 0.75 * N > 100.
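To watch this growth happen, here is a minimal sketch (mine, not part of the original answer) that reads the bucket array length through reflection. It assumes the OpenJDK implementation detail that the buckets live in a private field named table, and on recent JDKs it may need --add-opens java.base/java.util=ALL-UNNAMED to run:

    import java.lang.reflect.Field;
    import java.util.HashMap;

    public class BucketGrowth {
        // Reads the current bucket count. "table" is an OpenJDK-internal
        // field, so this is a diagnostic hack, not a supported API.
        static int buckets(HashMap<?, ?> map) throws Exception {
            Field f = HashMap.class.getDeclaredField("table");
            f.setAccessible(true);
            Object[] table = (Object[]) f.get(map);
            return table == null ? 0 : table.length; // null until the first put
        }

        public static void main(String[] args) throws Exception {
            HashMap<Integer, Integer> map = new HashMap<>(); // default capacity 16
            int last = -1;
            for (int i = 1; i <= 100; i++) {
                map.put(i, i);
                int now = buckets(map);
                if (now != last) { // report each time the bucket array grows
                    System.out.println("after element " + i + ": " + now + " buckets");
                    last = now;
                }
            }
        }
    }

On the OpenJDK builds I would expect this to end at 256 buckets; the exact insertion at which each doubling fires varies slightly between JDK versions.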
Answer by Thomas
Looking at the source code of HashMap, we see the following:
    // simplified pseudocode of HashMap's resize check
    threshold = capacity * loadFactor
    size = number of elements in the map
    if (size >= threshold) {
        double the capacity
    }
Thus, if the initial capacity is 16 and your load factor is 0.75 (the default), the initial threshold is 12. When you add the 12th element, the capacity rises to 32 with a threshold of 24. The next step is a capacity of 64 and a threshold of 48, and so on.
So with 100 elements, you should have a capacity of 256 and a threshold of 192.
Note that this applies only to the default values. If you know the approximate number of elements your map will contain, you should create it with a high enough initial capacity in order to avoid the copying that happens whenever the capacity is increased.
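As a concrete illustration of that advice (my sketch, not Thomas's code), pre-sizing for roughly 100 entries under the default 0.75 load factor could look like this:

    import java.util.HashMap;
    import java.util.Map;

    class Presized {
        static Map<String, String> forExpectedSize(int expected) {
            // A capacity above expected / 0.75 keeps the entry count below the
            // threshold (capacity * loadFactor), so no rehash occurs while filling.
            int initialCapacity = (int) (expected / 0.75f) + 1; // 134 for expected = 100
            return new HashMap<>(initialCapacity);
        }
    }

As the update below notes, 134 gets rounded up to 256 buckets anyway; the saving is skipping the intermediate rehashes at 32, 64 and 128.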
Update:
A word on the capacity: it will always be a power of two, even if you specify a different initial capacity. The hashmap then sets the capacity to the smallest power of 2 that is greater than or equal to the provided initial capacity.
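That rounding is essentially the classic bit-smearing trick; here is a standalone sketch (the method name is mine, and it omits the zero/overflow clamping the real implementation performs):

    // Rounds up to the next power of two, e.g. 100 -> 128 and 16 -> 16.
    static int nextPowerOfTwo(int cap) {
        int n = cap - 1; // subtract first so exact powers of two are kept
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return n + 1;
    }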
Answer by Nandkumar Tekale
From your link:
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method.
That means that if we have an initial capacity of 16, then when the threshold is exceeded the capacity roughly doubles to 32, next time to 64, and so on.
In your case, you are adding 100 elements. With the default load factor of 0.75, the capacity doubles at the 12th, 24th, 48th and 96th insertions, growing from 16 to 32, 64, 128 and finally 256. Thus, in your case, the total number of buckets is 256.
Answer by delicateLatticeworkFever
You are going to have at least one bucket per actual item. If you add items beyond 16, the table must be resized and rehashed.
Now suppose we have an initial capacity of 16 (the default), and if we keep adding elements until there are 100 of them, the capacity of the hashmap is 100 * loadfactor.
Actually it says:
If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.
I.e., if there is a maximum of 100 items and the capacity is at least 100/0.75 ≈ 134, then no re-hashing should ever occur. Notice that this implies that even if the table is not full, it may have to be rehashed when it gets close to full. So the ideal initial capacity to set with the default load factor, if you expect <=100 items, is ~135+.
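That computation as a tiny helper (the name is hypothetical):

    // Smallest initial capacity such that maxEntries never exceeds the
    // threshold: ceil(maxEntries / loadFactor). For 100 at 0.75 this is 134.
    static int noRehashCapacity(int maxEntries, float loadFactor) {
        return (int) Math.ceil(maxEntries / loadFactor);
    }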
Answer by Sumit Singh
From the doc:
When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by calling the rehash method.

threshold = product of the load factor and the current capacity
Let's try it: the initial size of the hashmap is 16 and the default load factor is 0.75, so the first threshold is 12, and after adding the 12th element the next capacity will be (16*2) = 32. The second threshold is 24, so after adding the 24th element the next capacity will be (32*2) = 64, and so on.
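The same progression can be traced with a small loop (a sketch based on the documented defaults, not on HashMap internals):

    public class GrowthTrace {
        public static void main(String[] args) {
            float loadFactor = 0.75f; // default load factor
            int capacity = 16;        // default initial capacity
            // Keep doubling until the threshold can accommodate 100 entries.
            while ((int) (capacity * loadFactor) < 100) {
                System.out.printf("capacity %d, threshold %d%n",
                        capacity, (int) (capacity * loadFactor));
                capacity *= 2;
            }
            System.out.printf("final: capacity %d, threshold %d%n",
                    capacity, (int) (capacity * loadFactor));
        }
    }

It prints the 16/12, 32/24, 64/48 and 128/96 steps and ends at capacity 256 with threshold 192, matching the other answers.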