Java HashMap 初始化参数(load/initialcapacity)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/434989/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 14:34:02  来源:igfitidea点击:

HashMap initialization parameters (load / initialcapacity)

javacollectionshashmap

提问by Ran Biron

What values should I pass to create an efficient HashMap/ HashMapbased structures for N items?

我应该传递什么值来为 N 个项目创建一个高效HashMap/HashMap基于结构的结构?

In an ArrayList, the efficient number is N (N already assumes future grow). What should be the parameters for a HashMap? ((int)(N * 0.75d), 0.75d)? More? Less? What is the effect of changing the load factor?

在 a 中ArrayList,有效数字是 N(N 已经假设未来增长)。a 的参数应该是什么HashMap?((int)(N * 0.75d), 0.75d)?更多的?较少的?改变负载因子有什么影响?

采纳答案by Yuval Adam

Regarding the load factor, I'll simply quote from the HashMap javadoc:

关于负载因子,我将简单地引用HashMap javadoc

As a general rule, the default load factor (.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting its initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.

作为一般规则,默认负载因子 (.75) 提供了时间和空间成本之间的良好折衷。较高的值会减少空间开销,但会增加查找成本(反映在 HashMap 类的大多数操作中,包括 get 和 put)。在设置其初始容量时,应考虑地图中的预期条目数及其负载因子,以尽量减少重新哈希操作的次数。如果初始容量大于最大条目数除以负载因子,则不会发生重新哈希操作。

Meaning, the load factor should not be changed from .75, unless you have some specific optimization you are going to do. Initial capacity is the only thing you want to change, and set it according to your Nvalue - meaning (N / 0.75) + 1, or something in that area. This will ensure that the table will always be large enough and no rehashing will occur.

意思是,负载因子不应从 更改.75,除非您要进行一些特定的优化。初始容量是您唯一要更改的内容,并根据您的N值进行设置 - 含义(N / 0.75) + 1,或该区域的某些内容。这将确保表总是足够大并且不会发生重新散列。

回答by Andrzej Doyle

In an ArrayList, the efficient number is N (N already assumes future grow).

在 ArrayList 中,有效数字是 N(N 已经假设未来增长)。

Erm, no it doesn't, unless I misunderstand what you're saying here. When you pass an integer into the Arraylist constructor, it will create an underlying array of exactly that size. If it turns out you need even a single extra element, the ArrayList will need to resize the underlying array when you next call add(), causing this call to take a lot longer than it usually would.

呃,不,它没有,除非我误解了你在这里所说的。当您将一个整数传递给 Arraylist 构造函数时,它将创建一个完全相同大小的底层数组。如果事实证明您甚至需要一个额外的元素,则 ArrayList 将需要在您下次调用 add() 时调整底层数组的大小,从而导致此调用比通常花费的时间更长。

If on the other hand you're talking about your value of N taking into account growth - then yes, if you can guarantee the value will never go above this then calling such an Arraylist constructor is appropriate. And in this case, as pointed out by Hank, the analogous constructor for a map would be N and 1.0f. This should perform reasonably even if you do happen to exceed N (though if you expect this to occur on a regular basis, you may wish to pass in a larger number for the initial size).

另一方面,如果您在谈论考虑增长的 N 值 - 那么是的,如果您可以保证该值永远不会超过此值,那么调用这样的 Arraylist 构造函数是合适的。在这种情况下,正如 Hank 所指出的,映射的类似构造函数将是 N 和 1.0f。即使您碰巧超过 N,这也应该合理地执行(尽管如果您希望这种情况定期发生,您可能希望为初始大小传入更大的数字)。

The load factor, in case you weren't aware, is the point at which the map will have its capacity increased, as a fraction of the total capacity.

如果您不知道,负载因子是地图容量增加的点,作为总容量的一部分。

Edit: Yuval is probably right that it's a better idea to leave the load factor around 0.75 for a general purpose map. A load factor of 1.0 would perform brilliantly if your keys had sequential hashcodes (such as sequential integer keys), but for anything else you will likely run into collisions with the hash buckets, meaning that lookups take longer for some elements. Creating more buckets than is strictly necessary will reduce this chance of collision, meaning there's more chance of elements being in their own buckets and thus being retrievable in the shortest amount of time. As the docs say, this is a time vs space tradeoff. If either is particularly important to you (as shown by a profiler rather than prematurely optimising!) you can emphasize that; otherwise, stick with the default.

编辑:Yuval 可能是对的,对于通用地图,将负载因子保留在 0.75 左右是一个更好的主意。如果您的键具有连续的哈希码(例如连续的整数键),则负载因子 1.0 会表现出色,但对于其他任何事情,您可能会遇到与哈希桶的冲突,这意味着某些元素的查找时间更长。创建比严格需要的更多的桶将减少这种碰撞的机会,这意味着元素在它们自己的桶中的机会更多,因此可以在最短的时间内检索。正如文档所说,这是一个时间与空间的权衡。如果其中任何一个对您特别重要(如分析器所示,而不是过早优化!),您可以强调这一点;否则,坚持使用默认值。

回答by Zarkonnen

It's also notable that having a HashMap on the small side makes hash collisions more likely, which can slow down lookup. Hence, if you really worry about the speed of the map, and less about its size, it might be worth making it a bit too large for the data it needs to hold. Since memory is cheap, I typically initialise HashMaps for a known number of items with

同样值得注意的是,在较小的一侧使用 HashMap 会使哈希冲突的可能性更大,这会减慢查找速度。因此,如果你真的担心地图的速度,而不是它的大小,那么对于它需要保存的数据来说,让它有点过大可能是值得的。由于内存很便宜,我通常为已知数量的项目初始化 HashMaps

HashMap<Foo> myMap = new HashMap<Foo>(numberOfElements * 2);

Feel free to disagree, in fact I'd quite like to have this idea verified or thrown out.

随意不同意,事实上我很想验证或抛弃这个想法。

回答by grayger

Referring to HashMap source code will help.

参考 HashMap 源代码会有所帮助。

If the number of entries reaches threshold(capacity * load factor), rehashing is done automatically. That means too small load factor can incur frequent rehashing as entries grow.

如果条目数达到阈值(容量 * 负载因子),则会自动进行重新散列。这意味着太小的负载因子会随着条目的增长而导致频繁的重新散列。

回答by Laurence Vanhelsuwe

For very large HashMaps in critical systems, where getting the initial capacity wrong can be very problematic, you may need empirical information to determine how best to initialize your Map.

对于关键系统中非常大的 HashMap,初始容量错误可能会非常成问题,您可能需要经验信息来确定如何最好地初始化您的 Map。

CollectionSpy (collectionspy.com) is a new Java profiler which lets you see in the blink of an eye which HashMaps are close to needing rehashing, how many times they have been rehashed in the past, and more. An ideal tool to determine safe initial capacity arguments to capacity-based container constructors.

CollectionSpy ( collectionspy.com) 是一个新的 Java 分析器,它让您可以在眨眼间看到哪些 HashMap 接近需要重新哈希,它们过去被重新哈希了多少次,等等。确定基于容量的容器构造函数的安全初始容量参数的理想工具。

回答by NotEdible

The answer Yuval gave is only correct for Hashtable. HashMap uses power-of-two buckets, so for HashMap, Zarkonnen is actually correct. You can verify this from the source code:

Yuval 给出的答案仅适用于 Hashtable。HashMap 使用二的幂的桶,所以对于 HashMap,Zarkonnen 实际上是正确的。您可以从源代码中验证这一点:

  // Find a power of 2 >= initialCapacity
  int capacity = 1;
  while (capacity < initialCapacity)
  capacity <<= 1;

So, although the load factor of 0.75f is still the same between Hashtable and HashMap, you should use an initial capacity n*2 where n is the number of elements you plan on storing in the HashMap. This will ensure the fastest get/put speeds.

因此,尽管 Hashtable 和 HashMap 的负载因子 0.75f 仍然相同,但您应该使用初始容量 n*2,其中 n 是您计划在 HashMap 中存储的元素数。这将确保最快的获取/放置速度。

回答by lv2program

It's safe in most cases of Listand Mapinitialization to make the Listor Mapwith the following size params.

这是在大多数情况下,安全ListMap初始化,使ListMap有以下尺寸PARAMS。

List<T>(numElements + (numElements / 2));
Map<T,T>(numElements + (numElements / 2));

this follows the .75rule as well as saves a little overhead over the * 2operation described above.

这遵循0.75规则,并且比上述* 2操作节省了一点开销。

回答by Mark Rhodes

I ran some unit teststo see if these answers were correct and it turned out that using:

我运行了一些单元测试,看看这些答案是否正确,结果是使用:

(int) Math.ceil(requiredCapacity / loadFactor);

as the initial capacity gives what you want for either a HashMapor a Hashtable. By "what you want" I mean that adding requiredCapacityelements to the map won't cause the array which it's wrapping to resize and the array won't be larger than required. Since the default load capacity is 0.75, initializing a HashMap like so works:

因为初始容量为 aHashMap或 a提供了您想要的Hashtable。“你想要什么”我的意思是向requiredCapacity地图添加元素不会导致它包装的数组调整大小,并且数组不会大于所需的大小。由于默认负载容量为 0.75,因此像这样初始化 HashMap 可以工作:

... = new HashMap<KeyType, ValueType>((int) Math.ceil(requiredCapacity / 0.75));

Since a HashSet is effectively just a wrapper for a HashMap, the same logic also applies there, i.e. you can construct a HashSet efficiently like this:

由于 HashSet 实际上只是 HashMap 的包装器,因此同样的逻辑也适用于那里,即您可以像这样有效地构造 HashSet:

.... = new HashSet<TypeToStore>((int) Math.ceil(requiredCapacity / 0.75));

@Yuval Adam's answer is correct for all cases except where (requiredCapacity / 0.75)is a power of 2, in which case it allocates too much memory.
@NotEdible's answer uses too much memory in many cases, as the HashMap's constructor itself deals with the issues that it want the maps array to have a size which is a power of 2.

@Yuval Adam 的答案在所有情况下都是正确的,除了 where(requiredCapacity / 0.75)是 2 的幂,在这种情况下它分配了太多内存。
@NotEdible 的答案在许多情况下使用了太多内存,因为 HashMap 的构造函数本身处理的问题是它希望 map 数组的大小为 2 的幂。

回答by linqu

In the guava librariesfrom Google there is a function that creates a HashMap optimized for a expected number of items: newHashMapWithExpectedSize

在谷歌的番石榴库中,有一个函数可以创建一个针对预期数量的项目优化的 HashMap:newHashMapWithExpectedSize

from the docs:

从文档:

Creates a HashMap instance, with a high enough "initial capacity" that it should hold expectedSize elements without growth ...

创建一个 HashMap 实例,具有足够高的“初始容量”,它应该可以容纳 expectedSize 元素而不会增长......