Java ConcurrentHashMap constructor parameters?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1573901/

Time: 2020-10-29 17:08:32  Source: igfitidea

ConcurrentHashMap constructor parameters?

java hashcode concurrenthashmap

Asked by non sequitor

I am wondering about the parameters for constructing a ConcurrentHashMap:

  • initialCapacity is 16 by default (understood).
  • loadFactor is 0.75 by default.
  • concurrencyLevel is 16 by default.

My questions are:

  • What criteria should be used to adjust loadFactor up or down?
  • How do we establish the number of concurrently updating threads?
  • What criteria should be used to adjust concurrencyLevel up or down?

Additionally:

  • What are the hallmarks of a good hashcode implementation? (If an SO question addresses this, just link to it.)

Thank you!

Accepted answer by Neil Coffey

The short answer: set "initial capacity" to roughly how many mappings you expect to put in the map, and leave the other parameters at their default.
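As a minimal sketch of that short answer, the snippet below passes only an initial capacity and leaves the other parameters at their defaults. The class name `PresizedMapExample`, the key format, and the figure of 10,000 expected mappings are assumptions made for illustration, not part of the original answer.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PresizedMapExample {
    // Hypothetical figure: roughly how many mappings we expect to store.
    static final int EXPECTED_MAPPINGS = 10_000;

    static Map<String, Integer> buildMap() {
        // Only the initial capacity is supplied; load factor and
        // concurrency level are left at their defaults.
        Map<String, Integer> map = new ConcurrentHashMap<>(EXPECTED_MAPPINGS);
        for (int i = 0; i < EXPECTED_MAPPINGS; i++) {
            map.put("key-" + i, i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = buildMap();
        System.out.println("size = " + map.size());
    }
}
```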

Long answer:

  • load factor is the ratio between the number of "buckets" in the map and the number of expected elements;

  • 0.75 is usually a reasonable compromise: as I recall, it means that with a good hash function, on average we expect about 1.6 redirects to find an element in the map (or around that figure);

  • changing the load factor changes the compromise between more redirects to find an element and less wasted space, but 0.75 is really usually a good value;

  • in principle, set concurrencyLevel to the number of concurrent threads you expect to have modifying the map, although overestimating this doesn't appear to have a bad effect other than wasting memory (I wrote a little on ConcurrentHashMap performance a while ago in case you're interested).
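For readers who do want to set all three parameters discussed above, here is a small sketch using the three-argument constructor. The helper name `create` and its parameters `expectedMappings` and `writerThreads` are hypothetical; the load factor is simply kept at the 0.75 value the answer recommends.

```java
import java.util.concurrent.ConcurrentHashMap;

class TunedMapExample {
    // Hypothetical helper: expectedMappings and writerThreads are estimates
    // supplied by the caller, not values taken from the original answer.
    static ConcurrentHashMap<String, Long> create(int expectedMappings, int writerThreads) {
        float loadFactor = 0.75f; // the usual compromise value
        // Arguments are, in order: initialCapacity, loadFactor, concurrencyLevel.
        return new ConcurrentHashMap<>(expectedMappings, loadFactor, writerThreads);
    }

    public static void main(String[] args) {
        ConcurrentHashMap<String, Long> map = create(1_000, 4);
        map.put("requests", 1L);
        System.out.println(map);
    }
}
```

The constructor rejects a non-positive load factor or concurrency level with an `IllegalArgumentException`, so an estimate of zero writer threads needs to be guarded against by the caller.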

Informally, your hash function should essentially aim to have as much "randomness" in the bits as possible. Or more strictly, the hash code for a given element should give each bit a roughly 50% chance of being set. It's actually easier to illustrate this with an example: again, you may be interested in some stuff I wrote about how the String hash function works and associated hash function guidelines. Feedback is obviously welcome on any of this stuff.

One thing I also mention at some point is that you don't have to be too paranoid in practice: if your hash function produces a "reasonable" amount of randomness in some of the bits, then it will often be OK. In the worst case, sticking representative pieces of data into a string and taking the hash code of the string actually doesn't work so badly.
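The "worst case" fallback described above might look roughly like the following. The class `CustomerKey` and its fields are invented for illustration; the only point is that the representative fields are joined into a string and `String.hashCode()` is reused.

```java
import java.util.Objects;

// Hypothetical key class, used only to illustrate the string-based fallback.
final class CustomerKey {
    private final String country;
    private final int accountNumber;

    CustomerKey(String country, int accountNumber) {
        this.country = Objects.requireNonNull(country);
        this.accountNumber = accountNumber;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof CustomerKey)) return false;
        CustomerKey that = (CustomerKey) other;
        return accountNumber == that.accountNumber && country.equals(that.country);
    }

    @Override
    public int hashCode() {
        // Join the representative pieces of data into a string and reuse
        // String.hashCode(); the separator keeps field boundaries distinct.
        return (country + "|" + accountNumber).hashCode();
    }

    public static void main(String[] args) {
        System.out.println(new CustomerKey("US", 42).hashCode());
    }
}
```

This builds a temporary string on every call, so it trades some speed for simplicity; it is a fallback rather than a recommendation for hot paths.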

Answer by Yishai

Load Factor is primarily related to the quality of the hash function. The closer the load factor is to zero, the less likely collisions are, even if the hash function isn't so great. The trade-off is that the memory footprint is larger. In other words, the HashMap isn't distributing the entries into a separate bucket for each separate hashcode; it groups them by proximity, so the more buckets it has, the more spread out the distribution and the less likely collisions are.

So the bottom line is you fiddle with load factor to improve lookup time or reduce memory, according to your needs and the objects you are storing in the Map.
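To make the "more buckets, fewer collisions" point concrete, here is a deliberately simplified model. It assumes a power-of-two bucket count and the index rule `hash & (bucketCount - 1)`; real map implementations add further bit mixing, so treat this as an illustration rather than a description of the JDK internals. The class name and sample hash values are made up.

```java
class BucketSpreadExample {
    // Counts keys that land in a bucket which is already occupied.
    // Simplified model: bucketCount must be a power of two.
    static int countCollisions(int[] hashes, int bucketCount) {
        boolean[] occupied = new boolean[bucketCount];
        int collisions = 0;
        for (int hash : hashes) {
            int index = hash & (bucketCount - 1);
            if (occupied[index]) {
                collisions++;
            } else {
                occupied[index] = true;
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        // Made-up hash codes that share their low four bits in places.
        int[] hashes = {1, 17, 33, 2, 18, 5};
        System.out.println("16 buckets: " + countCollisions(hashes, 16) + " collisions");
        System.out.println("64 buckets: " + countCollisions(hashes, 64) + " collisions");
    }
}
```

With the same six hash codes, the 16-bucket table sees three collisions while the 64-bucket table sees none, at the cost of four times as many buckets.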

ConcurrencyLevel really depends on your application. If you only have two or three threads running in the application, there you go. If you are an application server with an arbitrary number of threads, then you need to understand what your load capacity is and what point you want to optimize for.

A good quality hashcode implementation provides as wide a distribution across potential values of the object as possible with the least number of collisions, while honoring the contract. In other words, it allows the HashMap (or Set as the case may be) to distribute the objects into separate buckets making lookups faster.

Answer by Jim Garrison

loadFactor: controls when the implementation decides to resize the hashtable. Too low a value wastes space and triggers resize operations sooner; too high a value leads to more collisions and slower lookups.

concurrencyLevel: tells the implementation to try to optimize for the given number of writing threads. According to the API docs, being off by up to a factor of 10 shouldn't have much effect on performance.

The allowed concurrency among update operations is guided by the optional concurrencyLevel constructor argument (default 16), which is used as a hint for internal sizing. The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention. But overestimates and underestimates within an order of magnitude do not usually have much noticeable impact.
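A rough sketch of what "as many threads as will ever concurrently modify the table" can look like in code is shown below. The class name, the thread pool, the key range of 64, and the 30-second timeout are all assumptions for illustration. Note also that in Java 8 and later the class documentation describes concurrencyLevel only as a sizing hint, so the effect of this argument may be small on modern JDKs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ConcurrentUpdateExample {
    // Runs writerThreads threads that each perform updatesPerThread updates
    // and returns the total number of updates recorded in the map.
    static int runWriters(int writerThreads, int updatesPerThread) {
        // concurrencyLevel is set to the number of writers we expect.
        ConcurrentHashMap<Integer, Integer> counts =
                new ConcurrentHashMap<>(16, 0.75f, writerThreads);
        ExecutorService pool = Executors.newFixedThreadPool(writerThreads);
        for (int t = 0; t < writerThreads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < updatesPerThread; i++) {
                    // merge() updates the count for a key atomically.
                    counts.merge(i % 64, 1, Integer::sum);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        int total = 0;
        for (int value : counts.values()) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("total updates = " + runWriters(4, 10_000));
    }
}
```

Because every update goes through `merge`, no increments are lost regardless of the concurrencyLevel chosen; the parameter affects contention and sizing, not correctness.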

A good hashcode implementation will distribute the hash values uniformly over any interval. If the set of keys is known in advance it is possible to define a "perfect" hash function that creates a unique hash value for each key.

Answer by Kevin

loadFactor is set to 0.75 by default, what criteria should be used to adjust this up or down?

You need some background on how hash maps work before you can understand this parameter. The map is essentially a series of buckets. Each entry in the map gets put into a bucket depending on the hash code of its key. The loadFactor means that once the number of entries exceeds 75% of the number of buckets, the Map should be resized.
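The arithmetic behind that rule can be sketched as follows. This is a simplified model in the style documented for `java.util.HashMap` (resize once the entry count exceeds capacity times load factor); the class and method names are hypothetical, and ConcurrentHashMap's own sizing may differ in detail.

```java
class ResizeThresholdExample {
    // Simplified model: the largest entry count a table of the given
    // capacity holds before a resize is triggered.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With the defaults from the question: 16 buckets, load factor 0.75.
        System.out.println("threshold for 16 buckets = " + threshold(16, 0.75f));
        // After one doubling of the table.
        System.out.println("threshold for 32 buckets = " + threshold(32, 0.75f));
    }
}
```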

concurrencyLevel is set to 16 by default, how do we establish the number of concurrently updating threads? What criteria should be used to adjust this up or down?

This is asking how many threads you expect to modify the Map concurrently (simultaneously).

For hash codes, see Joshua Bloch's Effective Java.
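As a pointer to what that book recommends, here is a sketch in the spirit of its well-known recipe: start from the hash of the first significant field and fold in each further field with `31 * result + fieldHash`. The `PhoneNumber` class below is written for illustration and is not copied from the book.

```java
// Illustrative value class; field names are chosen for this example.
final class PhoneNumber {
    private final int areaCode;
    private final int prefix;
    private final int lineNumber;

    PhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = areaCode;
        this.prefix = prefix;
        this.lineNumber = lineNumber;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PhoneNumber)) return false;
        PhoneNumber that = (PhoneNumber) other;
        return areaCode == that.areaCode
                && prefix == that.prefix
                && lineNumber == that.lineNumber;
    }

    @Override
    public int hashCode() {
        // Combine every field that equals() compares, multiplying by 31
        // between fields so that field order affects the result.
        int result = Integer.hashCode(areaCode);
        result = 31 * result + Integer.hashCode(prefix);
        result = 31 * result + Integer.hashCode(lineNumber);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new PhoneNumber(707, 867, 5309).hashCode());
    }
}
```

Using exactly the fields that `equals` compares keeps the two methods consistent, which is the part of the contract that hash-based maps rely on.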
