Java performance: ConcurrentHashMap vs HashMap

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1378310/

Time: 2020-08-12 11:21:47  Source: igfitidea

Performance ConcurrentHashMap vs HashMap

Tags: java, collections, hashmap

Asked by Mauli

How does the performance of ConcurrentHashMap compare to HashMap, especially for the .get() operation? (I'm especially interested in the case of only a few items, maybe in the range of 0-5000.)

Is there any reason not to use ConcurrentHashMap instead of HashMap?

(I know that null values aren't allowed)

Update

Just to clarify: obviously performance will suffer under actual concurrent access, but how does the performance compare when there is no concurrent access?

Accepted answer by Atais

I was really surprised to find that this topic is so old and yet no one has provided any tests for this case. Using ScalaMeter, I have created tests of add, get and remove for both HashMap and ConcurrentHashMap in two scenarios:

  1. using a single thread
  2. using as many threads as I have cores available. Note that because HashMap is not thread-safe, I simply created a separate HashMap for each thread, but used one shared ConcurrentHashMap.

Code is available on my repo.

The results are as follows:

  • X axis (size) represents the number of elements written to the map(s)
  • Y axis (value) represents time in milliseconds

(Graphs: Add method, Get method, Remove method)

Summary

  • If you want to operate on your data as fast as possible, use all the available threads. That seems obvious: each thread has 1/n of the full work to do.

  • If you choose single-threaded access, use HashMap; it is simply faster. For the add method it is as much as 3x more efficient. Only get is faster on ConcurrentHashMap, and not by much.

  • Operating on one ConcurrentHashMap with many threads is about as efficient as operating on a separate HashMap per thread. So there is no need to partition your data into different structures.

To sum up, ConcurrentHashMap performs worse when used from a single thread, but adding more threads to do the work will definitely speed up the process.

Testing platform

AMD FX6100, 16 GB RAM
Xubuntu 16.04, Oracle JDK 8 update 91, Scala 2.11.8

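The two multi-threaded setups in this answer are one private HashMap per thread versus one shared ConcurrentHashMap. They can be sketched in plain Java roughly as follows. This is not the code from the author's repo, which used ScalaMeter. The class and method names are hypothetical, and the timing in main is a naive System.nanoTime measurement without warm-up, so a harness such as ScalaMeter or JMH is still needed for numbers comparable to the graphs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadedMapSetup {

    // Runs all tasks on a pool of `threads` workers and collects their results.
    private static <T> List<T> runAll(int threads, List<Callable<T>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> f : pool.invokeAll(tasks)) {
                results.add(f.get()); // Future.get() also makes the task's writes visible here
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Every thread fills its own private HashMap: no sharing, no locking.
    static int fillSeparateHashMaps(int threads, int perThread) {
        List<Callable<Map<Integer, Integer>>> tasks = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread; // disjoint key range per thread
            tasks.add(() -> {
                Map<Integer, Integer> local = new HashMap<>();
                for (int i = 0; i < perThread; i++) local.put(base + i, i);
                return local;
            });
        }
        int total = 0;
        for (Map<Integer, Integer> m : runAll(threads, tasks)) total += m.size();
        return total;
    }

    // All threads write into one shared ConcurrentHashMap.
    static int fillSharedConcurrentMap(int threads, int perThread) {
        Map<Integer, Integer> shared = new ConcurrentHashMap<>();
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            tasks.add(() -> {
                for (int i = 0; i < perThread; i++) shared.put(base + i, i);
                return perThread;
            });
        }
        runAll(threads, tasks);
        return shared.size();
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        int perThread = 100_000;
        long t0 = System.nanoTime();
        int a = fillSeparateHashMaps(threads, perThread);
        long t1 = System.nanoTime();
        int b = fillSharedConcurrentMap(threads, perThread);
        long t2 = System.nanoTime();
        System.out.printf("separate HashMaps: %d entries, %.1f ms%n", a, (t1 - t0) / 1e6);
        System.out.printf("shared CHM:        %d entries, %.1f ms%n", b, (t2 - t1) / 1e6);
    }
}
```

Both variants end up with the same number of entries. The difference is only whether the data lives in one structure or is partitioned per thread.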

Answer by Brian Agnew

I would recommend you measure it, since (for one reason) there may be some dependence on the hash distribution of the particular objects you're storing.

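One rough way to see why the key's hash distribution matters is to store the same number of keys twice. The first set has a well-spread hashCode; the second has a deliberately constant one. The key classes below are hypothetical, and the timings from this naive nanoTime loop are only indicative. Since Java 8, HashMap and ConcurrentHashMap convert heavily colliding bins into trees. That helps mainly when the colliding keys are Comparable, and these keys deliberately are not.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

class HashDistributionDemo {

    // Hypothetical key with a well-spread hashCode.
    static final class GoodKey {
        final int id;
        GoodKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) { return o instanceof GoodKey && ((GoodKey) o).id == id; }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    // Hypothetical key whose hashCode is constant: every entry lands in the same bin.
    static final class BadKey {
        final int id;
        BadKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) { return o instanceof BadKey && ((BadKey) o).id == id; }
        @Override public int hashCode() { return 42; }
    }

    // Inserts n keys, then looks each one up again; returns how many lookups found the right value.
    static <K> int fillAndLookup(Map<K, Integer> map, IntFunction<K> keyOf, int n) {
        for (int i = 0; i < n; i++) map.put(keyOf.apply(i), i);
        int hits = 0;
        for (int i = 0; i < n; i++) {
            Integer v = map.get(keyOf.apply(i));
            if (v != null && v == i) hits++;
        }
        return hits;
    }

    static <K> void time(String label, Map<K, Integer> map, IntFunction<K> keyOf, int n) {
        long start = System.nanoTime();
        int hits = fillAndLookup(map, keyOf, n);
        System.out.printf("%-20s hits=%d %8.2f ms%n", label, hits, (System.nanoTime() - start) / 1e6);
    }

    public static void main(String[] args) {
        int n = 5_000; // upper end of the range mentioned in the question
        time("HashMap, good hash", new HashMap<GoodKey, Integer>(), GoodKey::new, n);
        time("HashMap, bad hash", new HashMap<BadKey, Integer>(), BadKey::new, n);
        time("CHM, good hash", new ConcurrentHashMap<GoodKey, Integer>(), GoodKey::new, n);
        time("CHM, bad hash", new ConcurrentHashMap<BadKey, Integer>(), BadKey::new, n);
    }
}
```

Both maps stay correct with the bad key. Only the time per lookup changes, which is why measuring with your real key type is worthwhile.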

Answer by oxbow_lakes

What answer are you expecting here?

It is obviously going to depend on the number of reads happening at the same time as writes, and on how long a normal map must be "locked" on a write operation in your app (and whether you would make use of the putIfAbsent method on ConcurrentMap). Any benchmark is going to be largely meaningless.

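The putIfAbsent point can be made concrete with a small sketch (the class names are hypothetical). On a ConcurrentMap, "insert only if missing" is a single atomic call. With a plain HashMap, the caller has to hold one lock around both the check and the put.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentDemo {

    // No lock at the call site: putIfAbsent is atomic, so exactly one caller wins.
    static final class ClaimRegistry {
        private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

        String claim(String key, String owner) {
            String previous = owners.putIfAbsent(key, owner);
            return previous == null ? owner : previous;
        }
    }

    // Plain HashMap: the get and the put must run under one lock, otherwise two
    // threads can both see "absent" and both write.
    static final class LockedClaimRegistry {
        private final Map<String, String> owners = new HashMap<>();

        synchronized String claim(String key, String owner) {
            String previous = owners.get(key);
            if (previous == null) {
                owners.put(key, owner);
                return owner;
            }
            return previous;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ClaimRegistry registry = new ClaimRegistry();
        Set<String> winners = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            String me = "thread-" + i;
            threads[i] = new Thread(() -> winners.add(registry.claim("config", me)));
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(winners); // every thread saw the same owner, so the set has one element
    }
}
```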

Answer by Vitaly

It's not clear what you mean. If you need thread safety, you have almost no choice: only ConcurrentHashMap. And it definitely has performance/memory penalties in the get() call: access to volatile variables, and a lock if you're unlucky.

Answer by Robert Christie

The standard HashMap provides no concurrency protection, whereas ConcurrentHashMap does. Before it was available, you could wrap the HashMap (e.g. with Collections.synchronizedMap) to get thread-safe access. But this was coarse-grained locking: all concurrent access got serialised, which could really impact performance.

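The "wrap the HashMap" approach usually means Collections.synchronizedMap. A rough sketch of the difference follows (class and method names are hypothetical). The wrapper routes every call through one monitor, and iterating over it still requires the caller to hold that monitor. ConcurrentHashMap needs neither.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WrappedVsConcurrent {

    // Several threads each add 1 to the same counter `perThread` times; returns the final count.
    static int hammer(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) map.merge("hits", 1, Integer::sum); // atomic on both maps
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return map.get("hits");
    }

    // Iterating a synchronizedMap is NOT covered by the wrapper: its javadoc requires
    // holding the map's own lock for the whole loop.
    static int sumWrapped(Map<String, Integer> wrapped) {
        int sum = 0;
        synchronized (wrapped) {
            for (int v : wrapped.values()) sum += v;
        }
        return sum;
    }

    // ConcurrentHashMap iterators are weakly consistent: no lock, no ConcurrentModificationException.
    static int sumConcurrent(Map<String, Integer> concurrent) {
        int sum = 0;
        for (int v : concurrent.values()) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> wrapped = Collections.synchronizedMap(new HashMap<>());
        Map<String, Integer> concurrent = new ConcurrentHashMap<>();
        System.out.println(hammer(wrapped, 4, 10_000) + " / " + sumWrapped(wrapped));
        System.out.println(hammer(concurrent, 4, 10_000) + " / " + sumConcurrent(concurrent));
    }
}
```

Both maps produce correct counts here. The cost of the wrapper is that every thread queues on the same monitor.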

ConcurrentHashMap uses lock striping and only locks the entries covered by a particular lock. If you're running on a modern VM such as HotSpot, the VM will try to use lock biasing, coarsening and elision where possible, so you only pay the penalty for locks when you actually need them.

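Lock striping can be illustrated with a deliberately simplified, hypothetical map. It is not how the JDK's ConcurrentHashMap is written; since Java 8 that class locks roughly per bin rather than using a fixed set of segments. Here, keys are spread over a fixed number of stripes, each guarded by its own lock, so threads touching different stripes do not block each other.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class StripedMapDemo {

    static final class StripedMap<K, V> {
        private final ReentrantLock[] locks;
        private final Map<K, V>[] stripes;

        @SuppressWarnings("unchecked")
        StripedMap(int stripeCount) {
            locks = new ReentrantLock[stripeCount];
            stripes = (Map<K, V>[]) new Map[stripeCount];
            for (int i = 0; i < stripeCount; i++) {
                locks[i] = new ReentrantLock();
                stripes[i] = new HashMap<>();
            }
        }

        // Picks a stripe from the key's hash; high bits are mixed in so that
        // keys differing only in their upper bits still spread out.
        private int stripeFor(Object key) {
            int h = key.hashCode();
            h ^= (h >>> 16);
            return (h & 0x7fffffff) % locks.length;
        }

        V put(K key, V value) {
            int s = stripeFor(key);
            locks[s].lock(); // only this stripe is blocked; the others stay available
            try {
                return stripes[s].put(key, value);
            } finally {
                locks[s].unlock();
            }
        }

        V get(Object key) {
            int s = stripeFor(key);
            locks[s].lock();
            try {
                return stripes[s].get(key);
            } finally {
                locks[s].unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StripedMap<Integer, String> map = new StripedMap<>(16);
        Thread writerA = new Thread(() -> { for (int i = 0; i < 1000; i++) map.put(i, "a" + i); });
        Thread writerB = new Thread(() -> { for (int i = 1000; i < 2000; i++) map.put(i, "b" + i); });
        writerA.start();
        writerB.start();
        writerA.join();
        writerB.join();
        System.out.println(map.get(10) + " " + map.get(1500)); // prints "a10 b1500"
    }
}
```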

In summary, if your map is going to be accessed by concurrent threads and you need to guarantee a consistent view of its state, use ConcurrentHashMap.

Answer by Bill Michell

Thread safety is a complex question. If you want to make an object thread safe, do it consciously, and document that choice. People who use your class will thank you if it is thread safe when it simplifies their usage, but they will curse you if an object that once was thread safe becomes not so in a future version. Thread safety, while really nice, is not just for Christmas!

So now to your question:

ConcurrentHashMap (at least in Sun's current implementation) works by dividing the underlying map into a number of separate buckets. Getting an element does not require any locking per se, but it does use atomic/volatile operations, which implies a memory barrier (potentially very costly, and interfering with other possible optimisations).

Even if all the overhead of atomic operations can be eliminated by the JIT compiler in a single-threaded case, there is still the overhead of deciding which of the buckets to look in - admittedly this is a relatively quick calculation, but nevertheless, it is impossible to eliminate.

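The step of deciding which bucket to look in is cheap but real. The sketch below is loosely modeled on the segmented design this answer describes (Java 5 to 7). It picks a segment from the high bits of a mixed hash and then a bin from the low bits, whereas a plain HashMap only needs the second step. The 16 segments match the JDK's default concurrency level. The mixing function is MurmurHash3's 32-bit finalizer, chosen for illustration rather than taken from the JDK.

```java
class BucketSelection {
    static final int SEGMENTS = 16;          // default concurrency level in JDK 5-7
    static final int SEGMENT_SHIFT = 28;     // 32 - log2(SEGMENTS): use the top 4 bits
    static final int SEGMENT_MASK = SEGMENTS - 1;

    // MurmurHash3 fmix32: spreads entropy across all 32 bits (illustrative choice).
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Extra step a segmented map pays on every get(): choose the segment...
    static int segmentFor(int hash) {
        return (hash >>> SEGMENT_SHIFT) & SEGMENT_MASK;
    }

    // ...then, like HashMap, choose the bin inside it (tableLength must be a power of two).
    static int binFor(int hash, int tableLength) {
        return hash & (tableLength - 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"alpha", "beta", "gamma"}) {
            int h = mix(key.hashCode());
            System.out.printf("%-6s -> segment %2d, bin %2d%n", key, segmentFor(h), binFor(h, 16));
        }
    }
}
```

Each step is only a shift and a mask. The JIT cannot remove them, though, because they depend on the key.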

As for deciding which implementation to use, the choice is probably simple.

If this is a static field, you almost certainly want to use ConcurrentHashMap, unless testing shows this is a real performance killer. Your class has different thread safety expectations from the instances of that class.

If this is a local variable, then chances are a HashMap is sufficient - unless you know that references to the object can leak out to another thread. By coding to the Map interface, you allow yourself to change it easily later if you discover a problem.

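Here is a minimal sketch of what coding to the Map interface buys (method names hypothetical). Only the line that constructs the map names HashMap, so switching to ConcurrentHashMap later is a one-line change, and helpers that accept a Map work with either.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class CodeToInterface {

    // Accepts any Map implementation; nothing here depends on HashMap specifically.
    static int totalLength(Map<String, Integer> lengths) {
        int total = 0;
        for (int len : lengths.values()) total += len;
        return total;
    }

    static int countLetters(String... words) {
        // Local and never handed to another thread, so a HashMap is enough.
        // If the map ever escapes to other threads, only this line needs to change.
        Map<String, Integer> lengths = new HashMap<>();
        for (String w : words) lengths.put(w, w.length());
        return totalLength(lengths);
    }

    public static void main(String[] args) {
        System.out.println(countLetters("alpha", "beta")); // prints 9

        Map<String, Integer> shared = new ConcurrentHashMap<>(); // the swapped-in implementation
        shared.put("gamma", 5);
        System.out.println(totalLength(shared)); // prints 5
    }
}
```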

If this is an instance field, and the class hasn't been designed to be thread safe, then document it as not thread safe, and use a HashMap.

If you know that this instance field is the only reason the class isn't thread safe, and are willing to live with the restrictions that promising thread safety implies, then use ConcurrentHashMap, unless testing shows significant performance implications. In that case, you might consider allowing a user of the class to choose a thread safe version of the object somehow, perhaps by using a different factory method.

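One way to let users choose, sketched with hypothetical names: a private constructor takes the map, and two static factory methods decide which implementation backs the instance. The Javadoc on each factory is where the thread-safety promise gets documented, along with the ban on null keys and values for the ConcurrentHashMap variant.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RegistryFactoryDemo {

    /** Hypothetical class whose only shared mutable state is its map. */
    static final class Registry {
        private final Map<String, String> entries;

        private Registry(Map<String, String> entries) {
            this.entries = entries;
        }

        /** Not thread safe: confine each instance to a single thread. Allows null values. */
        static Registry newUnsynchronized() {
            return new Registry(new HashMap<>());
        }

        /** Thread safe: instances may be shared between threads. Rejects null keys and values. */
        static Registry newThreadSafe() {
            return new Registry(new ConcurrentHashMap<>());
        }

        void register(String key, String value) {
            entries.put(key, value);
        }

        String lookup(String key) {
            return entries.get(key);
        }
    }

    public static void main(String[] args) {
        Registry local = Registry.newUnsynchronized();
        Registry shared = Registry.newThreadSafe();
        local.register("mode", "fast");
        shared.register("mode", "safe");
        System.out.println(local.lookup("mode") + " " + shared.lookup("mode")); // prints "fast safe"
    }
}
```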

In either case, document the class as being thread safe (or conditionally thread safe) so people who use your class know they can use objects across multiple threads, and people who edit your class know that they must maintain thread safety in future.

Answer by Harisankar Krishna Swamy

For a 1000-element hash table, using 10 locks for the whole table (instead of a single lock) saves close to half the time when 10000 threads are inserting into it and 10000 threads are deleting from it.

The interesting run time difference is here

Always use a concurrent data structure, except when the operation hit by the downside of striping (mentioned below) is a frequent one. That operation has to acquire all the locks; I read that the best way to do this is by recursion.

Lock striping is useful when there is a way of breaking a high-contention lock into multiple locks without compromising data integrity. Whether this is possible takes some thought; it is not always the case. The data structure is also a contributing factor in the decision. If we use a large array to implement a hash table, using a single lock for the entire table will force threads to access the data structure sequentially. If they are accessing the same location in the hash table, that is necessary; but what if they are accessing the two extremes of the table?

The downside of lock striping is that it is difficult to get the state of the data structure that is affected by striping. In the example, getting the size of the table, or listing/enumerating the whole table, may be cumbersome, since we need to acquire all of the striped locks.

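This answer mentions acquiring all the stripe locks, recursively, for whole-table operations such as size. Below is a minimal sketch of that idea (names hypothetical). Locks are taken in a fixed index order, which keeps two concurrent whole-table operations from deadlocking. Each finally block releases its lock as the recursion unwinds. A loop with a matching unlock loop would work just as well; recursion simply pairs every lock with its own finally.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

class LockAllStripes {

    // Locks locks[i], locks[i+1], ... in index order, runs the action while all are held,
    // and releases them in reverse order as the recursion unwinds (even if the action throws).
    static <T> T withAllLocks(ReentrantLock[] locks, int i, Supplier<T> action) {
        if (i == locks.length) {
            return action.get();
        }
        locks[i].lock();
        try {
            return withAllLocks(locks, i + 1, action);
        } finally {
            locks[i].unlock();
        }
    }

    public static void main(String[] args) {
        int stripes = 4;
        ReentrantLock[] locks = new ReentrantLock[stripes];
        for (int i = 0; i < stripes; i++) locks[i] = new ReentrantLock();
        int[] countPerStripe = {3, 1, 4, 1}; // pretend per-stripe element counts

        // size() needs a consistent view, so it must hold every stripe's lock at once.
        Supplier<Integer> sizeAction = () -> {
            int total = 0;
            for (int c : countPerStripe) total += c;
            return total;
        };
        int size = withAllLocks(locks, 0, sizeAction);
        System.out.println("size = " + size); // prints "size = 9"
    }
}
```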

Answer by Jean-Michel

Of course a Map without any locking wins against one with thread-safe behavior, which needs more work. The point of the concurrent one is to be thread-safe without using synchronized, and so to be faster than Hashtable. The same graphs would be very interesting for ConcurrentHashMap vs Hashtable (which is synchronized).

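A single-threaded version of that comparison is easy to sketch. It is a naive timing loop, not a proper benchmark; JMH or ScalaMeter would be needed for trustworthy numbers. Hashtable synchronizes every method on the table itself, so in this setting it measures the cost of uncontended locking rather than contention. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class ThreeWayTiming {

    // Puts n entries, then reads them back; the returned checksum is printed, which helps
    // keep the JIT from discarding the work.
    static long putThenGet(Map<Integer, Integer> map, int n) {
        for (int i = 0; i < n; i++) map.put(i, i);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += map.get(i);
        return sum;
    }

    // Repeats the run and keeps the best time, so later rounds measure JIT-compiled code.
    static void time(String label, Supplier<Map<Integer, Integer>> factory, int n, int rounds) {
        long best = Long.MAX_VALUE;
        long checksum = 0;
        for (int r = 0; r < rounds; r++) {
            Map<Integer, Integer> map = factory.get();
            long start = System.nanoTime();
            checksum += putThenGet(map, n);
            best = Math.min(best, System.nanoTime() - start);
        }
        System.out.printf("%-18s best of %d: %.3f ms (checksum %d)%n", label, rounds, best / 1e6, checksum);
    }

    public static void main(String[] args) {
        int n = 5_000;
        int rounds = 50;
        time("HashMap", HashMap::new, n, rounds);
        time("Hashtable", Hashtable::new, n, rounds);
        time("ConcurrentHashMap", ConcurrentHashMap::new, n, rounds);
    }
}
```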