Original question: http://stackoverflow.com/questions/629804/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What is the most efficient Java Collections library?
Asked by Frank
What is the most efficient Java Collections library?
A few years ago, I did a lot of Java and had the impression back then that Trove is the best (most efficient) Java Collections implementation. But when I read the answers to the question "Most useful free Java libraries?" I noticed that Trove is hardly mentioned. So which Java Collections library is best now?
UPDATE: To clarify, I mostly want to know what library to use when I have to store millions of entries in a hash table etc. (I need a small runtime and memory footprint).
Accepted answer by Jon Skeet
From inspection, it looks like Trove is just a library of collections for primitive types - it's not like it's meant to be adding a lot of functionality over the normal collections in the JDK.
Personally (and I'm biased) I love Guava (including the former Google Java Collections project). It makes various tasks (including collections) a lot easier, in a way which is at least reasonably efficient. Given that collection operations rarely form a bottleneck in my code (in my experience) this is "better" than a collections API which may be more efficient but doesn't make my code as readable.
Given that the overlap between Trove and Guava is pretty much nil, perhaps you could clarify what you're actually looking for from a collections library.
Answer by Yuval Adam
java.util
Sorry for the obvious answer, but for most uses, the default Java Collections are more than sufficient.
Answer by Andreas Petersson
ConcurrentHashMap, as well as the java.util.concurrent package, should be mentioned if you plan to use the HashMap from multiple threads. A small memory footprint is assured, since this is part of standard Java.
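To make the multi-threaded case concrete, here is a minimal sketch of sharing one ConcurrentHashMap between several threads without external locking. The class name ConcurrentCountDemo and the counting workload are made up for illustration; only ConcurrentHashMap and its merge() method come from the JDK (Java 8 or later is assumed).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class ConcurrentCountDemo {
    // Several threads increment counters for the keys 0..9 in one shared map.
    static ConcurrentMap<Integer, Integer> countInParallel(int threads, int incrementsPerThread) {
        ConcurrentMap<Integer, Integer> counts = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    // merge() on ConcurrentHashMap updates a single key atomically
                    counts.merge(i % 10, 1, Integer::sum);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // 4 threads x 1000 increments of key 0 each
        System.out.println(countInParallel(4, 10_000).get(0)); // prints 4000
    }
}
```

With a plain HashMap the same loop could lose updates or corrupt the table, which is why the concurrent variant is worth a mention here.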
Answer by duffymo
Depends on how we define "efficient".
Every data structure has its own Big-Oh behavior for reading, writing, iterating, memory footprint, etc. A linked list in one library is likely to be the same as any other. And a hash map will be faster for reading O(1) than a linked list O(n).
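As a rough illustration of the O(1) versus O(n) point, the sketch below counts how many equals() calls a lookup needs in a LinkedList and in a HashSet. The Key class and the method name comparisonsToFindLast are hypothetical helpers invented for this example; it is not a benchmark.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedList;

final class LookupCostDemo {
    // Hypothetical key type that counts how often equals() is invoked on it.
    static final class Key {
        final int id;
        int equalsCalls = 0;

        Key(int id) { this.id = id; }

        @Override public boolean equals(Object o) {
            equalsCalls++;
            return o instanceof Key && ((Key) o).id == id;
        }

        @Override public int hashCode() { return id; }
    }

    // Fills the collection with keys 0..n-1, looks up key n-1 with a fresh probe
    // object, and returns the number of equals() calls made on that probe.
    static int comparisonsToFindLast(Collection<Key> c, int n) {
        for (int i = 0; i < n; i++) {
            c.add(new Key(i));
        }
        Key probe = new Key(n - 1);
        if (!c.contains(probe)) {
            throw new IllegalStateException("key should be present");
        }
        return probe.equalsCalls;
    }

    public static void main(String[] args) {
        int n = 1000;
        System.out.println("linked list: " + comparisonsToFindLast(new LinkedList<>(), n));
        System.out.println("hash set:    " + comparisonsToFindLast(new HashSet<>(), n));
    }
}
```

The list has to walk every node before it reaches the last key, while the hash set jumps to one bucket first, and that holds regardless of which library the list or the hash table comes from.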
But when I read the answers to the question "Most useful free Java libraries?" I noticed that trove is hardly mentioned.
This doesn't sound like "most efficient". It sounds like "most popular" to me.
Just some feedback - I've never heard of it, and I don't know anyone who has used it. Collections built into the JDK, Google, or Apache Commons are well-known to me.
Answer by duffymo
Trove offers a few advantages.
- smaller memory footprint, it doesn't use Map.Entry objects
- you can use hash strategies instead of keys for maps, this saves memory and means you don't need to define a new key each time you want to cache an object on a new set of its attributes
- it has primitive collection types
- I think it has some form of internal iterator
That said, a lot has been done to improve the JDK collections since Trove was written.
It's the hashing strategies that make it appealing to me though... Google for Trove and read their overview.
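For readers who haven't seen the idea, here is a minimal sketch of what a hashing strategy buys you. The HashingStrategy interface and the StrategyMap class below are hypothetical, written only for this illustration; they are not Trove's actual API. The map asks an external strategy for hash codes and equality, so an int[] can serve directly as a key without a wrapper key class.

```java
import java.util.Arrays;

final class StrategyMapSketch {
    /** Hypothetical strategy interface (an assumption for illustration, not Trove's API). */
    interface HashingStrategy<K> {
        int hash(K key);
        boolean equal(K a, K b);
    }

    /** Minimal fixed-capacity chained hash map that asks the strategy, not the key, for hash and equality. */
    static final class StrategyMap<K, V> {
        private static final class Node<K, V> {
            final K key;
            V value;
            final Node<K, V> next;

            Node(K key, V value, Node<K, V> next) {
                this.key = key;
                this.value = value;
                this.next = next;
            }
        }

        private final HashingStrategy<K> strategy;
        private final Node<K, V>[] buckets;

        @SuppressWarnings("unchecked")
        StrategyMap(HashingStrategy<K> strategy, int capacity) {
            this.strategy = strategy;
            this.buckets = (Node<K, V>[]) new Node[capacity];
        }

        private int index(K key) {
            // Clear the sign bit so the bucket index is never negative.
            return (strategy.hash(key) & 0x7fffffff) % buckets.length;
        }

        V get(K key) {
            for (Node<K, V> n = buckets[index(key)]; n != null; n = n.next) {
                if (strategy.equal(n.key, key)) return n.value;
            }
            return null;
        }

        void put(K key, V value) {
            int i = index(key);
            for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
                if (strategy.equal(n.key, key)) { n.value = value; return; }
            }
            buckets[i] = new Node<>(key, value, buckets[i]);
        }
    }

    /** Strategy that treats int[] keys as equal when their contents are equal. */
    static final HashingStrategy<int[]> BY_CONTENT = new HashingStrategy<int[]>() {
        @Override public int hash(int[] key) { return Arrays.hashCode(key); }
        @Override public boolean equal(int[] a, int[] b) { return Arrays.equals(a, b); }
    };

    public static void main(String[] args) {
        StrategyMap<int[], String> cache = new StrategyMap<>(BY_CONTENT, 64);
        cache.put(new int[] {1, 2, 3}, "cached");
        System.out.println(cache.get(new int[] {1, 2, 3})); // prints cached
    }
}
```

Note that this sketch still allocates one node per entry, so it only shows the strategy part of the argument, not the memory saving from dropping entry objects.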
Answer by sstock
As other commentators have noticed, the definition of "efficient" casts a wide net. However, no one has yet mentioned the Javolution library.
Some of the highlights:
- Javolution classes are fast, very fast (e.g. Text insertion/deletion in O[Log(n)] instead of O[n] for standard StringBuffer/StringBuilder).
- All Javolution classes are hard real-time compliant and have highly deterministic behavior (in the microsecond range). Furthermore (unlike the standard library), Javolution is RTSJ safe (no memory clash or memory leak when used with Java Real-Time extension).
- Javolution's real-time collection classes (map, list, table and set) can be used in place of most standard collection classes and provide additional functionality.
- The Javolution collections provide concurrency guarantees to make implementation of parallel algorithms easier.
The Javolution distribution includes a benchmark suite so you can see how they stack up against other libraries/the built-in collections.
Answer by Alex Miller
Some collection libs to consider:
- Java collections in java.util
- Trove
- Google Collections library
- Apache Commons Collections
- High-scale lib from Cliff Click
- Doug Lea's collections lib - no longer supported and mostly rebuilt in JDK
I would first and foremost reach for the JDK collection library. It covers most common things you need to do and is obviously already available to you.
Google Collections is probably the best high-quality library outside the JDK. It's heavily used and well supported.
Apache Commons Collections is older and suffers a bit from the "too many cooks" problem but has a lot of useful stuff as well.
Trove has very specialized collections for cases like primitive keys/values. These days we find that on modern JDKs and with the Java 5+ collections and concurrent use cases, the JDK collections out-perform even the specialized Trove collections.
If you have really high concurrency use cases, you should definitely check out stuff like the NonBlockingHashMap in the high-scale lib, which is a lock-free implementation and can stomp on ConcurrentHashMap if you have the right use case for it.
Answer by fred-o
If you want to store millions of records in a hash table, chances are that you will run into memory problems. This happened to me when I tried creating a map with 2.3 million String objects, for example. I went with BerkeleyDB, which is very mature and performs well. They have a Java API that wraps the Collections API, so you can easily create arbitrarily large maps with very little memory footprint. Access will be slower though (as it is stored on disk).
Follow-up question: is there a decent (and efficient), well-maintained library for immutable collections? Clojure has excellent support for this, and it would be nice to have something similar for Java.
Answer by the.duckman
The question is (now) about storing lots of data, which can be represented using primitive types like int, in a Map. Some of the answers here are very misleading in my opinion. Let's see why.
I modified the benchmark from Trove to measure both runtime and memory consumption. I also added PCJ to this benchmark, which is another collections library for primitive types (I use that one extensively). The 'official' Trove benchmark does not compare IntIntMaps to Java Collection's Map<Integer, Integer>, probably because storing Integers and storing ints is not the same from a technical point of view. But a user might not care about this technical detail; he wants to store data representable with ints efficiently.
First the relevant part of the code:
new Operation() {
    // Requests a GC, then returns the heap currently in use, in bytes.
    private long usedMem() {
        System.gc();
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    }

    // trove
    public void ours() {
        long mem = usedMem();
        TIntIntHashMap ours = new TIntIntHashMap(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
            ours.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("trove " + mem + " bytes");
        ours.clear();
    }

    // pcj
    public void pcj() {
        long mem = usedMem();
        IntKeyIntMap map = new IntKeyIntOpenHashMap(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
            map.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("pcj " + mem + " bytes");
        map.clear();
    }

    // java collections (each put autoboxes both ints to Integer)
    public void theirs() {
        long mem = usedMem();
        Map<Integer, Integer> map = new HashMap<Integer, Integer>(SET_SIZE);
        for ( int i = dataset.size(); i-- > 0; ) {
            map.put(i, i);
        }
        mem = usedMem() - mem;
        System.err.println("java " + mem + " bytes");
        map.clear();
    }
}
I assume the data comes as primitive ints, which seems sane. But this implies a runtime penalty for java.util, because of the auto-boxing, which is not necessary for the primitive collections frameworks.
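To spell out where that penalty comes from, here is a small sketch of what map.put(i, i) amounts to once the compiler has inserted the boxing conversions. The class name BoxingDemo and its methods are made up for this illustration; the Integer.valueOf and intValue calls are the standard JDK ones.

```java
import java.util.HashMap;
import java.util.Map;

final class BoxingDemo {
    // Fills a java.util map from primitive ints, with the boxing written out by hand.
    static Map<Integer, Integer> fill(int n) {
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            // map.put(i, i) is compiled to the equivalent of this line:
            // both the key and the value become Integer objects.
            map.put(Integer.valueOf(i), Integer.valueOf(i));
        }
        return map;
    }

    // Reading the values back requires unboxing each Integer again.
    static long sum(Map<Integer, Integer> map) {
        long total = 0;
        for (Integer value : map.values()) {
            total += value.intValue();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(fill(1000))); // prints 499500
    }
}
```

A primitive collection stores the ints directly in arrays, so neither the valueOf calls nor the Integer objects exist in the first place.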
The runtime results (without gc() calls, of course) on WinXP, jdk1.6.0_10:
                    100000 put operations    100000 contains operations
java collections    1938 ms                  203 ms
trove               234 ms                   125 ms
pcj                 516 ms                   94 ms
While this might already seem drastic, this is not the reason to use such a framework.
The reason is memory performance. The results for a Map containing 100000 int entries:
java collections    oscillates between 6644536 and 7168840 bytes
trove               1853296 bytes
pcj                 1866112 bytes
Java Collections needs more than three times the memory compared to the primitive collection frameworks. I.e. you can keep three times as much data in memory, without resorting to disk IO, which lowers runtime performance by orders of magnitude. And this matters. Read highscalability to find out why.
In my experience high memory consumption is the biggest performance issue with Java, which of course results in worse runtime performance as well. Primitive collection frameworks can really help here.
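To show why a primitive map can be so much smaller, here is a minimal sketch of an int-to-int hash map using open addressing over plain arrays. IntIntMapSketch is a hypothetical class written for this illustration; it is not how Trove or PCJ are actually implemented, and it leaves out removal and iteration.

```java
final class IntIntMapSketch {
    private int[] keys;
    private int[] values;
    private boolean[] used;   // marks occupied slots
    private int size;

    IntIntMapSketch(int expectedSize) {
        int capacity = 16;
        while (capacity < expectedSize * 2) {
            capacity <<= 1;   // power-of-two capacity, load factor at most 0.5
        }
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
    }

    // Returns the slot holding the key, or the free slot where it would go.
    private int slotFor(int key) {
        int mask = keys.length - 1;
        int h = key * 0x9E3779B9;             // scramble the bits a little
        int i = (h ^ (h >>> 16)) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;               // linear probing
        }
        return i;
    }

    void put(int key, int value) {
        if ((size + 1) * 2 > keys.length) {
            grow();
        }
        int i = slotFor(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    boolean containsKey(int key) {
        return used[slotFor(key)];
    }

    // Returns the stored value, or the given default if the key is absent.
    int get(int key, int missing) {
        int i = slotFor(key);
        return used[i] ? values[i] : missing;
    }

    int size() {
        return size;
    }

    // Doubles the arrays and re-inserts every entry.
    private void grow() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }

    public static void main(String[] args) {
        IntIntMapSketch map = new IntIntMapSketch(100_000);
        for (int i = 0; i < 100_000; i++) {
            map.put(i, i);
        }
        System.out.println(map.size()); // prints 100000
    }
}
```

Each slot costs two ints and one boolean, with no entry object and no boxed Integer per key or value, which is roughly where the gap in the memory figures comes from.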
So: No, java.util is not the answer. And "adding functionality" to Java collections is not the point when asking about efficiency. Also the modern JDK collections do not "out-perform even the specialized Trove collections".
Disclaimer: The benchmark here is far from complete, nor is it perfect. It is meant to drive home the point, which I have experienced in many projects. Primitive collections are useful enough to tolerate a fishy API - if you work with lots of data.
Answer by akuhn
To store millions of Strings in a map, take a look at http://code.google.com/p/flatmap