Rule of thumb for choosing an implementation of a Java Collection?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/48442/
Asked by hydeph
Does anyone have a good rule of thumb for choosing between different implementations of Java Collection interfaces like List, Map, or Set?
For example, why, or in what cases, would I generally prefer a Vector over an ArrayList, or a Hashtable over a HashMap?
Accepted answer by Stu Thompson
I've always made those decisions on a case by case basis, depending on the use case, such as:
- Do I need the ordering to remain?
- Will I have null keys/values? Duplicates?
- Will it be accessed by multiple threads?
- Do I need a key/value pair?
- Will I need random access?
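Two of these questions (ordering and duplicates, null keys) can be answered by simply trying the candidates. The following is a minimal sketch, not part of the original answer; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, for illustration only.
class CollectionTraits {
    // Adds the same input (with one duplicate) to the given set and
    // returns the elements in the set's own iteration order.
    static List<String> iterationOrder(Set<String> set) {
        set.addAll(Arrays.asList("pear", "apple", "fig", "apple"));
        return new ArrayList<>(set);
    }

    // Reports whether the given map implementation accepts a null key.
    static boolean acceptsNullKey(Map<String, Integer> map) {
        try {
            map.put(null, 1);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(iterationOrder(new LinkedHashSet<>())); // [pear, apple, fig]
        System.out.println(iterationOrder(new TreeSet<>()));       // [apple, fig, pear]
        System.out.println(acceptsNullKey(new HashMap<>()));       // true
        System.out.println(acceptsNullKey(new Hashtable<>()));     // false
    }
}
```

The duplicate "apple" is dropped by every Set; only the iteration order differs between the implementations.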
And then I break out my handy 5th edition of Java in a Nutshell and compare the 20 or so options. It has nice little tables in Chapter five to help one figure out what is appropriate.
OK, maybe if I know off the cuff that a simple ArrayList or HashSet will do the trick I won't look it all up. ;) But if there is anything remotely complex about my intended use, you bet I'm in the book. BTW, I thought Vector was supposed to be 'old hat'; I've not used one in years.
Answer by ChrLipp
I really like this cheat sheet from Sergiy Kovalchuk's blog entry:

More detailed was Alexander Zagniotov's flowchart, but unfortunately it is offline. However, the Wayback Machine has a copy of the blog:
Answer by Jonathan
I'll assume you know the difference between a List, Set and Map from the above answers. How to choose between their implementing classes is another matter. For example:
List:
- ArrayList is quick on retrieving, but slow on inserting. It's good for an implementation that reads a lot but doesn't insert/remove a lot. It keeps its data in one contiguous block of memory, so every time it needs to expand, it copies the whole array.
- LinkedList is slow on retrieving, but quick on inserting. It's good for an implementation that inserts/removes a lot but doesn't read a lot. It doesn't keep its elements in one contiguous block of memory.
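The copying cost mentioned for ArrayList can often be avoided when the final size is known in advance, by passing an initial capacity to the constructor. A minimal sketch, assuming a hypothetical `squares` helper that is not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class, for illustration only.
class PresizedList {
    // Builds the first n squares. Because n is known up front, the backing
    // array is allocated once and never has to be copied while filling.
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n); // n is a capacity, not a size
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> s = squares(5);
        System.out.println(s);        // [0, 1, 4, 9, 16]
        System.out.println(s.get(3)); // 9, retrieved by index
    }
}
```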
Set:
- HashSet doesn't guarantee the order of iteration, and therefore is the fastest of the sets. It has high overhead and is slower than ArrayList, so you shouldn't use it except for a large amount of data, when its hashing speed becomes a factor.
- TreeSet keeps the data ordered, and is therefore slower than HashSet.
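As a rough illustration of when each set earns its keep, the sketch below uses a HashSet for a pure membership test and a TreeSet where a sorted view is needed. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical example class, for illustration only.
class SetChoice {
    // Membership only: iteration order is irrelevant, so a HashSet is enough.
    static boolean containsAll(Collection<Integer> values, int... wanted) {
        Set<Integer> lookup = new HashSet<>(values);
        for (int w : wanted) {
            if (!lookup.contains(w)) {
                return false;
            }
        }
        return true;
    }

    // Sorted view needed: a TreeSet keeps elements ordered and supports range queries.
    static List<Integer> smallerThan(Collection<Integer> values, int limit) {
        NavigableSet<Integer> sorted = new TreeSet<>(values);
        return new ArrayList<>(sorted.headSet(limit, false));
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 1, 9, 3, 1);
        System.out.println(containsAll(input, 9, 3)); // true
        System.out.println(smallerThan(input, 5));    // [1, 3]
    }
}
```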
Map: The performance and behavior of HashMap and TreeMap parallel those of the Set implementations.
Vector and Hashtable should not be used. They are synchronized implementations that predate the newer Collection hierarchy, and the synchronization makes them slow. If synchronization is needed, use Collections.synchronizedCollection().
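A minimal sketch of the wrapper approach, here using `Collections.synchronizedList`, the List-specific sibling of `Collections.synchronizedCollection()`. The class name and thread counts are made up for illustration. Note that individual calls are synchronized, but iteration must be locked by the caller, as the wrapper's documentation requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical example class, for illustration only.
class SynchronizedWrapperDemo {
    // Starts several threads that append to one shared list, then counts the elements.
    static int fillFromThreads(int threads, int perThread) {
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.add(i); // each add() is synchronized by the wrapper
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        int count = 0;
        synchronized (shared) { // iteration is not atomic, so lock manually
            for (Integer ignored : shared) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fillFromThreads(4, 1000)); // 4000
    }
}
```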
Answer by Jason Cohen
Theoretically there are useful Big-Oh tradeoffs, but in practice these almost never matter.
In real-world benchmarks, ArrayList out-performs LinkedList even with big lists and with operations like "lots of insertions near the front." Academics ignore the fact that real algorithms have constant factors that can overwhelm the asymptotic curve. For example, linked lists require an additional object allocation for every node, which means creating a node is slower and memory-access characteristics are vastly worse.
My rule is:
- Always start with ArrayList, HashSet and HashMap (i.e. not LinkedList or TreeMap).
- Type declarations should always be an interface (i.e. List, Set, Map), so if a profiler or code review proves otherwise you can change the implementation without breaking anything.
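The second rule can be sketched as follows: only the line that calls `new` names a concrete class, so swapping the implementation touches one line. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example class, for illustration only.
class InterfaceTypes {
    // Parameter and return types are interfaces; callers never see HashSet.
    static Set<String> distinct(List<String> words) {
        // Swap in LinkedHashSet or TreeSet here if a profiler or review calls for it.
        Set<String> result = new HashSet<>();
        result.addAll(words);
        return result;
    }

    public static void main(String[] args) {
        Set<String> unique = distinct(Arrays.asList("a", "b", "a"));
        System.out.println(unique.size()); // 2
    }
}
```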
Answer by Zizzencs
About your first question...
List, Map and Set serve different purposes. I suggest reading about the Java Collections Framework at http://java.sun.com/docs/books/tutorial/collections/interfaces/index.html.
To be a bit more concrete:
- use a List if you need an array-like data structure and you need to iterate over the elements
- use a Map if you need something like a dictionary
- use a Set if you only need to decide if something belongs to the set or not.
About your second question...
The main difference between Vector and ArrayList is that the former is synchronized and the latter is not. You can read more about synchronization in Java Concurrency in Practice.
The difference between Hashtable (note that the T is not a capital letter) and HashMap is similar: the former is synchronized, the latter is not.
I would say that there is no rule of thumb for preferring one implementation over another; it really depends on your needs.
Answer by Tom Hawtin - tackline
For non-sorted collections, the best choice, more than nine times out of ten, will be: ArrayList, HashMap, HashSet.
Vector and Hashtable are synchronised and therefore might be a bit slower. It's rare that you would want synchronised implementations, and when you do, their interfaces are not sufficiently rich for their synchronisation to be useful. In the case of Map, ConcurrentMap adds extra operations to make the interface useful. ConcurrentHashMap is a good implementation of ConcurrentMap.
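To illustrate what "extra operations" buys you: with a plain synchronized map, a get followed by a put is two separate locked calls, so a counter update can be lost between them. A minimal sketch using `merge`, which ConcurrentHashMap performs atomically; the `HitCounter` class is invented for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical example class, for illustration only.
class HitCounter {
    private final ConcurrentMap<String, Integer> hits = new ConcurrentHashMap<>();

    // Read-modify-write in a single atomic call; no external locking needed.
    void record(String page) {
        hits.merge(page, 1, Integer::sum);
    }

    int count(String page) {
        return hits.getOrDefault(page, 0);
    }

    public static void main(String[] args) {
        HitCounter counter = new HitCounter();
        counter.record("/home");
        counter.record("/home");
        System.out.println(counter.count("/home"));  // 2
        System.out.println(counter.count("/about")); // 0
    }
}
```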
LinkedList is almost never a good idea. Even if you are doing a lot of insertions and removals, if you are using an index to indicate position, then each operation requires iterating through the list to find the correct node. ArrayList is almost always faster.
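The index problem described above disappears only when insertions happen at a cursor you already hold. A hedged sketch using `ListIterator.add`, which inserts at the iterator's current position without a fresh traversal; the helper name is hypothetical, and whether this actually beats an ArrayList for a given workload still needs measuring.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical example class, for illustration only.
class CursorInsert {
    // Inserts marker after every element equal to target, in a single pass.
    static List<String> insertAfter(List<String> items, String target, String marker) {
        ListIterator<String> it = items.listIterator();
        while (it.hasNext()) {
            if (it.next().equals(target)) {
                it.add(marker); // added at the cursor; the next call to next() skips it
            }
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> items = new LinkedList<>(Arrays.asList("a", "b", "a"));
        System.out.println(insertAfter(items, "a", "x")); // [a, x, b, a, x]
    }
}
```

The same method works unchanged on an ArrayList, since it only depends on the List interface.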
For Map and Set, the hash variants will be faster than the tree/sorted ones. Hash algorithms tend to have O(1) performance, whereas trees will be O(log n).
Answer by Joe Liversedge
Lists allow duplicate items, while Sets allow only one instance.
I'll use a Map whenever I need to perform a lookup.
For the specific implementations, there are order-preserving variations of Maps and Sets, but largely it comes down to speed. I tend to use ArrayList for reasonably small Lists and HashSet for reasonably small Sets, but there are many implementations (including any that you write yourself). HashMap is pretty common for Maps. With anything more than 'reasonably small' you have to start worrying about memory, so the choice becomes much more specific to the algorithm.
This page has lots of animated images along with sample code testing LinkedList vs. ArrayList if you're interested in hard numbers.
EDIT: I hope the following links demonstrate how these things are really just items in a toolbox; you just have to think about what your needs are: see the Commons-Collections versions of Map, List and Set.
Answer by Code_Mode
As suggested in other answers, the right collection depends on the use case. I am listing a few points:
ArrayList:
- Most cases where you just need to store a "bunch of things" and later iterate through them. Iterating is fast as it's index based.
- Whenever you create an ArrayList, it gets a backing array with a fixed capacity; once that capacity is exceeded, it copies the whole array into a larger one.
LinkedList:
- It uses a doubly linked list, so insertion and deletion will be fast as they only add or remove a node.
- Retrieving is slow as it has to iterate through the nodes.
HashSet:
- Making yes-no decisions about an item, e.g. "is the item a word of English?", "is the item in the database?", "is the item in this category?" etc.
- Remembering "which items you've already processed", e.g. when doing a web crawl.
HashMap:
- Used in cases where you need to say "for a given X, what is the Y?" It is often useful for implementing in-memory caches or indexes, i.e. key/value pairs. For example: for a given user ID, what is their cached name/User object?
- Always go with HashMap to perform a lookup.
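The two use cases above (remembering processed items, and "for a given X, what is the Y") can be sketched together. Everything here is illustrative: the `CrawlState` class is invented, and the `"user-" + id` string is a stand-in for a real, slower lookup such as a database call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical example class, for illustration only.
class CrawlState {
    private final Set<String> visited = new HashSet<>();
    private final Map<Integer, String> nameCache = new HashMap<>();
    int slowLookups = 0; // counts how often the (pretend) slow lookup actually ran

    // Returns true the first time a URL is seen, false if it was already processed.
    boolean markVisited(String url) {
        return visited.add(url);
    }

    // For a given user ID, what is the cached name? Computed once, then reused.
    String userName(int id) {
        return nameCache.computeIfAbsent(id, key -> {
            slowLookups++;
            return "user-" + key; // placeholder for an expensive lookup
        });
    }

    public static void main(String[] args) {
        CrawlState state = new CrawlState();
        System.out.println(state.markVisited("http://example.com")); // true
        System.out.println(state.markVisited("http://example.com")); // false
        System.out.println(state.userName(7));                       // user-7
        System.out.println(state.userName(7));                       // user-7
        System.out.println(state.slowLookups);                       // 1
    }
}
```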
Vector and Hashtable are synchronized and therefore a bit slower. If synchronization is needed, use Collections.synchronizedCollection(). Check this for sorted collections. Hope this helped.
Answer by Johnny
Well, it depends on what you need. The general guidelines are:
List is a collection where data is kept in order of insertion and each element has an index.
Set is a bag of elements without duplication (if you reinsert the same element, it won't be added). Data doesn't have the notion of order.
Map: you access and write your data elements by their key, which could be any possible object.
Attribution: https://stackoverflow.com/a/21974362/2811258
For more information about Java Collections, check out this article.
Answer by user5044
I found Bruce Eckel's Thinking in Java to be very helpful. He compares the different collections very well. I used to keep a diagram he published showing the inheritance hierarchy on my cube wall as a quick reference. One thing I suggest you do is keep thread safety in mind. Performance usually means not thread safe.