Java HashSet 与数组性能
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/18706870/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Java HashSet vs Array Performance
提问by donnyton
I have a collection of objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are (and the number won't change), and was wondering whether Array would have a notable performance advantage over HashSet for storing/retrieving said elements.
我有一组保证不同的对象(特别是,由唯一的整数 ID 索引)。我也确切地知道其中有多少(并且数字不会改变),并且想知道 Array 在存储/检索所述元素方面是否比 HashSet 具有显着的性能优势。
On paper, Array guarantees constant time insertion (since I know the size ahead of time) and retrieval, but the code for HashSet looks much cleaner and adds some flexibility, so I'm wondering if I'm losing anything performance-wise using it, at least, theoretically.
在纸面上,Array 保证恒定时间插入(因为我提前知道大小)和检索,但 HashSet 的代码看起来更清晰并增加了一些灵活性,所以我想知道我是否在使用它时丢失了任何性能方面的东西,至少,理论上。
采纳答案by JNL
Depends on your data;
取决于您的数据;
HashSet
gives you an O(1)
contains() method but doesn't preserve order.
HashSet
给你一个O(1)
contains() 方法,但不保留顺序。
ArrayList
contains() is O(n)
but you can control the order of the entries.
ArrayList
contains() 是O(n)
但您可以控制条目的顺序。
Array
if you need to insert anything in between, worst case can be O(n), since you will have to move the data down and make room for the insertion. In Set
, you can directly use SortedSet which too has O(n) too but with flexible operations.
Array
如果您需要在两者之间插入任何内容,最坏的情况可能是 O(n),因为您必须将数据向下移动并为插入腾出空间。在 中Set
,您可以直接使用SortedSet which too has O(n) too but with flexible operations.
I believe Set is more flexible.
我相信 Set 更灵活。
回答by auhuman
For Enterprise software Scalable, Maintainable and clean Code is lot better. So I go for HashSet.
对于企业软件,可扩展、可维护和干净的代码要好得多。所以我选择HashSet。
回答by Ahmed Adel Ismail
theoretically, and as SCJP6 Study guide says :D
理论上,正如 SCJP6 学习指南所说:D
arrays are faster than collections, and as said, most of the collections depend mainly on arrays (Maps are not considered Collection, but they are included in the Collections framework)
数组比集合更快,并且如上所述,大多数集合主要依赖于数组(映射不被视为集合,但它们包含在集合框架中)
if you guarantee that the size of your elements wont change, why get stuck in Objects built on Objects (Collections built on Arrays) while you can use the root objects directly (arrays)
如果你保证你的元素的大小不会改变,为什么要陷入基于对象的对象(基于数组的集合)而你可以直接使用根对象(数组)
回答by anguyen
It looks like you will want an HashMap that maps id's to counts. Particularly,
看起来您需要一个将 id 映射到计数的 HashMap。特别,
HashMap<Integer,Integer> counts=new HashMap<Integer,Integer>();
counts.put(uniqueID,counts.get(uniqueID)+1);
This way, you get amortized O(1) adds, contains and retrievals. Essentially, an array with unique id's associated with each object IS a HashMap. By using the HashMap, you get the added bonus of not having to manage the size of the array, not having to map the keys to an array index yourself AND constant access time.
这样,您就可以摊销 O(1) 添加、包含和检索。本质上,具有与每个对象关联的唯一 id 的数组是一个 HashMap。通过使用 HashMap,您可以获得额外的好处,即不必管理数组的大小,不必自己将键映射到数组索引以及常量访问时间。
回答by Adrian Shum
The choice greatly depends on what do you want to do with it.
选择很大程度上取决于你想用它做什么。
If it is what mentioned in your question:
如果是你的问题中提到的:
I have a collectionof objects that are guaranteed to be distinct (in particular, indexed by a unique integer ID). I also know exactly how many of them there are
我有一组保证不同的对象(特别是,由唯一的整数 ID 索引)。我也确切地知道有多少
If this is what you need to do, the you need neither of them. There is a size() method in Collectionfor which you can get the size of it, which mean how many of them there arein the collection.
如果这是你需要做的,你不需要它们。Collection 中有一个 size() 方法,您可以获得它的大小,这意味着集合中有多少个。
If what you mean for "collection of object" is not really a collection, and you need to choose a type of collection to store your objects for further processing, then you need to know, for different kind of collections, there are different capabilities and characteristic.
如果您所说的“对象集合”并不是真正的集合,并且您需要选择一种集合类型来存储您的对象以供进一步处理,那么您需要知道,对于不同类型的集合,有不同的功能和特征。
First, I believe to have a fair comparison, you should consider using ArrayList instead Array, for which you don't need to deal with the reallocation.
首先,我相信有一个公平的比较,您应该考虑使用ArrayList而不是Array,您不需要处理重新分配。
Then it become the choice of ArrayList vs HashSet, which is quite straight-forward:
然后它成为 ArrayList 与 HashSet 的选择,这是非常直接的:
Do you need a List or Set? They are for different purpose: Lists provide you indexed access, and iteration is in order of index. While Sets are mainly for you to keep a distinct set of data, and given its nature, you won't have indexed access.
你需要列表还是集合?它们用于不同的目的:列表为您提供索引访问,迭代是按索引顺序进行的。虽然 Sets 主要是让您保留一组不同的数据,但鉴于其性质,您将无法进行索引访问。
After you made your decision of List or Set to use, then it is a choice of List/Set implementation, normally for Lists, you choose from ArrayList and LinkedList, while for Sets, you choose between HashSet and TreeSet.
在您决定使用 List 还是 Set 之后,接下来就是 List/Set 实现的选择,通常对于 List,您可以从 ArrayList 和 LinkedList 中选择,而对于 Set,您可以在 HashSet 和 TreeSet 之间进行选择。
All the choice depends on what you would want to do with that collection of data. They performs differently on different action.
所有的选择都取决于您想用该数据集合做什么。他们在不同的动作上表现不同。
For example, an indexed access in ArrayList is O(1), in HashSet (though not meaningful) is O(n), (just for your interest, in LinkedList is O(n), in TreeSet is O(nlogn) )
例如,ArrayList 中的索引访问是 O(1),HashSet 中(虽然没有意义)是 O(n),(只是为了您的兴趣,LinkedList 中是 O(n),TreeSet 中是 O(nlogn) )
For adding new element, both ArrayList and HashSet is O(1) operation. Inserting in the middle is O(n) for ArrayList, while it doesn't make sense in HashSet. Both will suffer from reallocation, and both of them need O(n) for the reallocation (HashSet is normally slower in reallocation, because it involve calculation of hash for each element again).
对于添加新元素,ArrayList 和 HashSet 都是 O(1) 操作。中间插入对于ArrayList来说是O(n),而在HashSet中没有意义。两者都会受到重新分配的影响,并且它们都需要 O(n) 进行重新分配(HashSet 在重新分配时通常较慢,因为它再次涉及每个元素的哈希计算)。
To find if certain element exists in the collection, ArrayList is O(n) and HashSet is O(1).
要查找集合中是否存在某个元素,ArrayList 是 O(n),HashSet 是 O(1)。
There are still lots of operations you can do, so it is quite meaningless to discuss for performance without knowing what you want to do.
你可以做的操作还有很多,所以在不知道你想做什么的情况下讨论性能是很没有意义的。