Java HashSet vs ArrayList 包含性能

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/32552307/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 12:45:07  来源:igfitidea点击:

HashSet vs ArrayList contains performance

javaarraylisthashset

提问by Jorge

When processing large amounts of data I often find myself doing the following:

在处理大量数据时,我经常发现自己在做以下事情:

HashSet<String> set = new HashSet<String> ();
//Adding elements to the set
ArrayList<String> list = new ArrayList<String> (set);

Something like "dumping" the contents of the set in the list. I usually do this since the elements I add often contain duplicates I want to remove, and this seems like an easy way to remove them.

类似于“转储”列表中集合的内容。我通常这样做是因为我添加的元素通常包含我想要删除的重复项,这似乎是删除它们的简单方法。

With only that objective in mind (avoiding duplicates) I could also write:

考虑到这个目标(避免重复),我还可以写:

ArrayList<String> list = new ArrayList<String> ();
// Processing here
if (! list.contains(element)) list.add(element);
//More processing here

And thus no need for "dumping" the set into the list. However, I'd be doing a small check before inserting each element (which I'm assuming HashSet does as well)

因此不需要将集合“转储”到列表中。但是,在插入每个元素之前我会做一个小检查(我假设 HashSet 也这样做)

Is any of the two possibilities clearly more efficient?

这两种可能性中的任何一种显然更有效吗?

采纳答案by Dici

The set will give much better performance (O(n)vs O(n^2)for the list), and that's normal because set membership (the containsoperation) is the very purposeof a set.

集合将提供更好的性能(O(n)O(n^2)列表相比),这是正常的,因为集合成员资格(contains操作)是集合的真正目的

Contains for a HashSetis O(1)compared to O(n)for a list, therefore you should never use a list if you often need to run contains.

对于包含HashSetO(1)比较O(n)的清单,因此,你不应该使用一个列表,如果你经常需要运行contains

回答by YoungHobbit

The ArrayListuses an array for storing the data. The ArrayList.containswill be of O(n) complexity. So essentially searching in array again and again will have O(n^2)complexity.

ArrayList使用用于存储数据的数组。该 ArrayList.contains会是O(n)的复杂性。所以本质上一次又一次地在数组中搜索将具有O(n^2)复杂性。

While HashSetuses hashing mechanism for storing the elements into their respective buckets. The operation of HashSetwill be faster for long list of values. It will reach the element in O(1).

虽然HashSet使用散列机制将元素存储到各自的桶中。HashSet对于很长的值列表,的操作会更快。它将到达 中的元素O(1)

回答by Peter Lawrey

If you don't need a list, I would just use a Set and this is the natural collection to use if order doesn't matter and you want to ignore duplicates.

如果您不需要列表,我将只使用 Set,如果顺序无关紧要并且您想忽略重复项,则这是使用的自然集合。

You can do both is you need a List without duplicates.

如果你需要一个没有重复的列表,你可以做到这两点。

private Set<String> set = new HashSet<>();
private List<String> list = new ArrayList<>();


public void add(String str) {
    if (set.add(str))
        list.add(str);
}

This way the list will only contain unique values, the original insertion order is preserved and the operation is O(1).

这样列表将只包含唯一值,原始插入顺序被保留并且操作是 O(1)。

回答by Prateek Paranjpe

You could add elements to the list itself. Then, to dedup -

您可以向列表本身添加元素。然后,重复数据删除 -

HashSet<String> hs = new HashSet<>(); // new hashset
hs.addAll(list); // add all list elements to hashset (this is the dedup, since addAll works as a union, thus removing all duplicates)
list.clear(); // clear the list
list.addAll(hs); // add all hashset elements to the list

If you just need a set with dedup, you can also use the addAll() on a different set, so that it will only have unique values.

如果您只需要一个带有 dedup 的集合,您还可以在不同的集合上使用 addAll(),这样它就只有唯一的值。

回答by urs86ro

I've made a test so please check the result:

我做了一个测试,所以请检查结果:

For SAME STRING items in a HashSet, TreeSet, ArrayList and LinkedList, here are the results for

对于 HashSet、TreeSet、ArrayList 和 LinkedList 中的 SAME STRING 项,以下是结果

  1. 50.000 UUIDs
    • SEARCHED ITEM : e608c7d5-c861-4603-9134-8c636a05a42b (index 25.000)
    • hashSet.contains(item) ? TRUE 0 ms
    • treeSet.contains(item) ? TRUE 0 ms
    • arrayList.contains(item) ? TRUE 2 ms
    • linkedList.contains(item) ? TRUE 3 ms
  2. 5.000.000 UUIDs
    • SEARCHED ITEM : 61fb2592-3186-4256-a084-6c96f9322a86 (index 25.000)
    • hashSet.contains(item) ? TRUE 0 ms
    • treeSet.contains(item) ? TRUE 0 ms
    • arrayList.contains(item) ? TRUE 1 ms
    • linkedList.contains(item) ? TRUE 2 ms
  3. 5.000.000 UUIDs
    • SEARCHED ITEM : db568900-c874-46ba-9b44-0e1916420120 (index 2.500.000)
    • hashSet.contains(item) ? TRUE 0 ms
    • treeSet.contains(item) ? TRUE 0 ms
    • arrayList.contains(item) ? TRUE 33 ms
    • linkedList.contains(item) ? TRUE 65 ms
  1. 50.000 个 UUID
    • 搜索项目:e608c7d5-c861-4603-9134-8c636a05a42b(索引 25.000)
    • hashSet.contains(item) ?真 0 毫秒
    • treeSet.contains(item) ?真 0 毫秒
    • arrayList.contains(item) ?真 2 毫秒
    • linkedList.contains(item) ?真 3 毫秒
  2. 5.000.000 个 UUID
    • 搜索项目:61fb2592-3186-4256-a084-6c96f9322a86(索引25.000)
    • hashSet.contains(item) ?真 0 毫秒
    • treeSet.contains(item) ?真 0 毫秒
    • arrayList.contains(item) ?真 1 毫秒
    • linkedList.contains(item) ?真 2 毫秒
  3. 5.000.000 个 UUID
    • 搜索项目:db568900-c874-46ba-9b44-0e1916420120(索引 2.500.000)
    • hashSet.contains(item) ?真 0 毫秒
    • treeSet.contains(item) ?真 0 毫秒
    • arrayList.contains(item) ?真 33 毫秒
    • linkedList.contains(item) ?真 65 毫秒

Based on above results, there is NOT a BIG difference of using array list vs set. Perhaps you can try to modify this code and replace the Stringwith your Objectand see the differences then...

基于以上结果,使用数组列表和集合没有太大区别。也许您可以尝试修改此代码并将字符串替换为您的对象,然后查看差异...

    public static void main(String[] args) {
        Set<String> hashSet = new HashSet<>();
        Set<String> treeSet = new TreeSet<>();
        List<String> arrayList = new ArrayList<>();
        List<String> linkedList = new LinkedList<>();

        List<String> base = new ArrayList<>();

        for(int i = 0; i<5000000; i++){
            if(i%100000==0) System.out.print(".");
            base.add(UUID.randomUUID().toString());
        }

        System.out.println("\nBase size : " + base.size());
        String item = base.get(25000);
        System.out.println("SEARCHED ITEM : " + item);

        hashSet.addAll(base);
        treeSet.addAll(base);
        arrayList.addAll(base);
        linkedList.addAll(base);

        long ms = System.currentTimeMillis();
        System.out.println("hashSet.contains(item) ? " + (hashSet.contains(item)? "TRUE " : "FALSE") + (System.currentTimeMillis()-ms) + " ms");
        System.out.println("treeSet.contains(item) ? " + (treeSet.contains(item)? "TRUE " : "FALSE") + (System.currentTimeMillis()-ms) + " ms");
        System.out.println("arrayList.contains(item) ? " + (arrayList.contains(item)? "TRUE " : "FALSE") + (System.currentTimeMillis()-ms) + " ms");
        System.out.println("linkedList.contains(item) ? " + (linkedList.contains(item)? "TRUE " : "FALSE") + (System.currentTimeMillis()-ms) + " ms");
    }