Java HashSet vs TreeSet vs LinkedHashSet 在添加重复值的基础上

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/20116660/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-12 23:14:52  来源:igfitidea点击:

HashSet vs TreeSet vs LinkedHashSet on basis of adding duplicate value

javacollectionsset

提问by Prashant Shilimkar

I am learning heart of core java i.e. Collections. I would like to know what happens internally when we add duplicate element in HashSet, TreeSet, LinkedHashSet.

我正在学习核心 java ie 的核心Collections。我想知道当我们在HashSet, TreeSet, 中添加重复元素时内部会发生什么LinkedHashSet

Whether entry is replaced, ignored or exception is thrown and program terminates. And one sub question is, Which one has same or average time complexity for all its operations

条目是否被替换、忽略或抛出异常并终止程序。一个子问题是,哪一个对所有操作具有相同或平均的时间复杂度

Your response will be greatly appreciated.

您的回复将不胜感激。

采纳答案by constantlearner

TreeSet, LinkedHashSet and HashSet in Java are three Set implementation in collection framework and like many others they are also used to store objects. Main feature of TreeSet is sorting, LinkedHashSet is insertion order and HashSet is just general purpose collection for storing object. HashSet is implemented using HashMap in Java while TreeSet is implemented using TreeMap. TreeSet is a SortedSet implementation which allows it to keep elements in the sorted order defined by either Comparable or Comparator interface. Comparable is used for natural order sorting and Comparator for custom order sorting of objects, which can be provided while creating instance of TreeSet. Anyway before seeing difference between TreeSet, LinkedHashSet and HashSet, let's see some similarities between them:

Java 中的 TreeSet、LinkedHashSet 和 HashSet 是集合框架中的三个 Set 实现,与许多其他实现一样,它们也用于存储对象。TreeSet 的主要特点是排序,LinkedHashSet 是插入顺序,HashSet 只是用于存储对象的通用集合。Java 中HashSet 是使用HashMap 实现的,而TreeSet 是使用TreeMap 实现的。TreeSet 是一个 SortedSet 实现,它允许它按照 Comparable 或 Comparator 接口定义的排序顺序保留元素。Comparable 用于自然顺序排序,Comparator 用于自定义对象顺序排序,可在创建 TreeSet 实例时提供。无论如何,在看到 TreeSet、LinkedHashSet 和 HashSet 之间的区别之前,让我们看看它们之间的一些相似之处:

1) Duplicates : All three implements Set interface means they are not allowed to store duplicates.

1) Duplicates : 这三个都实现了 Set 接口,这意味着它们不允许存储重复项。

2) Thread safety : HashSet, TreeSet and LinkedHashSet are not thread-safe, if you use them in multi-threading environment where at least one Thread modifies Set you need to externally synchronize them.

2) 线程安全:HashSet、TreeSet 和 LinkedHashSet 不是线程安全的,如果在至少一个线程修改 Set 的多线程环境中使用它们,则需要外部同步它们。

3) Fail-Fast Iterator : Iterator returned by TreeSet, LinkedHashSet and HashSet are fail-fast Iterator. i.e. If Iterator is modified after its creation by any way other than Iterators remove() method, it will throw ConcurrentModificationException with best of effort. read more about fail-fast vs fail-safe Iterator here

3)Fail-Fast Iterator:TreeSet、LinkedHashSet、HashSet返回的迭代器是Fail-fast Iterator。即,如果 Iterator 在其创建后被 Iterators remove() 方法以外的任何方式修改,它将尽最大努力抛出 ConcurrentModificationException。在此处阅读有关快速故障与故障安全迭代器的更多信息

Now let's see difference between HashSet, LinkedHashSet and TreeSet in Java :

现在让我们看看 Java 中 HashSet、LinkedHashSet 和 TreeSet 之间的区别:

Performance and Speed : First difference between them comes in terms of speed. HashSet is fastest, LinkedHashSet is second on performance or almost similar to HashSet but TreeSet is bit slower because of sorting operation it needs to perform on each insertion. TreeSet provides guaranteed O(log(n)) time for common operations like add, remove and contains, while HashSet and LinkedHashSet offer constant time performance e.g. O(1) for add, contains and remove given hash function uniformly distribute elements in bucket.

性能和速度:它们之间的第一个区别在于速度。HashSet 是最快的,LinkedHashSet 在性能上次之,或者几乎与 HashSet 相似,但是 TreeSet 有点慢,因为它需要在每次插入时执行排序操作。TreeSet 为添加、删除和包含等常见操作提供保证的 O(log(n)) 时间,而 HashSet 和 LinkedHashSet 提供恒定时间性能,例如 O(1) 用于添加、包含和删除给定的哈希函数将元素均匀分布在桶中。

Ordering : HashSet does not maintain any order while LinkedHashSet maintains insertion order of elements much like List interface and TreeSet maintains sorting order or elements.

排序:HashSet 不维护任何顺序,而 LinkedHashSet 维护元素的插入顺序,很像 List 接口,而 TreeSet 维护排序顺序或元素。

Internal Implementation : HashSet is backed by an HashMap instance, LinkedHashSet is implemented using HashSet and LinkedList while TreeSet is backed up by NavigableMap in Java and by default it uses TreeMap.

内部实现:HashSet 由 HashMap 实例支持,LinkedHashSet 使用 HashSet 和 LinkedList 实现,而 TreeSet 由 Java 中的 NavigableMap 支持,默认情况下使用 TreeMap。

null : Both HashSet and LinkedHashSet allows null but TreeSet doesn't allow null and throw java.lang.NullPointerException when you will insert null into TreeSet. Since TreeSet uses compareTo() method of respective elements to compare them which throws NullPointerException while comparing with null, here is an example:

null :HashSet 和 LinkedHashSet 都允许 null 但 TreeSet 不允许 null 并在您将 null 插入 TreeSet 时抛出 java.lang.NullPointerException 。由于 TreeSet 使用了各个元素的 compareTo() 方法来比较它们,在与 null 比较时抛出 NullPointerException,这里是一个例子:

TreeSet cities
Exception in thread "main" java.lang.NullPointerException
        at java.lang.String.compareTo(String.java:1167)
        at java.lang.String.compareTo(String.java:92)
        at java.util.TreeMap.put(TreeMap.java:545)
        at java.util.TreeSet.add(TreeSet.java:238)

Comparison : HashSet and LinkedHashSet uses equals() method in Java for comparison but TreeSet uses compareTo() method for maintaining ordering. That's why compareTo() should be consistent to equals in Java. failing to do so break general contact of Set interface i.e. it can permit duplicates.

比较:HashSet 和 LinkedHashSet 使用 Java 中的 equals() 方法进行比较,但 TreeSet 使用 compareTo() 方法来维护排序。这就是为什么 compareTo() 应该与 Java 中的 equals 一致。不这样做会破坏 Set 接口的一般联系,即它可以允许重复。

Use can use below link to see internal implementation http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashSet.java#HashSet.add%28java.lang.Object%29

使用可以使用下面的链接查看内部实现 http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/HashSet.java#HashSet.add%28java .lang.Object%29

From the source code 
Hashset hases Hashmap to store the data and LinkedHashSet extends Hashset and hence uses same add method of Hashset But TreeSet uses NavigableMap to store the data

Source: http://javarevisited.blogspot.com/2012/11/difference-between-treeset-hashset-vs-linkedhashset-java.html#ixzz2lGo6Y9mm

来源:http: //javarevisited.blogspot.com/2012/11/difference-between-treeset-hashset-vs-linkedhashset-java.html#ixzz2lGo6Y9mm

回答by user2485429

回答by Ben B

tldr: Repeat values are ignored by these collections.

tldr:这些集合会忽略重复值。

I haven't seen a complete answer to the bolded part of the question, what EXACTLY happens to duplicates? Does it overwrite the old object or ignore the new one? Consider this example of an object where one field determines equality but there is extra data that could vary:

我还没有看到问题粗体部分的完整答案,重复项究竟会发生什么?它会覆盖旧对象还是忽略新对象?考虑一个对象的示例,其中一个字段确定相等,但有额外的数据可能会有所不同:

public class MyData implements Comparable {
    public final Integer valueDeterminingEquality;
    public final String extraData;

    public MyData(Integer valueDeterminingEquality, String extraData) {
        this.valueDeterminingEquality = valueDeterminingEquality;
        this.extraData = extraData;
    }

    @Override
    public boolean equals(Object o) {
        return valueDeterminingEquality.equals(((MyData) o).valueDeterminingEquality);
    }

    @Override
    public int hashCode() {
        return valueDeterminingEquality.hashCode();
    }

    @Override
    public int compareTo(Object o) {
        return valueDeterminingEquality.compareTo(((MyData)o).valueDeterminingEquality);
    }
}

This unit test shows that duplicate values are ignored by all 3 collections:

此单元测试显示所有 3 个集合都忽略了重复值:

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

import java.util.*;

import static org.hamcrest.CoreMatchers.is;
import static org.hamcrest.MatcherAssert.assertThat;

@RunWith(Parameterized.class)
public class SetRepeatedItemTest {
    private final Set<MyData> testSet;

    public SetRepeatedItemTest(Set<MyData> testSet) {
        this.testSet = testSet;
    }

    @Parameterized.Parameters
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {
                { new TreeSet() }, { new HashSet() }, { new LinkedHashSet()}
        });
    }

    @Test
    public void testTreeSet() throws Exception {
        testSet.add(new MyData(1, "object1"));
        testSet.add(new MyData(1, "object2"));
        assertThat(testSet.size(), is(1));
        assertThat(testSet.iterator().next().extraData, is("object1"));
    }
}

I also looked into the implementation of TreeSet, which we know uses TreeMap... In TreeSet.java:

我还研究了 TreeSet 的实现,我们知道它使用 TreeMap... 在 TreeSet.java 中:

public boolean add(E var1) {
    return this.m.put(var1, PRESENT) == null;
}

Instead of showing TreeMap's entire put method, here's the relevant search loop:

这里没有显示 TreeMap 的整个 put 方法,而是相关的搜索循环:

parent = t;
cmp = k.compareTo(t.key);
if (cmp < 0)
        t = t.left;
else if (cmp > 0)
        t = t.right;
else
    return t.setValue(value);
} while (t != null);

so if cmp == 0, ie we've found a duplicate entry, we return early instead of adding a child at the end of the loop. The call to setValue doesn't actually do anything, because TreeSet is using dummy data for the value here, the important thing is that the key doesn't change. If you look into HashMap you'll see the same behavior.

所以如果 cmp == 0,即我们发现了重复的条目,我们会提前返回而不是在循环的末尾添加一个孩子。对 setValue 的调用实际上并没有做任何事情,因为 TreeSet 在这里使用了虚拟数据作为值,重要的是键不会改变。如果您查看 HashMap,您会看到相同的行为。

回答by Rich

I haven't found much hard data on the differences, so I ran a benchmark for the 3 cases.

我没有找到太多关于差异的硬数据,所以我对 3 个案例进行了基准测试。

It appears that HashSet is about 4 times faster than TreeSet when adding (under certain circumstances, this will probably vary according to the exact characteristics of your data etc.).

添加时 HashSet 似乎比 TreeSet 快约 4 倍(在某些情况下,这可能会根据您的数据的确切特征等而有所不同)。

# Run complete. Total time: 00:22:47

Benchmark                                                     Mode  Cnt  Score   Error  Units
DeduplicationWithSetsBenchmark.deduplicateWithHashSet        thrpt  200  7.734 ? 0.133  ops/s
DeduplicationWithSetsBenchmark.deduplicateWithLinkedHashSet  thrpt  200  7.100 ? 0.171  ops/s
DeduplicationWithSetsBenchmark.deduplicateWithTreeSet        thrpt  200  1.983 ? 0.032  ops/s

Here is the benchmark code:

这是基准代码:

package my.app;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class DeduplicationWithSetsBenchmark {

    static Item[] inputData = makeInputData();

    @Benchmark
    public int deduplicateWithHashSet() {
        return deduplicate(new HashSet<>());
    }

    @Benchmark
    public int deduplicateWithLinkedHashSet() {
        return deduplicate(new LinkedHashSet<>());
    }

    @Benchmark
    public int deduplicateWithTreeSet() {
        return deduplicate(new TreeSet<>(Item.comparator()));
    }

    private int deduplicate(Set<Item> set) {
        for (Item i : inputData) {
            set.add(i);
        }
        return set.size();
    }

    public static void main(String[] args) throws RunnerException {

        // Verify that all 3 methods give the same answers:
        DeduplicationWithSetsBenchmark x = new DeduplicationWithSetsBenchmark();
        int count = x.deduplicateWithHashSet();
        assert(count < inputData.length);
        assert(count == x.deduplicateWithLinkedHashSet());
        assert(count == x.deduplicateWithTreeSet());


        Options opt = new OptionsBuilder()
            .include(DeduplicationWithSetsBenchmark.class.getSimpleName())
            .forks(1)
            .build();

        new Runner(opt).run();
    }

    private static Item[] makeInputData() {
        int count = 1000000;
        Item[] acc = new Item[count];
        Random rnd = new Random();

        for (int i=0; i<count; i++) {
            Item item = new Item();
            // We are looking to include a few collisions, so restrict the space of the values
            item.name = "the item name " + rnd.nextInt(100);
            item.id = rnd.nextInt(100);
            acc[i] = item;
        }
        return acc;
    }

    private static class Item {
        public String name;
        public int id;

        public String getName() {
            return name;
        }

        public int getId() {
            return id;
        }

        @Override
        public boolean equals(Object obj) {
            Item other = (Item) obj;

            return name.equals(other.name) && id == other.id;
        }

        @Override
        public int hashCode() {
            return name.hashCode() * 13 + id;
        }

        static Comparator<Item> comparator() {
            return Comparator.comparing(Item::getName, Comparator.naturalOrder())
                .thenComparing(Item::getId, Comparator.naturalOrder());
        }
    }
}