Java HashSet vs. TreeSet

Question

Asked by heymatthew

I've always loved trees, that nice O(n*log(n)) and the tidiness of them. However, every software engineer I've ever known has asked me pointedly why I would use a TreeSet. From a CS background, I don't think it matters all that much which you use, and I don't care to mess around with hash functions and buckets (in the case of Java).

In which cases should I use a HashSet over a TreeSet?

Answer 1

Accepted answer by sactiw

HashSet is much faster than TreeSet (constant-time versus log-time for most operations like add, remove and contains) but offers no ordering guarantees like TreeSet.

HashSet

The class offers constant-time performance for the basic operations (add, remove, contains and size).
It does not guarantee that the order of elements will remain constant over time.
Iteration performance depends on the initial capacity and the load factor of the HashSet.
- It's quite safe to accept the default load factor, but you may want to specify an initial capacity that's about twice the size to which you expect the set to grow.
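
The sizing advice above can be sketched as follows. This is only an illustration: the class name HashSetSizing, the helper buildPresized, and the expected size of 1000 are made up for the example. It uses the HashSet(int initialCapacity) constructor and leaves the load factor at its default.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetSizing {
    // Hypothetical helper: builds a set pre-sized for about expectedSize elements.
    static Set<Integer> buildPresized(int expectedSize) {
        // Initial capacity of roughly twice the expected size; the default
        // load factor is kept, as the answer suggests.
        Set<Integer> set = new HashSet<>(expectedSize * 2);
        for (int i = 0; i < expectedSize; i++) {
            set.add(i);
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> set = buildPresized(1000);
        System.out.println(set.size());        // 1000
        System.out.println(set.contains(500)); // true
    }
}
```

Pre-sizing avoids rehashing while the set grows, but an oversized table also makes iteration slower, since iteration time depends on the capacity as well as the number of elements.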

TreeSet

Guarantees log(n) time cost for the basic operations (add, remove and contains).
Guarantees that the elements of the set will be sorted (ascending natural order, or the order you specify via its constructor); it implements SortedSet.
Doesn't offer any tuning parameters for iteration performance.
Offers a few handy methods for dealing with the ordered set, such as first(), last(), headSet(), and tailSet().
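
A minimal sketch of the SortedSet methods listed above. The class name TreeSetNavigation and the sample values are invented for illustration; the methods themselves are the standard java.util.SortedSet ones.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

class TreeSetNavigation {
    // Sample data, deliberately inserted out of order.
    static SortedSet<Integer> sample() {
        return new TreeSet<>(Arrays.asList(50, 10, 40, 20, 30));
    }

    public static void main(String[] args) {
        SortedSet<Integer> s = sample();
        System.out.println(s);             // [10, 20, 30, 40, 50]
        System.out.println(s.first());     // 10
        System.out.println(s.last());      // 50
        System.out.println(s.headSet(30)); // [10, 20]      (strictly less than 30)
        System.out.println(s.tailSet(30)); // [30, 40, 50]  (greater than or equal to 30)
    }
}
```

Note that headSet and tailSet return views backed by the original set, not copies.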

Important points:

Both guarantee a duplicate-free collection of elements.
It is generally faster to add elements to the HashSet and then convert the collection to a TreeSet for a duplicate-free sorted traversal.
Neither implementation is synchronized. That is, if multiple threads access a set concurrently, and at least one of the threads modifies the set, it must be synchronized externally.
LinkedHashSet is in some sense intermediate between HashSet and TreeSet. It is implemented as a hash table with a linked list running through it; however, it provides insertion-ordered iteration, which is not the same as the sorted traversal guaranteed by TreeSet.
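
To make the difference in iteration order concrete, here is a small sketch that feeds the same input to all three implementations. The class name SetIterationOrder and the sample strings are assumptions for the example. No output is shown for HashSet because its order is unspecified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SetIterationOrder {
    // Sample input with one duplicate ("apple").
    static final List<String> INPUT = Arrays.asList("banana", "apple", "cherry", "apple");

    static Set<String> unordered()        { return new HashSet<>(INPUT); }
    static Set<String> insertionOrdered() { return new LinkedHashSet<>(INPUT); }
    static Set<String> sorted()           { return new TreeSet<>(INPUT); }

    public static void main(String[] args) {
        System.out.println("HashSet:       " + unordered());        // some order, not guaranteed
        System.out.println("LinkedHashSet: " + insertionOrdered()); // [banana, apple, cherry]
        System.out.println("TreeSet:       " + sorted());           // [apple, banana, cherry]
    }
}
```

All three drop the duplicate; only the traversal order differs.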

So a choice of usage depends entirely on your needs but I feel that even if you need an ordered collection then you should still prefer HashSet to create the Set and then convert it into TreeSet.

e.g. SortedSet<String> s = new TreeSet<String>(hashSet);

Answer 2

Answered by duffymo

HashSet is O(1) to access elements, so it certainly does matter. But maintaining order of the objects in the set isn't possible.

TreeSet is useful if maintaining an order (in terms of values, not insertion order) matters to you. But, as you've noted, you're trading order for slower access to an element: O(log n) for basic operations.

From the javadocs for TreeSet:

This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).

Answer 3

Answered by JasonTrue

If you aren't inserting enough elements to result in frequent rehashings (or collisions, if your HashSet can't resize), a HashSet certainly gives you the benefit of constant-time access. But on sets with lots of growth or shrinkage, you may actually get better performance with TreeSets, depending on the implementation.

Amortized time can be close to O(1) with a functional red-black tree, if memory serves me. Okasaki's book would have a better explanation than I can pull off. (Or see his publication list)

Answer 4

Answered by Joseph Weissman

HashSet implementations are, of course, much much faster -- less overhead because there's no ordering. A good analysis of the various Set implementations in Java is provided at http://java.sun.com/docs/books/tutorial/collections/implementations/set.html.

The discussion there also points out an interesting 'middle ground' approach to the Tree vs Hash question. Java provides a LinkedHashSet, which is a HashSet with an "insertion-oriented" linked list running through it, that is, the last element in the linked list is also the most recently inserted into the Hash. This allows you to avoid the unruliness of an unordered hash without incurring the increased cost of a TreeSet.

Answer 5

Answered by Kathy Van Stone

The reason why most use HashSet is that the operations are (on average) O(1) instead of O(log n). If the set contains standard items you will not be "messing around with hash functions", as that has been done for you. If the set contains custom classes, you have to implement hashCode (consistently with equals) to use HashSet (although Effective Java shows how), but if you use a TreeSet you have to make the class Comparable or supply a Comparator. This can be a problem if the class does not have a particular order.
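
As a rough illustration of the two requirements, the sketch below uses a hypothetical Point class (not from the question). It overrides equals and hashCode so that a HashSet treats equal points as duplicates, and it supplies a Comparator so that a TreeSet can order points even though Point itself is not Comparable. The ordering "by x, then by y" is an arbitrary choice made for the example.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

class CustomElementSets {
    // Hypothetical value class with no natural ordering.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Point)) return false;
            Point p = (Point) o;
            return x == p.x && y == p.y;
        }

        // Must agree with equals: equal points produce equal hash codes.
        @Override public int hashCode() { return Objects.hash(x, y); }

        @Override public String toString() { return "(" + x + ", " + y + ")"; }
    }

    // Arbitrary ordering chosen for the TreeSet: by x, then by y.
    static final Comparator<Point> BY_X_THEN_Y =
            Comparator.comparingInt((Point p) -> p.x).thenComparingInt(p -> p.y);

    static Set<Point> hashed() {
        Set<Point> set = new HashSet<>();
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // duplicate according to equals/hashCode
        return set;
    }

    static TreeSet<Point> ordered() {
        TreeSet<Point> set = new TreeSet<>(BY_X_THEN_Y);
        set.add(new Point(2, 1));
        set.add(new Point(1, 2));
        set.add(new Point(1, 2)); // duplicate according to the comparator
        return set;
    }

    public static void main(String[] args) {
        System.out.println(hashed().size()); // 1
        System.out.println(ordered());       // [(1, 2), (2, 1)]
    }
}
```

Note that a TreeSet decides equality through the comparator (or compareTo), not through equals, so the ordering should be consistent with equals to avoid surprises.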

I have sometimes used TreeSet (or actually TreeMap) for very small sets/maps (< 10 items), although I have not checked to see if there is any real gain in doing so. For large sets the difference can be considerable.

Now if you need the set sorted, then TreeSet is appropriate, although even then, if updates are frequent and the need for a sorted result is infrequent, sometimes copying the contents to a list or an array and sorting them can be faster.
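
The "copy and sort on demand" alternative could look like the sketch below. The class name SortOnDemand and the helper sortedSnapshot are illustrative names, not part of any library. Whether this actually beats a TreeSet depends on how often updates and sorted reads occur, so measure before relying on it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SortOnDemand {
    // Hypothetical helper: copies the set into a list and sorts the copy.
    // The set itself stays a plain HashSet with cheap updates.
    static List<String> sortedSnapshot(Set<String> set) {
        List<String> snapshot = new ArrayList<>(set);
        Collections.sort(snapshot);
        return snapshot;
    }

    public static void main(String[] args) {
        Set<String> names = new HashSet<>();
        names.add("mallory");
        names.add("alice");
        names.add("bob");
        names.add("alice"); // ignored, already present

        System.out.println(sortedSnapshot(names)); // [alice, bob, mallory]
    }
}
```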

Answer 6

Answered by Nicholas Jordan

(Edit: complete rewrite.) When order does not matter, that's when. Both should give log(n); it would be useful to see whether either is more than five percent faster than the other. HashSet can give O(1); testing in a loop should reveal whether it does.

Answer 7

Answered by subhash laghate

The TreeSet is one of two sorted collections (the other being TreeMap). It uses a red-black tree structure (but you knew that), and guarantees that the elements will be in ascending order, according to natural order. Optionally, you can construct a TreeSet with a constructor that takes a Comparator, which lets you give the collection your own rules for what the order should be (rather than relying on the ordering defined by the elements' class via Comparable).
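
A short sketch of the two construction styles described above: natural ordering versus an ordering passed to the constructor. The class name TreeSetOrdering and the sample strings are made up for the example; Comparator.reverseOrder() is used only as a convenient ready-made comparator.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

class TreeSetOrdering {
    static final List<String> INPUT = Arrays.asList("fig", "pear", "apple");

    // Natural ordering: String implements Comparable<String>.
    static TreeSet<String> natural() {
        return new TreeSet<>(INPUT);
    }

    // Custom ordering supplied through the TreeSet(Comparator) constructor.
    static TreeSet<String> reversed() {
        TreeSet<String> set = new TreeSet<>(Comparator.reverseOrder());
        set.addAll(INPUT);
        return set;
    }

    public static void main(String[] args) {
        System.out.println(natural());  // [apple, fig, pear]
        System.out.println(reversed()); // [pear, fig, apple]
    }
}
```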

A LinkedHashSet is an ordered version of HashSet that maintains a doubly-linked list across all elements. Use this class instead of HashSet when you care about the iteration order. When you iterate through a HashSet the order is unpredictable, while a LinkedHashSet lets you iterate through the elements in the order in which they were inserted.

Answer 8

Answered by Carl Andersen

One advantage of a TreeSet not yet mentioned is that it has greater "locality", which is shorthand for saying (1) if two entries are nearby in the order, a TreeSet places them near each other in the data structure, and hence in memory; and (2) this placement takes advantage of the principle of locality, which says that similar data is often accessed by an application with similar frequency.

This is in contrast to a HashSet, which spreads the entries all over memory, no matter what their keys are.

When the latency cost of reading from a hard drive is thousands of times the cost of reading from cache or RAM, and when the data really is accessed with locality, the TreeSet can be a much better choice.

Answer 9

Answered by gli00001

import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class HashTreeSetCompare {

    // Claim under test: "It is generally faster to add elements to the
    // HashSet and then convert the collection to a TreeSet for a
    // duplicate-free sorted traversal."
    //
    // Really? Is O(hash set + tree set) > O(tree set)? Why?

    public static void main(String args[]) {

        int size = 80000;
        useHashThenTreeSet(size);
        useTreeSetOnly(size);

    }

    private static void useTreeSetOnly(int size) {

        System.out.println("useTreeSetOnly: ");
        long start = System.currentTimeMillis();
        Set<String> sortedSet = new TreeSet<String>();

        for (int i = 0; i < size; i++) {
            sortedSet.add(i + "");
        }

        //System.out.println(sortedSet);
        long end = System.currentTimeMillis();

        System.out.println("useTreeSetOnly: " + (end - start));
    }

    private static void useHashThenTreeSet(int size) {

        System.out.println("useHashThenTreeSet: ");
        long start = System.currentTimeMillis();
        Set<String> set = new HashSet<String>();

        for (int i = 0; i < size; i++) {
            set.add(i + "");
        }

        Set<String> sortedSet = new TreeSet<String>(set);
        //System.out.println(sortedSet);
        long end = System.currentTimeMillis();

        System.out.println("useHashThenTreeSet: " + (end - start));
    }
}

Answer 10

Answered by SuReN

1. HashSet allows a null element.

2. TreeSet (with natural ordering) does not allow a null element. If you try to add a null value it will throw a NullPointerException.

3. HashSet is much faster than TreeSet.

e.g.

 TreeSet<String> ts = new TreeSet<String>();
 ts.add(null); // throws NullPointerException

 HashSet<String> hs = new HashSet<String>();
 hs.add(null); // runs fine