Java: Is it faster to add to a collection then sort it, or add to a sorted collection?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/3607593/

Time: 2020-08-14 02:33:50  Source: igfitidea

Tags: java, sorting, collections

Question by gutch

If I have a Map like this:

HashMap<Integer, ComparableObject> map;

and I want to obtain a collection of values sorted using natural ordering, which method is fastest?

(A)

Create an instance of a sortable collection like ArrayList, add the values, then sort it:

List<ComparableObject> sortedCollection = new ArrayList<ComparableObject>(map.values());
Collections.sort(sortedCollection);

(B)

Create an instance of an ordered collection like TreeSet, then add the values:

Set<ComparableObject> sortedCollection = new TreeSet<ComparableObject>(map.values());

Note that the resulting collection is never modified, so the sorting only needs to take place once.
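
For reference, here is a minimal runnable sketch of both options. The question's ComparableObject is not shown, so the hypothetical Item class below (ordered by an int field) stands in for it; the map keys and values are arbitrary.

```java
import java.util.*;

public class SortValuesDemo {
    // Hypothetical stand-in for the question's ComparableObject.
    static final class Item implements Comparable<Item> {
        final int rank;
        Item(int rank) { this.rank = rank; }
        @Override public int compareTo(Item other) { return Integer.compare(rank, other.rank); }
        @Override public String toString() { return "Item" + rank; }
    }

    // (A) copy the values into an ArrayList, then sort once.
    static List<Item> approachA(Map<Integer, Item> map) {
        List<Item> sorted = new ArrayList<>(map.values());
        Collections.sort(sorted);
        return sorted;
    }

    // (B) let a TreeSet keep the values ordered as they are inserted.
    static Set<Item> approachB(Map<Integer, Item> map) {
        return new TreeSet<>(map.values());
    }

    public static void main(String[] args) {
        Map<Integer, Item> map = new HashMap<>();
        for (int r : new int[] {5, 1, 4, 2, 3}) {
            map.put(r * 10, new Item(r));
        }
        System.out.println(approachA(map)); // [Item1, Item2, Item3, Item4, Item5]
        System.out.println(approachB(map)); // [Item1, Item2, Item3, Item4, Item5]
    }
}
```

With distinct values both outputs are identical; they diverge only when two values compare as equal, because (B) keeps just one of them.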

Accepted answer by fasseg

TreeSet has a log(n) time-complexity guarantee for its add(), remove() and contains() methods. Sorting an ArrayList takes n*log(n) operations, but add() and get() take only constant time (amortized constant time for add()).

So if you're mainly retrieving and don't sort often, ArrayList is the better choice. If you sort often but don't retrieve that much, TreeSet would be a better choice.
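
A small sketch of the retrieval side of this trade-off, using arbitrary Integer values (not from the answer): after a single sort, an ArrayList gives O(1) positional access and O(log n) lookups via Collections.binarySearch, whereas a TreeSet gives O(log n) contains() and navigation methods such as ceiling(), but no index-based get().

```java
import java.util.*;

public class RetrievalDemo {
    // Sort a copy once (O(n log n)); afterwards the list is only read.
    static List<Integer> sortedCopy(Collection<Integer> values) {
        List<Integer> list = new ArrayList<>(values);
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = sortedCopy(Arrays.asList(42, 7, 19, 3, 25));
        int third = list.get(2);                        // O(1) positional access
        int pos = Collections.binarySearch(list, 25);   // O(log n); list must be sorted
        System.out.println(third + " " + pos);          // prints "19 3"

        TreeSet<Integer> set = new TreeSet<>(list);
        boolean has = set.contains(25);                 // O(log n) tree lookup
        Integer next = set.ceiling(20);                 // smallest element >= 20
        System.out.println(has + " " + next);           // prints "true 25"
        // TreeSet has no get(int): reaching the i-th element means iterating.
    }
}
```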

Answer by BarsMonster

Theoretically, sorting at the end should be faster. Maintaining sorted state through the process could involve additional CPU time.

From a complexity standpoint, both approaches are O(N log N), but a single sort should have a lower constant factor.
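
One way to test the "lower constant" claim on your own data is to count comparisons instead of measuring time. The sketch below assumes a hypothetical Counted wrapper around an int key, shuffled with a fixed seed; it deliberately prints the counts without predicting which is smaller, because that depends on the input order and the JDK's sort implementation. Comparison counts are only a proxy: they ignore the TreeSet's per-node allocation and pointer chasing.

```java
import java.util.*;

public class ComparisonCount {
    // Hypothetical value type that counts every compareTo() call (not thread-safe).
    static final class Counted implements Comparable<Counted> {
        static long calls = 0;
        final int key;
        Counted(int key) { this.key = key; }
        @Override public int compareTo(Counted o) {
            calls++;
            return Integer.compare(key, o.key);
        }
    }

    // Keys 0..n-1 in a reproducible random order.
    static List<Counted> shuffled(int n, long seed) {
        List<Counted> xs = new ArrayList<>();
        for (int i = 0; i < n; i++) xs.add(new Counted(i));
        Collections.shuffle(xs, new Random(seed));
        return xs;
    }

    // Returns {comparisons for ArrayList + sort, comparisons for TreeSet}.
    static long[] count(int n) {
        List<Counted> input = shuffled(n, 42L);

        Counted.calls = 0;
        List<Counted> list = new ArrayList<>(input);
        Collections.sort(list);
        long sortCalls = Counted.calls;

        Counted.calls = 0;
        Set<Counted> set = new TreeSet<>(input);   // inserts one element at a time
        long treeCalls = Counted.calls;
        return new long[] {sortCalls, treeCalls};
    }

    public static void main(String[] args) {
        long[] c = count(10_000);
        System.out.println("ArrayList + sort: " + c[0] + " compareTo() calls");
        System.out.println("TreeSet:          " + c[1] + " compareTo() calls");
    }
}
```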

Answer by locka

Be sure to read my comment about TreeSet at the bottom if you choose to implement (B).

If your app only sorts occasionally but iterates through the collection a lot, I'd say you're best off using a straightforward unsorted list. Sort it once and then benefit from faster iteration. Iteration is especially fast on an ArrayList.

However, if you want the sort order to be guaranteed at all times, or you may be adding/removing elements frequently, then use a sorted collection and take the hit on iteration.

So in your case I would say (A) is the better option. The list is sorted once, doesn't change, and therefore benefits from being backed by an array. Iteration should be very fast, especially if you know it's an ArrayList and can use ArrayList.get() directly instead of an Iterator.
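
A sketch of the "sort once, then read many times" pattern. When code only holds a List reference, the JDK's java.util.RandomAccess marker interface (implemented by ArrayList, not by LinkedList) indicates whether an index loop with get(i) is cheap; the summing task itself is just an illustrative placeholder, and any speed difference versus an Iterator is usually small and worth measuring rather than assuming.

```java
import java.util.*;

public class IterateSorted {
    // Sum the elements, using index access only when get(i) is cheap.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // ArrayList implements RandomAccess: get(i) is O(1), no Iterator needed.
            for (int i = 0, n = list.size(); i < n; i++) {
                total += list.get(i);
            }
        } else {
            // LinkedList does not: get(i) is O(i), so the Iterator is the better choice.
            for (int v : list) {
                total += v;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>(Arrays.asList(9, 2, 7, 4));
        Collections.sort(list);             // sorted exactly once
        for (int pass = 0; pass < 3; pass++) {
            System.out.println(sum(list));  // read many times, never re-sorted; prints 22
        }
    }
}
```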

I'd also add that TreeSet is by definition a Set, which means its elements are unique. A TreeSet determines equality using compareTo() (Comparable) or compare() (Comparator), not equals(). You could easily find yourself missing data if you add two objects for which that comparison returns 0. For example, adding "C", "A", "B", "A" to a TreeSet yields "A", "B", "C".
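
A runnable sketch of that pitfall, using the same four strings plus a second case built on the JDK's String.CASE_INSENSITIVE_ORDER comparator (chosen here only for illustration): distinct strings that the ordering treats as tied are collapsed as well, while sorting an ArrayList keeps every element.

```java
import java.util.*;

public class TreeSetDuplicates {
    // TreeSet keeps one element per group of elements that compare as 0.
    static Set<String> viaTreeSet(Collection<String> input) {
        return new TreeSet<>(input);
    }

    // Sorting a list keeps every element, duplicates included.
    static List<String> viaSortedList(Collection<String> input) {
        List<String> list = new ArrayList<>(input);
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("C", "A", "B", "A");
        System.out.println(viaTreeSet(input));    // [A, B, C]
        System.out.println(viaSortedList(input)); // [A, A, B, C]

        // Objects that are not equals() are lost too if the ordering says they tie.
        Set<String> ci = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        ci.addAll(Arrays.asList("a", "A", "b"));
        System.out.println(ci);                   // [a, b]
    }
}
```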

Answer by Sean Patrick Floyd

Why not use the best of both worlds? If you are never going to modify it again, sort using a TreeSet and initialize an ArrayList with the contents:

List<ComparableObject> sortedCollection = 
    new ArrayList<ComparableObject>( 
          new TreeSet<ComparableObject>(map.values()));


EDIT:

I have created a benchmark (you can access it at pastebin.com/5pyPMJav) to test the three approaches (ArrayList + Collections.sort, TreeSet, and my best-of-both-worlds approach), and mine always wins. The test file creates a map with 10000 elements whose values have an intentionally awful comparator, and then each of the three strategies gets a chance to a) sort the data and b) iterate over it. Here is some sample output (you can test it yourselves):

EDIT: I have added an aspect that logs calls to Thingy.compareTo(Thingy), and I have also added a new strategy based on PriorityQueue that is much faster than any of the previous solutions (at least in sorting).

compareTo() calls:123490
Transformer ArrayListTransformer
    Creation: 255885873 ns (0.255885873 seconds) 
    Iteration: 2582591 ns (0.002582591 seconds) 
    Item count: 10000

compareTo() calls:121665
Transformer TreeSetTransformer
    Creation: 199893004 ns (0.199893004 seconds) 
    Iteration: 4848242 ns (0.004848242 seconds) 
    Item count: 10000

compareTo() calls:121665
Transformer BestOfBothWorldsTransformer
    Creation: 216952504 ns (0.216952504 seconds) 
    Iteration: 1604604 ns (0.001604604 seconds) 
    Item count: 10000

compareTo() calls:18819
Transformer PriorityQueueTransformer
    Creation: 35119198 ns (0.035119198 seconds) 
    Iteration: 2803639 ns (0.002803639 seconds) 
    Item count: 10000

Strangely, my approach performs best in iteration. I would have expected no difference from the ArrayList approach in iteration; do I have a bug in my benchmark?

Disclaimer: I know this is probably an awful benchmark, but it helps get the point across to you and I certainly did not manipulate it to make my approach win.

(The code depends on Apache Commons Lang for the equals/hashCode/compareTo builders, but it should be easy to refactor that out.)
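
The benchmark's PriorityQueue strategy is not shown here, so the following is only a hypothetical sketch of how such a strategy could produce sorted output: build a PriorityQueue from the values (the JDK's PriorityQueue(Collection) constructor heapifies a plain collection in O(n)), then drain it with poll(), each call being O(log n). One caveat worth keeping in mind: the PriorityQueue Javadoc states that its iterator does not traverse elements in any particular order, so only repeated poll() yields sorted output. The compareTo() count reported above for that strategy is in the range one would expect from heap construction alone (at most about 2n comparisons, here about 20,000), so it may be worth checking whether its iteration actually returned the elements in sorted order.

```java
import java.util.*;

public class PriorityQueueSort {
    // Hypothetical sketch; not the benchmark's actual PriorityQueue strategy.
    static <T extends Comparable<? super T>> List<T> sortViaHeap(Collection<T> values) {
        // Copying a plain collection into a PriorityQueue heapifies it in O(n).
        PriorityQueue<T> heap = new PriorityQueue<>(values);
        List<T> sorted = new ArrayList<>(heap.size());
        // Each poll() removes the current minimum in O(log n).
        while (!heap.isEmpty()) {
            sorted.add(heap.poll());
        }
        return sorted;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(5, 1, 4, 1, 3);
        System.out.println(sortViaHeap(input)); // [1, 1, 3, 4, 5]
    }
}
```

Unlike the TreeSet-based approaches, draining a heap keeps duplicates, so it behaves like approach (A) in that respect.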

Answer by u290629

Collections.sort uses a merge sort (a TimSort variant since Java 7), which is O(n log n).

TreeSet is backed by a red-black tree (via TreeMap), so its basic operations are O(log n). Inserting n elements is therefore also O(n log n).

So both approaches have the same big-O complexity.

Answer by George Lords of Castle

Inserting into a SortedSet is O(log(n)), BUT where n is the current size, not the final size. Inserting into a List is O(1).

In a SortedSet, sorting is already included in the insertions, so it costs nothing extra. Sorting a List is O(n*log(n)).

So the SortedSet total complexity is O(n * k), where k < log(n) for every insertion except the last. The List total complexity, by contrast, is O(n * log(n) + n), which is O(n * log(n)).
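
Summing the per-insertion costs makes this comparison explicit. Under the idealized assumption that inserting into a set of i elements costs about log2(i) comparisons (a red-black tree can be up to roughly twice as tall as a perfectly balanced one, so real counts can be higher), Stirling's approximation gives:

```latex
\sum_{i=1}^{n} \log_2 i \;=\; \log_2 n! \;=\; n\log_2 n \;-\; n\log_2 e \;+\; O(\log n) \;=\; \Theta(n \log n)
```

So using the current n rather than the final n saves only a lower-order term; the total stays in the same Θ(n log n) class as sorting a list once, which agrees with the previous answer.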

So, SortedSet mathematically has the best performance. But in the end you have a Set instead of a List (because the JDK has no SortedList), and a Set provides fewer features than a List. So in my opinion, the best solution for available features and performance is the one proposed by Sean Patrick Floyd:

  • use a SortedSet for inserting,
  • pass the SortedSet to the List constructor to create the List you return.