Java 为什么 Collections.sort 使用 Mergesort 而 Arrays.sort 不使用?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/32334319/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 12:21:30  来源:igfitidea点击:

Why does Collections.sort use Mergesort but Arrays.sort does not?

javaarrayssortingcollectionsjava-8

提问by Quest Monger

I am using JDK-8 (x64). For Arrays.sort(primitives) I found the following in the Java documentation:

我正在使用 JDK-8 (x64)。对于Arrays.sort(原语),我在 Java 文档中发现了以下内容:

The sorting algorithm is a Dual-Pivot Quicksortby Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch.`

该排序算法是一个双枢轴快速排序弗拉基米尔·Yaroslavskiy,乔恩·本特利,以及约书亚Bloch.`

For Collections.sort(objects) I found this "Timsort":

对于Collections.sort(对象),我发现了这个“Timsort”:

This implementation is a stable, adaptive, iterative mergesort... This implementation dumps the specified list into an array, sorts the array, and iterates over the list resetting each element from the corresponding position in the array.

这个实现是一个稳定的、自适应的、迭代的归并排序……这个实现将指定的列表转储到一个数组中,对数组进行排序,并遍历列表,从数组中的相应位置重置每个元素。

If Collections.sortuses an array, why doesn't it just call Arrays.sortor use dual-pivot QuickSort? Why use Mergesort?

如果Collections.sort使用数组,为什么不直接调用Arrays.sort或使用双枢轴QuickSort?为什么使用合并排序

采纳答案by Holger

The API guarantees a stablesorting which Quicksortdoesn't offer. However, when sorting primitive valuesby their natural order you won't notice a difference as primitive values have no identity. Therefore, Quicksortcan used for primitive arrays and will be used when it is considered more efficient1.

该API可以保证一个稳定的排序,其快速排序不提供。但是,当按自然顺序对原始值进行排序时,您不会注意到差异,因为原始值没有标识。因此,Quicksort可以用于原始数组,并且会在被认为更有效时使用1。

For objects you may notice, when objects with different identity which are deemed equal according to their equalsimplementation or the provided Comparatorchange their order. Therefore, Quicksortis not an option. So a variant of MergeSortis used, the current Java versions use TimSort. This applies to both, Arrays.sortand Collections.sort, though with Java?8, the Listitself may override the sort algorithms.

对于对象,您可能会注意到,当具有不同标识的对象根据其equals实现或提供的对象被视为相等时,它们会Comparator更改它们的顺序。因此,Quicksort不是一种选择。所以一个变种归并时,当前的Java版本使用TimSort。这适用于两者,Arrays.sort并且Collections.sort,尽管对于 Java?8,它List本身可能会覆盖排序算法。



1 The efficiency advantage of Quicksortis needing less memory when done in-place. But it has a dramatic worst case performance and can't exploit runs of pre-sorted data in an array, which TimSortdoes.

1 Quicksort的效率优势是就地完成时需要更少的内存。但它具有戏剧性的最坏情况性能,并且无法利用数组中预先排序的数据运行,而TimSort 可以做到这一点。

Therefore, the sorting algorithms were reworked from version to version, while staying in the now-misleadingly named class DualPivotQuicksort. Also, the documentation didn't catch up, which shows, that it is a bad idea in general, to name an internally used algorithm in a specification, when not necessary.

因此,排序算法从一个版本到另一个版本进行了重新设计,同时保留在现在具有误导性的命名类中DualPivotQuicksort。此外,文档没有跟上,这表明,在没有必要时,在规范中命名内部使用的算法通常是一个坏主意。

The current situation (including Java?8 to Java 11) is as follows:

目前情况(包括Java?8到Java 11)如下:

  • Generally, the sorting methods for primitive arrays will use Quicksortonly under certain circumstances. For larger arrays, they will try to identify runs of pre-sorted data first, like TimSortdoes, and will merge them when the number of runs does not exceed a certain threshold. Otherwise they will fall back to Quicksort, but with an implementation that will fall back to Insertion sortfor small ranges, which does not only affect small arrays, but also quick sort's recursion.
  • sort(char[],…)and sort(short[],…)add another special case, to use Counting sortfor arrays whose length exceeds a certain threshold
  • Likewise, sort(byte[],…)will use Counting sort, but with a much smaller threshold, which creates the biggest contrast to the documentation, as sort(byte[],…)never uses Quicksort. It only uses Insertion sortfor small arrays and Counting sortotherwise.
  • 通常,原始数组的排序方法仅在某些情况下才会使用Quicksort。对于较大的数组,他们会像TimSort一样,首先尝试识别预排序数据的运行,并在运行次数不超过某个阈值时合并它们。否则,它们将回退到Quicksort,但其实现将回退到小范围的插入排序,这不仅会影响小数组,还会影响快速排序的递归。
  • sort(char[],…)sort(short[],…)添加另一个特殊情况,对长度超过某个阈值的数组使用计数排序
  • 同样,sort(byte[],…)将使用Counting sort,但阈值要小得多,这与文档形成了最大的对比,因为sort(byte[],…)从不使用 Quicksort。它只对小数组使用插入排序,否则使用计数排序

回答by Luiggi Mendoza

I don't know about the documentation, but the implementation of java.util.Collections#sortin Java 8 (HotSpot) goes like this:

我不知道文档,但java.util.Collections#sort在 Java 8 (HotSpot) 中的实现是这样的:

@SuppressWarnings({"unchecked", "rawtypes"})
public static <T> void sort(List<T> list, Comparator<? super T> c) {
    list.sort(c);
}

And List#sorthas this implementation:

List#sort有这个实现:

@SuppressWarnings({"unchecked", "rawtypes"})
default void sort(Comparator<? super E> c) {
    Object[] a = this.toArray();
    Arrays.sort(a, (Comparator) c);
    ListIterator<E> i = this.listIterator();
    for (Object e : a) {
        i.next();
        i.set((E) e);
    }
}

So, in the end, Collections#sortuses Arrays#sort(of object elements) behind the scenes. This implementation uses merge sort or tim sort.

所以,最后,在幕后Collections#sort使用Arrays#sort(对象元素)。此实现使用归并排序或 tim 排序。

回答by Puce

According to the Javadoc, only primitive arrays are sorted using Quicksort. Object arrays are sorted with a Mergesort as well.

根据 Javadoc,只有原始数组使用 Quicksort 进行排序。对象数组也使用 Mergesort 进行排序。

So Collections.sort seems to use the same sorting algorithm as Arrays.sort for Objects.

所以 Collections.sort 似乎对对象使用了与 Arrays.sort 相同的排序算法。

Another question would be why a different sort algorithm is used for primitive arrays than for Object arrays?

另一个问题是为什么原始数组与对象数组使用不同的排序算法?

回答by cogitoboy

As stated across many of the answers.

正如许多答案中所述。

The Quicksort is used by Arrays.sort for sorting primitive collections because stability isn't required (you won't know or care if two identical ints were swapped in the sort)

Arrays.sort 使用 Quicksort 对原始集合进行排序,因为不需要稳定性(您不会知道或关心是否在排序中交换了两个相同的整数)

MergeSort or more specifically Timsort is used by Arrays.sort for sorting collections of objects. Stability is required. Quicksort does not provide for stability, Timsort does.

Arrays.sort 使用 MergeSort 或更具体地说 Timsort 对对象集合进行排序。需要稳定。快速排序不提供稳定性,Timsort 提供。

Collections.sort delegates to Arrays.sort which is why you see the javadoc referencing the MergeSort.

Collections.sort 委托给 Arrays.sort,这就是为什么您会看到引用 MergeSort 的 javadoc。

回答by Krutik

Quick Sort has two major drawbacks when it comes to merge sort:

快速排序在归并排序方面有两个主要缺点:

  • It's not stable while it comes to non primitive.
  • It doesn't guarantee n log n performance.
  • 当涉及到非原始时,它并不稳定。
  • 它不保证 n log n 性能。

Stability is a non-issue for primitive types, as there is no notion of identity as distinct from (value) equality.

稳定性对于原始类型来说不是问题,因为没有区别于(值)相等的身份概念。

Stability is a big deal when sorting arbitrary objects. It's a nice side benefit that Merge Sort guarantees n log n (time) performance no matter what the input. That's why merge sort is selected to provide a stable sort (Merge Sort) to sort object references.

对任意对象进行排序时,稳定性很重要。无论输入如何,合并排序都能保证 n log n(时间)性能,这是一个很好的附带好处。这就是为什么选择归并排序来提供一种稳定的排序(Merge Sort)来对对象引用进行排序。