Java 7 是否将 Tim Sort 用于方法 Arrays.Sort?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4018332/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Is Java 7 using Tim Sort for the Method Arrays.Sort?
提问by Osvaldo
I can't find the documentation of Java 7, I can only find about the Java 6, which is still quick or merge. Does anyone know how to find the documentation of the method Arrays.sort
in Java 7?
我找不到Java 7的文档,我只能找到关于Java 6的,它仍然很快或合并。有谁知道如何Arrays.sort
在 Java 7 中找到该方法的文档?
回答by Michael Borgwardt
Java 7 uses Dual-Pivot Quicksort for primitives and TimSort for objects.
Java 7 对原语使用双枢轴快速排序,对对象使用 TimSort。
According to the Java 7 API doc for primitives:
Implementation note: The sorting algorithm is a Dual-Pivot Quicksort by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm offers O(n log(n)) performance on many data sets that cause other quicksorts to degrade to quadratic performance, and is typically faster than traditional (one-pivot) Quicksort implementations.
实施说明:排序算法是 Vladimir Yaroslavskiy、Jon Bentley 和 Joshua Bloch 的 Dual-Pivot Quicksort。该算法在许多数据集上提供 O(n log(n)) 性能,导致其他快速排序降级为二次性能,并且通常比传统(单轴)快速排序实现更快。
According to the Java 7 API doc for objects:
The implementation was adapted from Tim Peters's list sort for Python ( TimSort). It uses techiques from Peter McIlroy's "Optimistic Sorting and Information Theoretic Complexity", in Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474, January 1993.
该实现改编自 Tim Peters 的 Python 列表排序 (TimSort)。它使用来自 Peter McIlroy 的“乐观排序和信息理论复杂性”的技术,在第四届 ACM-SIAM 离散算法研讨会论文集,第 467-474 页,1993 年 1 月。
Timsortis a hybrid "merge sort and insertion sort."
Timsort是一种混合的“合并排序和插入排序”。
Not sure if this is much different from what it was in Java 6, forArrays.sort JDK6:
不确定这是否与 Java 6 中的Arrays.sort JDK6有很大不同:
a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993)
调整后的快速排序,改编自 Jon L. Bentley 和 M. Douglas McIlroy 的“Engineering a Sort Function”,Software-Practice and Experience,Vol。23(11) P. 1249-1265(1993 年 11 月)
For Object[] or collections (Collections.sort()) merge sort is used.
对于 Object[] 或集合 (Collections.sort()) 使用合并排序。
回答by Jonny Heggheim
Yes, Java 7 will use Timsort for Arrays.sort. Here is the commit: http://hg.openjdk.java.net/jdk7/jdk7/jdk/rev/bfd7abda8f79
是的,Java 7 将为 Arrays.sort 使用 Timsort。这是提交:http: //hg.openjdk.java.net/jdk7/jdk7/jdk/rev/bfd7abda8f79
回答by BeeOnRope
Yes! ... and also no.
是的!......也没有。
Summary
概括
At the time of writing (2016), in current Open JDK 0implementations Tim Sort is generally used for sorting arrays of objects (i.e., Arrays.sort(Object[])
and friends) - but for primitive arrays(the remainder of the Arrays.sort
methods) a variety of other methods are used.
在撰写本文时(2016 年),在当前的 Open JDK 0实现中,Tim Sort 通常用于对对象数组(即Arrays.sort(Object[])
和朋友)进行排序- 但对于原始数组(Arrays.sort
方法的其余部分),则使用了多种其他方法.
For primitives, the heuristics choose among a varietyof sorting methods such as quicksort, merge sort, counting sort3. depending on the data being sorted. Most of these decisions are simply made up-front based on the type and size of the array being sorted, but for int
and long
elements the decision is actually adaptive based on the measured sortedness of the array. So you have adaptation/introspection (the heuristics to pick an algorithm) on top of adaptation/introspection (TimSort or similar merge sort) in many cases!
对于原语,启发式在多种排序方法中进行选择,例如快速排序、归并排序、计数排序3。取决于被排序的数据。这些决策中的大多数只是根据正在排序的数组的类型和大小预先做出的,但是对于int
和long
元素,决策实际上是基于测量的数组排序度进行自适应的。因此,在许多情况下,您在适应/内省(TimSort 或类似的合并排序)之上进行了适应/内省(选择算法的启发式)!
Details
细节
Tim Sort is used for most sorts of objects, such as Arrays.sort(Object[] a)
, unless the user has specifically requested the legacy behavior by setting system property java.util.Arrays.useLegacyMergeSort
to true.
Tim Sort 用于大多数类型的对象,例如Arrays.sort(Object[] a)
,除非用户通过将系统属性设置java.util.Arrays.useLegacyMergeSort
为 true 来特别请求遗留行为。
For primitives, the situation is more complex. At least as of JDK 8 (version 1.8.0_111
) a variety of heurstics are used depending on the size of the arrays being sorted, the primitive type and the measured "sortedness" of the array. Here's an overview:
对于基元,情况更为复杂。至少从 JDK 8(版本1.8.0_111
)开始,根据要排序的数组的大小、原始类型和测量的数组“排序度”,使用了各种启发式方法。这是一个概述:
- For all primitive types other than bytes1, arrays of less than 47 elements are simply sorted using insertion sort (see
DualPivotQuicksort.INSERTION_SORT_THRESHOLD
). This threshold is alsoused when sorting sub-arrays that arise when merge or quicksort are used and the size of the subarray falls below the threshold. So some form of insertion sort will be used in all sorts, and for small arrays it is the only algorithm used. - For primitive types
byte
,short
andchar
, a counting sortis used for largish arrays. This is a simple sort that takesO(n + range)
time, whererange
is the total number of byte (256) or short/char (65536) values. The sort involves allocating an underlying array ofrange
values, so it is only used when the number of elements to sort is a significant fraction of the total range. In particular, it is used for byte arrays greater than 29 elements (i.e. ~11% of the range), and short/char arrays greater than 3200 elements (~5% of the range). - For byte arrays, one of the two approaches above is always used.
- For
int
andlong
arrays above the insertion sort threshold and forshort
/char
arrays both above the insertion sort threshold and below the counting sort threshold, one of two algorithms may be used: dual pivot quicksort, or merge sort. Which one is used depends on a measure of the sortedness of the array: the input is divided into runsof ascending or descending elements. If the number of such runs is greater than 66, then the array is considered mostly unsorted and is sorted with dual-pivot quicksort. Otherwise, the array is considered mostly sorted, and mergesort is used (using the already enumerated runs as a starting point).
- 对于字节1以外的所有原始类型,少于 47 个元素的数组使用插入排序(请参阅 参考资料
DualPivotQuicksort.INSERTION_SORT_THRESHOLD
)进行简单排序。该阈值还排序中出现时合并子阵列时使用或快速排序的使用和所述子阵列的大小低于该阈值。所以某种形式的插入排序将用于所有排序,对于小数组,它是唯一使用的算法。 - 对于原始类型
byte
,short
和char
,计数排序用于较大的数组。这是一个需要O(n + range)
时间的简单排序,其中range
是字节 (256) 或短/字符 (65536) 值的总数。排序涉及分配range
值的底层数组,因此仅当要排序的元素数量占总范围的很大一部分时才使用它。特别是,它用于大于 29 个元素(即范围的约 11%)的字节数组和大于 3200 个元素(范围的约 5%)的短/字符数组。 - 对于字节数组,始终使用上述两种方法之一。
- 对于
int
和long
上述插入排序阈值和用于阵列short
/char
可使用上述两种插入排序阈值且低于所述计数排序阈值阵列,两种算法之一:双枢轴快速排序,或合并排序。使用哪一个取决于数组排序的度量:输入分为升序或降序元素的运行。如果此类运行的次数大于 66,则该数组被视为大部分未排序,并使用双枢轴快速排序进行排序。否则,数组被认为是大部分排序的,并使用归并排序(使用已经枚举的运行作为起点)。
The ideaof finding runs and then using mergesort to sort them is in fact very similar to TimSort, although there are some differences. So at least for some parameters, the JDK is using a run-aware mergesort, but for many other combinations of parameters it is using a different algorithm, and at least 5 distinct algorithms are used in total!
查找运行然后使用归并排序对其进行排序的想法实际上与 TimSort 非常相似,尽管存在一些差异。因此,至少对于某些参数,JDK 使用的是运行感知归并排序,但是对于许多其他参数组合,它使用的是不同的算法,并且总共使用了至少 5 种不同的算法!
Rationale
基本原理
The reasoning behind the different sort behavior of Object[]
versus primitive is probably at least two-fold:
Object[]
与原始类型的不同排序行为背后的推理可能至少有两个方面:
1) Sorts of Object[]
are required to be stable: objects which sort equally will appear in the same order as the input. For primitive arrays, no such concept exists: primitives are fully defined by their value, so there is no distinction between a stable and an unstable sort. This allows primitive sorts to dispense with the need for stable algorithms in favor of speed.
1) 排序Object[]
需要稳定:排序相同的对象将以与输入相同的顺序出现。对于原始数组,不存在这样的概念:原始数组完全由它们的值定义,因此稳定排序和不稳定排序之间没有区别。这允许原始排序不需要稳定的算法以支持速度。
2) Sorts of Object[]
need to involve the Object.compare()
method, which may be arbitrarily complex and expensive. Even if the compare()
method is simple, there will generally be method call overhead unless the entire sort method can be inlined2. So sorts of Object[]
will generally be biased towards minimizing total comparisons, even at the cost of some additional algorithmic complexity.
2)Object[]
需要涉及Object.compare()
方法,这可能是任意复杂和昂贵的。即使compare()
方法很简单,通常也会有方法调用开销,除非可以内联整个 sort 方法2。因此Object[]
,即使以一些额外的算法复杂性为代价,通常也会倾向于最小化总比较。
Sorts of primitive arrays, on the other hand, just directly compare primitive values which typically takes on the order of a cycle or two. In this case, the algorithm should be optimized considering both the cost of comparisons and the surrounding algorithm, since the are likely to be of the same magnitude.
另一方面,原始数组的排序只是直接比较原始值,这些原始值通常采用一两个循环的顺序。在这种情况下,应该考虑比较的成本和周围的算法来优化算法,因为它们可能具有相同的量级。
0At least for versions between Java 7 and Java 9, and of course this also includes Oracle's JDK as it is based on Open JDK. It is likelythat other implementations use a similar approach, but I haven't checked.
0至少对于 Java 7 和 Java 9 之间的版本,当然这也包括 Oracle 的 JDK,因为它基于 Open JDK。这是可能的是,其他实现使用类似的方法,但我没有检查。
1For byte arrays, the insertion sort threshold is effectively 29 elements since that's the lower cutoff above which counting sort is used.
1对于字节数组,插入排序阈值实际上是 29 个元素,因为这是使用计数排序的下限。
2This seems unlikely, since it is quite large.
2这似乎不太可能,因为它相当大。
3Counting sort is only used for values with relatively limited range of 16-bits or less: byte
, short
or char
.
3计数排序仅用于 16 位或更少范围相对有限的值:byte
、short
或char
。