Original question: http://stackoverflow.com/questions/753237/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
What sort does Java Collections.sort(nodes) use?
Asked by Kyle Jones
I think it is MergeSort, which is O(n log n).
However, the following output disagrees:
-1,0000000099000391,0000000099000427
1,0000000099000427,0000000099000346
5,0000000099000391,0000000099000346
1,0000000099000427,0000000099000345
5,0000000099000391,0000000099000345
1,0000000099000346,0000000099000345
I am sorting a nodelist of 4 nodes by sequence number, and the sort is doing 6 comparisons. I am puzzled because 6 > (4 log(4)). Can someone explain this to me?
P.S. It is mergesort, but I still don't understand my results.
Thanks for the answers everyone. Thank you Tom for correcting my math.
Accepted answer by Andy Mikula
O(n log n) doesn't mean that the number of comparisons will be equal to or less than n log n, just that the time taken will scale proportionally to n log n. Try running tests with 8, 16, or 32 nodes and checking the timing.
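The experiment suggested above can be sketched roughly as follows. This is only an illustration: the class name SortScaling, the helper countComparisons, and the fixed seed are assumptions made here, and it counts comparator calls rather than measuring wall-clock time, since timings for such small lists tend to be noisy. The exact counts depend on the JDK version, so no expected output is given.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SortScaling {
    /**
     * Sorts a shuffled list of n integers with Collections.sort and
     * returns how many times the comparator was called.
     * (Hypothetical helper, not part of the original question.)
     */
    static long countComparisons(int n, long seed) {
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            nodes.add(i);
        }
        // Fixed seed so repeated runs shuffle the list the same way.
        Collections.shuffle(nodes, new Random(seed));

        long[] counter = {0};
        Collections.sort(nodes, (a, b) -> {
            counter[0]++; // one more comparison
            return Integer.compare(a, b);
        });
        return counter[0];
    }

    public static void main(String[] args) {
        // Double the list size each round and compare against n * log2(n).
        for (int n = 4; n <= 1024; n *= 2) {
            double nLogN = n * (Math.log(n) / Math.log(2));
            System.out.printf("n=%d comparisons=%d n*log2(n)=%.0f%n",
                    n, countComparisons(n, 42L), nLogN);
        }
    }
}
```

The point of the doubling loop is to look at how the count grows from one row to the next, not at whether any single row is above or below n log n.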
Answer by tpdi
You sorted four nodes, so you didn't get merge sort; sort switched to insertion sort.
In Java, the Arrays.sort() methods use merge sort or a tuned quicksort depending on the datatypes, and for implementation efficiency switch to insertion sort when fewer than seven array elements are being sorted. (Wikipedia, emphasis added)
Arrays.sort is used indirectly by the Collections classes.
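To see how an insertion sort could produce the six comparisons in the question, here is a minimal sketch. It is a plain textbook insertion sort, not the JDK source, and the class and method names are made up for illustration. It also assumes the four nodes started in the order ...391, ...427, ...346, ...345, which is what the comparison trace in the question suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class InsertionSortCount {
    /**
     * Sorts the list in place with a plain insertion sort and
     * returns the number of comparisons performed.
     */
    static <T> int insertionSort(List<T> list, Comparator<? super T> cmp) {
        int comparisons = 0;
        for (int i = 1; i < list.size(); i++) {
            // Walk element i leftwards until its left neighbour is not larger.
            for (int j = i; j > 0; j--) {
                comparisons++;
                if (cmp.compare(list.get(j - 1), list.get(j)) <= 0) {
                    break; // already in position
                }
                T tmp = list.get(j - 1);
                list.set(j - 1, list.get(j));
                list.set(j, tmp);
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        // Last three digits of the sequence numbers, in the assumed input order.
        List<Integer> nodes = new ArrayList<>(Arrays.asList(391, 427, 346, 345));
        int count = insertionSort(nodes, Comparator.<Integer>naturalOrder());
        // prints "[345, 346, 391, 427] sorted with 6 comparisons"
        System.out.println(nodes + " sorted with " + count + " comparisons");
    }
}
```

The second element costs one comparison, the third costs two, and the fourth costs three, which gives 1 + 2 + 3 = 6. For four elements that is also the worst case of insertion sort, n(n-1)/2.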
A recently accepted bug report indicates that the Sun implementation of Java will use Python's timsort in the future: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6804124
(The timsort monograph, linked above, is well worth reading.)
Answer by Varkhan
An algorithm A(n) that processes an amount of data n is in O(f(n)), for some function f, if there exist two strictly positive constants C_inf and C_sup such that:
C_inf . f(n) < ExpectedValue(OperationCount(A(n))) < C_sup . f(n)
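As a rough worked example, the bound can be applied to the numbers in the question. This assumes f(n) = n log2(n), counts comparisons as the operation, and treats the single observed count of 6 as a stand-in for the expected count. Note also that a two-sided bound of this kind is usually written Θ(f(n)); O(f(n)) on its own only needs the upper half.

```latex
\[
C_{\inf}\, f(n) \;<\; \mathbb{E}\big[\mathrm{OperationCount}(A(n))\big] \;<\; C_{\sup}\, f(n)
\]
\[
f(n) = n \log_2 n, \qquad f(4) = 4 \cdot \log_2 4 = 4 \cdot 2 = 8
\]
\[
6 \;<\; C_{\sup} \cdot 8 \quad \text{holds for any } C_{\sup} > 0.75
\]
```

So six comparisons for four elements does not contradict an O(n log n) bound, whatever the base of the logarithm, because the constant absorbs the difference.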
Two things to note:
The actual constants C could be anything, and do depend on the relative costs of operations (depending on the language, the VM, the architecture, or your actual definition of an operation). On some platforms, for instance, + and * have the same cost; on others the latter is an order of magnitude slower.
The quantity described as "in O(f(n))" is an expected operation count, based on some probably arbitrary model of the data you are dealing with. For instance, if your data is almost completely sorted, a merge-sort algorithm is going to be mostly O(n), not O(n log n).
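The second point can be checked with a small experiment. The sketch below is illustrative only: the class name PresortedVsShuffled and its helpers are assumptions, and the actual counts depend on which sort the JDK in use implements. It counts the comparator calls Collections.sort makes on an already-sorted list and on a shuffled copy of the same list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PresortedVsShuffled {
    /** Sorts a copy of the input with Collections.sort and returns the comparator call count. */
    static long countComparisons(List<Integer> input) {
        List<Integer> copy = new ArrayList<>(input); // leave the caller's list untouched
        long[] counter = {0};
        Collections.sort(copy, (a, b) -> {
            counter[0]++;
            return Integer.compare(a, b);
        });
        return counter[0];
    }

    /** Returns the integers 0..n-1 in ascending order. */
    static List<Integer> ascending(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> sorted = ascending(1024);
        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42L)); // fixed seed for repeatability

        System.out.println("already sorted: " + countComparisons(sorted) + " comparisons");
        System.out.println("shuffled:       " + countComparisons(shuffled) + " comparisons");
    }
}
```

On the sorted input the count should come out close to n, while the shuffled input should need several times more.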
Answer by Neil Coffey
I've written some stuff you may be interested in about the Java sort algorithm and taken some performance measurements of Collections.sort(). The algorithm at present is a mergesort with an insertion sort once you get down to a certain size of sublists (N.B. this algorithm is very probably going to change in Java 7).
You should really take the Big O notation as an indication of how the algorithm will scale overall; for a particular sort, the precise time will deviate from the time predicted by this calculation (as you'll see on my graph, the two sort algorithms that are combined each have different performance characteristics, and so the overall time for a sort is a bit more complex).
That said, as a rough guide, for every time you double the number of elements, if you multiply the expected time by 2.2, you won't be far out. (It doesn't make much sense really to do this for very small lists of a few elements, though.)
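One way to see where a factor near 2.2 could come from, assuming the running time is modeled as T(n) = c n log2(n) for some constant c (a simplification, not a claim about the measured graph):

```latex
\[
\frac{T(2n)}{T(n)} \;=\; \frac{c \cdot 2n \log_2(2n)}{c \cdot n \log_2 n}
\;=\; \frac{2\,(\log_2 n + 1)}{\log_2 n}
\;=\; 2\left(1 + \frac{1}{\log_2 n}\right)
\]
\[
n = 1024:\quad 2\left(1 + \tfrac{1}{10}\right) = 2.2
\qquad\qquad
n = 4:\quad 2\left(1 + \tfrac{1}{2}\right) = 3
\]
```

Under this model the ratio only settles near 2.2 once n is around a thousand, and it is noticeably larger for lists of a few elements, which is consistent with the caveat about very small lists.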