javascript Mergesort - 自底向上比自顶向下快吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/10153393/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Mergesort - Is Bottom-Up faster than Top-Down?
提问by arthurakay
I've been reading "Algorithms, 4th Ed" by Sedgewick & Wayne, and along the way I've been implementing the algorithms discussed in JavaScript.
我一直在阅读 Sedgewick & Wayne 的“算法,第 4 版”,并且一直在实现 JavaScript 中讨论的算法。
I recently took the mergesort examples provided in the book to compare top-down and bottom-up approaches... but I'm finding that bottom-up is running faster (I think). See my analysis on my blog. - http://www.akawebdesign.com/2012/04/13/javascript-mergesort-top-down-vs-bottom-up/
我最近使用了书中提供的归并排序示例来比较自顶向下和自底向上的方法……但我发现自底向上的运行速度更快(我认为)。在我的博客上查看我的分析。- http://www.akawebdesign.com/2012/04/13/javascript-mergesort-top-down-vs-bottom-up/
I have not been able to find any discussion that says one method of mergesort should be faster than the other. Is my implementation (or analysis) flawed?
我没有找到任何讨论说一种归并排序方法应该比另一种更快。我的实施(或分析)是否有缺陷?
Note: my analysis measures the iterative loops of the algorithm, not strictly the array compares/moves. Perhaps this is flawed or irrelevant?
注意:我的分析衡量算法的迭代循环,而不是严格意义上的数组比较/移动。也许这是有缺陷的或不相关的?
EDIT: My analysis didn't actually time the speed, so my statement about it running "faster" is a bit misleading.I am tracking the "iterations" through the recursive method (top-down) and the for loops (bottom-up) - and bottom-up appears to use fewer iterations.
编辑:我的分析实际上并没有计算速度,所以我关于它运行“更快”的说法有点误导。我正在通过递归方法(自上而下)和 for 循环(自下而上)跟踪“迭代”——而自下而上似乎使用更少的迭代。
采纳答案by majinnaibu
If by faster you mean fewer "iterations" then yes. If you're wondering about execution time maybe.
如果更快是指更少的“迭代”,那么是的。如果你想知道执行时间也许。
The reason is some of those 21,513 iterations are doing more than the 22,527 iterations.
原因是这 21,513 次迭代中的一些比 22,527 次迭代做得更多。
From looking at the source it seems like some of the leaf nodes in your diagram are being sorted together not individually resulting in fewer merges and sorts but them taking longer.
从源代码来看,您的图中的某些叶节点似乎被排序在一起,而不是单独排序,从而导致合并和排序更少,但它们需要更长的时间。
回答by Christian Rinderknecht
I have not been able to find any discussion that says one method of mergesort should be faster than the other.
我没有找到任何讨论说一种归并排序方法应该比另一种更快。
Bottom-up and top-down merge sorts, as well as other variants, have been well studied during the 90s. In a nutshell, if you measure the cost as the number of comparisons of individual keys, the best costs are the same (~ (n lg n)/2), the worst cost of top-down is lower than or equal to the worst case of bottom-up (but both ~ n lg n) and the average cost of top-down is lower than or equal to the average case of bottom-up (but both ~ n lg n), where "lg n" is the binary logarithm. The differences stem from the linear terms. Of course, if n=2^p, the two variants are in fact exactly the same. This means that, comparison-wise, top-down is always better than bottom-up. Furthermore, it has been proved that the "half-half" splitting strategy of top-down merge sort is optimal. The research papers are from Flajolet, Golin, Panny, Prodinger, Chen, Hwang and Sedgewick.
自下而上和自上而下的归并排序以及其他变体在 90 年代得到了很好的研究。简而言之,如果用单个key的比较次数来衡量cost,最好的cost是一样的(~(n lg n)/2),top-down的最坏的cost低于或等于最坏的自下而上的情况(但都 ~ n lg n)和自顶向下的平均成本低于或等于自下而上的平均情况(但都 ~ n lg n),其中“lg n”是二进制对数。差异源于线性项。当然,如果n=2^p,这两个变体其实是完全一样的。这意味着,在比较方面,自上而下总是比自下而上好。此外,已经证明自顶向下归并排序的“一半一半”分裂策略是最优的。研究论文来自 Flajolet、Golin、Panny、
Here is what I came up in my book Design and Analysis of Purely Functional Programs(College Publications, UK), in Erlang:
这是我在 Erlang的《纯函数程序的设计和分析》(英国大学出版社)一书中提出的内容:
tms([X|T=[_|U]]) -> cutr([X],T,U);
tms(T) -> T.
cutr(S,[Y|T],[_,_|U]) -> cutr([Y|S],T,U);
cutr(S, T, U) -> mrg(tms(S),tms(T)).
mrg( [], T) -> T;
mrg( S, []) -> S;
mrg(S=[X|_],[Y|T]) when X > Y -> [Y|mrg(S,T)];
mrg( [X|S], T) -> [X|mrg(S,T)].
Note that this is nota stable sort. Also, in Erlang (and OCaml), you need to use aliases(ALIAS=...) in the patterns if you want to save memory. The trick here is to find the middle of the list without knowing its length. This is done by cutr/3 which handles two pointers to the input list: one is incremented by one and the other by two, so when the second reaches the end, the first one is in the middle. (I learnt this from a paper by Olivier Danvy.) This way, you don't need to keep track of the length and you don't duplicate the cells of the second half of the list, so you only need (1/2)n lg n extra space, versus n lg n. This is not well known.
请注意,这不是一个稳定的排序。此外,在 Erlang(和 OCaml)中,如果您想节省内存,则需要在模式中使用别名(ALIAS=...)。这里的技巧是在不知道长度的情况下找到列表的中间部分。这是由 cutr/3 完成的,它处理两个指向输入列表的指针:一个增加一个,另一个增加两个,所以当第二个到达末尾时,第一个在中间。(我从 Olivier Danvy 的一篇论文中了解到这一点。)这样,您不需要跟踪长度,也不需要复制列表后半部分的单元格,因此您只需要 (1/2 )n lg n 额外空间,相对于 n lg n。这不是众所周知的。
It is often claimed that the bottom-up variant is preferable for functional languages or linked list (Knuth, Panny, Prodinger), but I don't think this is true.
人们经常声称自下而上的变体更适合函数式语言或链表(Knuth、Panny、Prodinger),但我认为这不是真的。
I was puzzled like you by the lack of discussion on merge sorts, so I did my own research and wrote a large chapter about it. I am currently preparing a new edition with more material on merge sorts.
我和你一样对合并排序缺乏讨论感到困惑,所以我做了自己的研究并写了一大章。我目前正在准备一个新版本,其中包含更多关于合并排序的材料。
By the way, there are other variants: queue merge sort and on-line merge sort (I discuss the latter in my book).
顺便说一下,还有其他变体:队列归并排序和在线归并排序(我在书中讨论了后者)。
[EDIT: As the measure for the cost is the number of comparisons, there is no difference between choosing an array versus a linked list. Of course, if you implement the top-down variant with linked lists, you have to be clever, as you don't necessarily know the number of keys, but you'll need to traverse a least half the keys, each time, and reallocate, in total (1/2)n lg n cells (if you are clever). Bottom-up merge sort with linked lists actually requires more extra memory, n lg n + n cells. So, even with linked lists, the top-down variant is the best choice. As far as the length of the program goes, your mileage may vary, but in a functional language, top-down merge sort can be made shorter than bottom-up, if stability is not required. There are some papers that discuss implementations issues of merge sort, like in-place (for which you need arrays), or stability etc. For instance, A Meticulous Analysis of Mergesort Programs, by Katajainen and Larsson Traff (1997).]
[编辑:由于成本的衡量标准是比较次数,因此选择数组与链接列表之间没有区别。当然,如果你用链表实现自顶向下的变体,你必须聪明,因为你不一定知道键的数量,但你每次至少需要遍历一半的键,并且重新分配,总共 (1/2)n lg n 个单元格(如果你很聪明的话)。链表的自底向上归并排序实际上需要更多的额外内存,n lg n + n 个单元。因此,即使使用链表,自顶向下的变体也是最佳选择。就程序的长度而言,您的里程可能会有所不同,但在函数式语言中,如果不需要稳定性,自顶向下归并排序可以比自底向上更短。有一些论文讨论了归并排序的实现问题,对合并排序程序的细致分析,作者 Katajainen 和 Larsson Traff (1997)。]
回答by Nikunj Banka
I had asked the same question on coursera class forums for the 2012 August edition of this course. Professor Kevin wayne (of Princeton) replied that in many cases recursion is faster than iteration because of caching improved performances.
我曾在 2012 年 8 月版的课程论坛上问过同样的问题。凯文韦恩教授(普林斯顿大学)回答说,在许多情况下,由于缓存改进了性能,递归比迭代更快。
So the short answer that I got at that time was that top down merge sort will be faster than bottom up merge sort because of caching reasons.
所以我当时得到的简短回答是,由于缓存的原因,自上而下的归并排序会比自下而上的归并排序快。
Please note that the class was taught in Java programming language(not Javascript).
请注意,该课程是用 Java 编程语言(不是 Javascript)教授的。