Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/168891/

Time: 2020-09-11 01:18:01  Source: igfitidea

Is it faster to sort a list after inserting items or adding them to a sorted list

Tags: algorithm, sorting, list

Asked by Steve

If I have a sorted list (say, sorted with quicksort) and a lot of values to add, is it better to suspend sorting, add them to the end, and then sort, or to use binary chop to place each item correctly as it is added? Does it make a difference if the items are random, or already more or less in order?

Accepted answer by comingstorm

If you add enough items that you're effectively building the list from scratch, you should be able to get better performance by sorting the list afterwards.

If items are mostly in order, you can tweak both incremental update and regular sorting to take advantage of that, but frankly, it usually isn't worth the trouble. (You also need to be careful of things like making sure some unexpected ordering can't make your algorithm take much longer, q.v. naive quicksort)

Both incremental update and regular list sort are O(N log N), but you can get a better constant factor by sorting everything afterward (I'm assuming here that you've got some auxiliary data structure so your incremental update can access list items faster than O(N)...). Generally speaking, sorting all at once has a lot more design freedom than maintaining the ordering incrementally, since incremental update has to maintain a complete order at all times, but an all-at-once bulk sort does not.

If nothing else, remember that there are lots of highly-optimized bulk sorts available.
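
As a rough illustration of the "add everything, then sort once" approach, here is a minimal C++ sketch. It assumes the list lives in a std::vector; the helper name appendThenSort is made up for this example, and std::sort stands in for whichever optimized bulk sort your library provides.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: append all new items, then re-sort once.
// The existing contents are expected to be sorted already, but nothing
// breaks if they are not, because the whole vector is re-sorted.
template <typename T>
void appendThenSort(std::vector<T>& sorted, const std::vector<T>& newItems)
{
    // Append in one go; no ordering is maintained while adding
    sorted.insert(sorted.end(), newItems.begin(), newItems.end());
    // One bulk sort over everything, O((n + m) log(n + m)) comparisons
    std::sort(sorted.begin(), sorted.end());
}
```

Calling appendThenSort(v, extra) on v = {1, 4, 9} and extra = {7, 2, 4} should leave v holding 1, 2, 4, 4, 7, 9.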

Answer by Javier

Usually it's far better to use a heap. In short, it splits the cost of maintaining order between the pusher and the picker. Both operations are O(log n), rather than the O(n log n) of most other solutions.
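
A small sketch of the heap idea in C++, assuming integer items and using std::priority_queue configured as a min-heap. The alias MinHeap and the helper drainInOrder are names invented for this example.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Min-heap of ints: the smallest element is always on top.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Hypothetical helper: pop everything, yielding the items in ascending order.
std::vector<int> drainInOrder(MinHeap& heap)
{
    std::vector<int> out;
    while (!heap.empty()) {
        out.push_back(heap.top());  // O(1) peek at the current minimum
        heap.pop();                 // O(log n) removal restores the heap
    }
    return out;
}
```

Each heap.push(x) costs O(log n) on the "pusher" side, and each pop costs O(log n) on the "picker" side. Note that a heap only gives cheap access to the smallest item; it does not let you walk the whole collection in order without popping.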

Answer by Mark Ransom

If you're adding in bunches, you can use a merge sort. Sort the list of items to be added, then copy from both lists, comparing items to determine which one gets copied next. You could even copy in-place if you resize your destination array and work from the end backwards.

The efficiency of this solution is O(n+m) + O(m log m) where n is the size of the original list, and m is the number of items being inserted.
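
The array variant described above (resize the destination, then merge from the end backwards) might look roughly like the following. This is a sketch that assumes a std::vector destination and a default-constructible element type; mergeIntoSorted is a made-up name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: merge `items` into the already-sorted `dest`.
// `items` is taken by value so the caller's copy is left untouched.
template <typename T>
void mergeIntoSorted(std::vector<T>& dest, std::vector<T> items)
{
    std::sort(items.begin(), items.end());       // O(m log m)

    std::size_t i = dest.size();                 // unread originals are dest[0..i)
    std::size_t j = items.size();                // unread new items are items[0..j)
    dest.resize(dest.size() + items.size());     // needs T to be default constructible
    std::size_t out = dest.size();               // next write slot is dest[out - 1]

    // Fill from the back. Since out == i + j throughout, a write never
    // lands on an original element that has not been read yet.
    while (j > 0) {
        if (i > 0 && items[j - 1] < dest[i - 1])
            dest[--out] = dest[--i];             // largest remaining is an original
        else
            dest[--out] = items[--j];            // largest remaining is a new item
    }
    // Any originals still unread (i > 0) are already in their final place.
}
```

The merge loop touches each of the n + m slots at most once, which matches the O(n+m) + O(m log m) estimate.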

Edit: Since this answer isn't getting any love, I thought I'd flesh it out with some C++ sample code. I assume that the sorted list is kept in a linked list rather than an array. This changes the algorithm to look more like an insertion than a merge, but the principle is the same.

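#include <algorithm>
#include <list>
#include <vector>

// Note that itemstoadd is modified (sorted) as a side effect of this function
template<typename T>
void AddToSortedList(std::list<T> & sortedlist, std::vector<T> & itemstoadd)
{
    std::sort(itemstoadd.begin(), itemstoadd.end());
    // "typename" is required because these are dependent types
    typename std::list<T>::iterator listposition = sortedlist.begin();
    typename std::vector<T>::iterator nextnewitem = itemstoadd.begin();
    // Stop once every new item is placed; looping until the list is also
    // exhausted would dereference itemstoadd.end()
    while (nextnewitem != itemstoadd.end())
    {
        // Insert before the first list element greater than the new item
        if ((listposition == sortedlist.end()) || (*nextnewitem < *listposition))
            sortedlist.insert(listposition, *nextnewitem++);
        else
            ++listposition;
    }
}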

Answer by Mecki

I'd say, let's test it! :)

I tried with quicksort, but sorting an almost sorted array with quicksort is... well, not really a good idea. I tried a modified version that cuts off at 7 elements and uses insertion sort below that. Still, horrible performance. I switched to merge sort. It might need quite a lot of memory for sorting (it's not in-place), but the performance is much better on sorted arrays and almost identical on random ones (the initial sort took almost the same time for both; quicksort was only slightly faster).

This already shows one thing: the answer to your question depends strongly on the sorting algorithm you use. If it performs poorly on almost sorted lists, inserting at the right position will be much faster than adding at the end and then re-sorting. Merge sort might not be an option for you either, as it might need far too much external memory if the list is huge. BTW, I used a custom merge sort implementation that needs only half the external storage of the naive implementation (which needs as much external storage as the array itself).

If merge sort is not an option and quicksort certainly isn't, the best alternative is probably heap sort.

My results: simply adding the new elements at the end and then re-sorting the array was several orders of magnitude faster than inserting them in the right position. However, my initial array had 10 million elements (sorted) and I was adding another million (unsorted). If you add only 10 elements to an array of 10 million, inserting them correctly is much faster than re-sorting everything. So the answer to your question also depends on how big the initial (sorted) array is and how many new elements you want to add to it.

Answer by S.Lott

In principle, it's faster to create a tree than to sort a list. Each tree insert is O(log(n)), leading to O(n log(n)) overall. Sorting is also O(n log(n)).

That's why Java has TreeMap and TreeSet, in addition to List implementations such as ArrayList and LinkedList (TreeList comes from Apache Commons Collections rather than the JDK).

  • A TreeSet keeps things in object comparison order. The key is defined by the Comparable interface.

  • A LinkedList keeps things in the insertion order.

  • An ArrayList uses more memory, is faster for some operations.

  • A TreeMap, similarly, removes the need to sort by a key. The map is built in key order during the inserts and maintained in sorted order at all times.
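
For readers following along in C++ rather than Java, the closest standard analogues are std::set, std::multiset and std::map, which are typically implemented as balanced (red-black) trees. The sketch below is only an illustration of the "build a tree instead of sorting" idea; sortViaTree is a made-up helper name and integer items are assumed.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: insert items one at a time into an ordered container,
// then read them back in sorted order.
std::vector<int> sortViaTree(const std::vector<int>& items)
{
    std::multiset<int> tree;                 // keeps duplicates, unlike std::set
    for (int x : items)
        tree.insert(x);                      // O(log n) per insert
    // Iterating an ordered container visits elements in ascending order
    return std::vector<int>(tree.begin(), tree.end());
}
```

As in the Java case, each insert allocates a node, so this can end up slower in practice than filling a vector and sorting it once, even though both are O(n log n).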

However, for some reason, the Java implementation of TreeSet is quite a bit slower than using an ArrayList and a sort.

[It's hard to speculate as to why it would be dramatically slower, but it is. It should be slightly faster by one pass through the data. This kind of thing is often the cost of memory management trumping the algorithmic analysis.]

Answer by bmdhacks

It's about the same. Inserting an item into a sorted list is O(log N), and doing this for every element in the list, N, (thus building the list) would be O(N log N), which is the speed of quicksort (or merge sort, which is closer to this approach).

If you instead inserted them onto the front it would be O(1), but doing a quicksort after, it would still be O(N log N).

I would go with the first approach, because it has the potential to be slightly faster. If the initial size of your list, N, is much greater than the number of elements to insert, X, then the insert approach is O(X log N). Sorting after inserting at the head of the list is O(N log N). If N=0 (i.e., your list is initially empty), inserting in sorted order and sorting afterwards take the same time.
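
To make the insert approach concrete, here is a minimal C++ sketch using std::upper_bound to find the position by binary search. It assumes a std::vector; insertSorted is a name chosen for this example. One caveat worth keeping in mind: on an array-backed container the search is O(log N), but the insert itself still shifts later elements, which is O(N) in the worst case.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: insert `value` into an ascending-sorted vector
// so that the vector stays sorted.
template <typename T>
void insertSorted(std::vector<T>& sorted, const T& value)
{
    // Binary search for the first element greater than value: O(log N)
    auto pos = std::upper_bound(sorted.begin(), sorted.end(), value);
    // Shifts everything after pos by one slot: up to O(N) moves
    sorted.insert(pos, value);
}
```

Starting from {1, 3, 5}, inserting 4, then 0, then 9 should yield 0, 1, 3, 4, 5, 9.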

Answer by warren

If the list is a) already sorted, and b) dynamic in nature, then inserting into a sorted list should always be faster (find the right place (O(n)) and insert (O(1))).

However, if the list is static, then a shuffle of the remainder of the list has to occur (O(n) to find the right place and O(n) to slide things down).

Either way, inserting into a sorted list (or something like a Binary Search Tree) should be faster.

O(n) + O(n) should always be faster than O(n log n).

Answer by Peter Parker

You should add them first and then use a radix sort; this should be optimal.

http://en.wikipedia.org/wiki/Radix_sort#Efficiency
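
For illustration only, a least-significant-digit radix sort might be sketched in C++ as below. This assumes the items are 32-bit unsigned integers, which the answer does not state; radixSort is a made-up name, and the one-byte digit size is an arbitrary choice.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// LSD radix sort on 32-bit unsigned keys, one byte (256 buckets) per pass.
void radixSort(std::vector<std::uint32_t>& values)
{
    std::vector<std::uint32_t> buffer(values.size());
    for (int shift = 0; shift < 32; shift += 8) {
        // count[d + 1] first holds the number of keys whose current digit is d
        std::array<std::size_t, 257> count{};
        for (std::uint32_t v : values)
            ++count[((v >> shift) & 0xFF) + 1];
        // Prefix sums: count[d] becomes the first output slot for digit d
        for (std::size_t d = 0; d < 256; ++d)
            count[d + 1] += count[d];
        // Stable scatter into the buffer, preserving the order of earlier passes
        for (std::uint32_t v : values)
            buffer[count[(v >> shift) & 0xFF]++] = v;
        values.swap(buffer);
    }
    // Four passes means an even number of swaps, so `values` holds the result
}
```

This does O(n) work per pass and needs a second buffer the size of the input. Whether it beats a comparison sort depends on the key type and the data size.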

Answer by Michael Brown

If this is .NET and the items are integers, it's quicker to add them to a Dictionary (or, if you're on .NET 3.0 or above, use a HashSet if you don't mind losing duplicates). This gives you automagic sorting.

I think that strings would work the same way as well. The beauty is you get O(1) insertion and sorting this way.

Answer by Ihar Bury

(If the list you're talking about is like C# List<T>.) Inserting a few values at the right positions in a sorted list that already holds many values will require fewer operations than re-sorting. But if the number of values being added becomes large, it will require more.

In your case I would suggest using not a list but a more suitable data structure: a sorted structure with minimal insertion time, such as a binary tree.
