Java - Collections.sort() performance

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute the original authors (not me). Original: http://stackoverflow.com/questions/2883821/

Date: 2020-08-13 14:02:22


Tags: java, algorithm, collections, sorting

Asked by msr

I'm using Collections.sort() to sort a LinkedList whose elements implement the Comparable interface, so they are sorted in their natural order. The javadoc says this method uses the merge sort algorithm, which has O(n log n) performance.
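For context, a minimal sketch of that setup; the element values and class name are illustrative:

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

public class SortDemo {
    public static void main(String[] args) {
        // Strings implement Comparable, so sorting uses their natural
        // (lexicographic) order, just as described in the question.
        List<String> names = new LinkedList<>(List.of("mango", "apple", "cherry"));
        Collections.sort(names);   // stable O(n log n) merge sort under the hood
        System.out.println(names); // [apple, cherry, mango]
    }
}
```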

My question is: is there a more efficient algorithm to sort my LinkedList?

The list can get very large, and it will be sorted very frequently.

Thanks!


Accepted answer by polygenelubricants

O(N log N) is very good asymptotically. That said, there are linear-time O(N) non-comparison-based sorts, e.g. counting sort and bucket sort. These are useful when, for example, you're sorting millions and millions of integers but they all fall in a small range like 1..10.
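As a sketch of the counting-sort idea mentioned above; the method name and the explicit range parameter are my own illustration:

```java
import java.util.Arrays;

public class CountingSortDemo {
    // Counting sort runs in O(n + k), where k is the size of the value
    // range. It only pays off when keys are small non-negative integers,
    // as in the 1..10 example above.
    static int[] countingSort(int[] a, int maxValue) {
        int[] counts = new int[maxValue + 1];
        for (int v : a) counts[v]++;            // tally each value
        int[] out = new int[a.length];
        int i = 0;
        for (int v = 0; v <= maxValue; v++)     // emit values in ascending order
            for (int c = 0; c < counts[v]; c++)
                out[i++] = v;
        return out;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5, 9, 2, 6};
        System.out.println(Arrays.toString(countingSort(data, 9)));
        // [1, 1, 2, 3, 4, 5, 6, 9]
    }
}
```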

Also, if the list is "almost sorted", the otherwise-quadratic insertion sort is reported to perform better in some scenarios.
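A minimal insertion-sort sketch; on nearly sorted input each element shifts only a short distance, so the running time approaches O(n). Class and method names are illustrative:

```java
import java.util.Arrays;

public class InsertionSortDemo {
    // O(n^2) in the worst case, but close to O(n) when the input is
    // already almost sorted, which is the scenario described above.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) { // shift larger elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] nearlySorted = {1, 2, 4, 3, 5, 6, 8, 7};
        insertionSort(nearlySorted);
        System.out.println(Arrays.toString(nearlySorted));
        // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```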

Whether or not this is applicable, or even worth to implement, depends on your profiling results. I'd say that unless it shows the sort to be a bottleneck, don't worry about it.



Answer by Petar Minchev

There is no general sorting algorithm better than O(n log n), and that is quite fast. By "general" I mean your data doesn't have special properties.

Answer by Pete Kirkham

In terms of sorting the list, no, all comparison based sorts on general data are O(N log(N)).


If your re-sorting is due to insertions, you can batch the insertions and then merge them into the main list: if you have B new items, sort them in O(B log B), then do a single-level merge of the two lists, which is O(N + B).
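A sketch of that batching idea, assuming both lists hold Comparable elements; the merge helper is my own illustration, not from the answer:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchMergeDemo {
    // Merge a pre-sorted batch into the already-sorted main list in a
    // single O(N + B) pass, instead of re-sorting all N + B elements.
    static <T extends Comparable<T>> List<T> merge(List<T> main, List<T> batch) {
        List<T> out = new ArrayList<>(main.size() + batch.size());
        int i = 0, j = 0;
        while (i < main.size() && j < batch.size()) {
            if (main.get(i).compareTo(batch.get(j)) <= 0) out.add(main.get(i++));
            else out.add(batch.get(j++));
        }
        out.addAll(main.subList(i, main.size()));  // drain whichever list
        out.addAll(batch.subList(j, batch.size())); // still has elements
        return out;
    }

    public static void main(String[] args) {
        List<Integer> main = List.of(1, 4, 7, 9);      // already sorted
        List<Integer> batch = new ArrayList<>(List.of(8, 2, 5));
        Collections.sort(batch);                       // O(B log B)
        System.out.println(merge(main, batch));        // [1, 2, 4, 5, 7, 8, 9]
    }
}
```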

If your resorting is due to changes in the values of the items, you might be able to do a similar batching if you change the mutable values into immutable ones and treat the changes to be a batch of insertions and deletions. Otherwise, you won't be able to avoid sorting the whole list.


If your requirements allow it, there are various non-linked-list structures, such as TreeSet, that maintain a sorted order more efficiently, but they will fail if the values are mutable.

Answer by Progman

If the list will be sorted "very frequently", you should consider keeping it in a sorted state all the time, for example by using a tree instead of a LinkedList. You may even be able to use a SortedSet instead of a List, if you have no duplicate values and don't need any List operations (since you are sorting the elements all the time anyway). Check the TreeSet class, an implementation of SortedSet:

This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).


If you want to iterate over this "list" (which is actually a Set) you can use the Iterator of the class.


Returns an iterator over the elements in this set in ascending order.

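A small usage sketch of TreeSet showing the always-sorted behavior and ascending iteration order described above:

```java
import java.util.TreeSet;

public class TreeSetDemo {
    public static void main(String[] args) {
        // TreeSet keeps its elements sorted at all times: add, remove and
        // contains are O(log n), and iteration is ascending for free.
        TreeSet<Integer> set = new TreeSet<>();
        set.add(42);
        set.add(7);
        set.add(19);
        set.add(7);                      // duplicate: silently ignored
        System.out.println(set);         // [7, 19, 42]
        System.out.println(set.first()); // 7
    }
}
```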

If you have duplicate values inside the List, you have to use some tricks (like wrapping the value in a new class that also carries a small delta used to order equal objects).
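One possible version of that trick, sketched with a hypothetical wrapper class that adds a unique sequence number as a tiebreaker (the class and scheme are my own illustration, not from the original answer):

```java
import java.util.TreeSet;

public class DuplicatesInTreeSetDemo {
    private static int nextSeq = 0;

    // Wrapper that makes "equal" values distinct to the TreeSet: ties on
    // value are broken by a unique, monotonically increasing sequence
    // number, so duplicate values are retained instead of being dropped.
    static class Entry implements Comparable<Entry> {
        final int value;
        final int seq;
        Entry(int value) { this.value = value; this.seq = nextSeq++; }
        @Override public int compareTo(Entry o) {
            int c = Integer.compare(value, o.value);
            return c != 0 ? c : Integer.compare(seq, o.seq); // tiebreaker
        }
        @Override public String toString() { return String.valueOf(value); }
    }

    public static void main(String[] args) {
        TreeSet<Entry> set = new TreeSet<>();
        set.add(new Entry(5));
        set.add(new Entry(5));   // duplicate value survives thanks to seq
        set.add(new Entry(1));
        System.out.println(set); // [1, 5, 5]
    }
}
```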

Answer by jschober

I am experimenting with large data sets (GBs of data) and have implemented a merge sort (there is a good example @ googlecode). However, I am using Collections.sort() to pre-sort my temporary buffers, and in my experience Collections.sort() gets ridiculously slow beyond a certain data threshold. With an auxiliary buffer of 96MB I can sort one of those buffers in about 30 seconds (note: this heavily depends on the comparators you use; I use a custom column layout with a quite complex column parser), but increasing the chunk size to 128MB makes the time jump to over 3 minutes. This bears no relation to the linear (or near-linear) behavior I can observe for smaller chunks.

This has so much impact that a merge sort with smaller buffers is in almost (?) all cases faster than an in-memory sort using a 128MB buffer. To make this short: merge sort is the way to go for large data sets beyond the 100MB boundary. I cannot really answer why that is, and those numbers might even be machine-dependent (mine is OS X on a 2.6GHz i7 with 16GB of memory).