如何在现代 C++ 中实现经典的排序算法?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/24650626/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to implement classic sorting algorithms in modern C++?
提问by TemplateRex
The std::sort
algorithm (and its cousins std::partial_sort
and std::nth_element
) from the C++ Standard Library is in most implementations a complicated and hybrid amalgamation of more elementary sorting algorithms, such as selection sort, insertion sort, quick sort, merge sort, or heap sort.
来自 C++ 标准库的std::sort
算法(及其表亲std::partial_sort
和std::nth_element
)在大多数实现中是更基本排序算法的复杂和混合合并,例如选择排序、插入排序、快速排序、归并排序或堆排序。
There are many questions here and on sister sites such as https://codereview.stackexchange.com/related to bugs, complexity and other aspects of implementations of these classic sorting algorithms. Most of the offered implementations consist of raw loops, use index manipulation and concrete types, and are generally non-trivial to analyse in terms of correctness and efficiency.
这里和姊妹网站(例如https://codereview.stackexchange.com/)上有许多与这些经典排序算法实现的错误、复杂性和其他方面相关的问题。大多数提供的实现由原始循环、使用索引操作和具体类型组成,并且在正确性和效率方面通常很难分析。
Question: how can the above mentioned classic sorting algorithms be implemented using modern C++?
问题:如何使用现代 C++ 实现上述经典排序算法?
- no raw loops, but combining the Standard Library's algorithmic building blocks from
<algorithm>
- iterator interfaceand use of templatesinstead of index manipulation and concrete types
- C++14 style, including the full Standard Library, as well as syntactic noise reducers such as
auto
, template aliases, transparent comparators and polymorphic lambdas.
- 没有原始循环,但结合了标准库的算法构建块
<algorithm>
- 迭代器接口和使用模板而不是索引操作和具体类型
- C++14 风格,包括完整的标准库,以及语法降噪器,如
auto
模板别名、透明比较器和多态 lambda。
Notes:
注意事项:
- for further references on implementations of sorting algorithms see Wikipedia, Rosetta Codeor http://www.sorting-algorithms.com/
- according to Sean Parent's conventions(slide 39), a raw loop is a
for
-loop longer than composition of two functions with an operator. Sof(g(x));
orf(x); g(x);
orf(x) + g(x);
are not raw loops, and neither are the loops inselection_sort
andinsertion_sort
below. - I follow Scott Meyers's terminology to denote the current C++1y already as C++14, and to denote C++98 and C++03 both as C++98, so don't flame me for that.
- As suggested in the comments by @Mehrdad, I provide four implementations as a Live Example at the end of the answer: C++14, C++11, C++98 and Boost and C++98.
- The answer itself is presented in terms of C++14 only. Where relevant, I denote the syntactic and library differences where the various language versions differ.
- 有关排序算法实现的进一步参考,请参阅Wikipedia、Rosetta Code或http://www.sorting-algorithms.com/
- 根据Sean Parent 的约定(幻灯片 39),原始循环是一个
for
比两个函数与一个运算符的组合更长的循环。所以f(g(x));
orf(x); g(x);
orf(x) + g(x);
不是原始循环,也不是循环中selection_sort
和insertion_sort
下方。 - 我遵循 Scott Meyers 的术语,将当前的 C++1y 表示为 C++14,并将 C++98 和 C++03 都表示为 C++98,所以不要为此向我发火。
- 正如@Mehrdad 的评论中所建议的那样,我在答案末尾提供了四个实现作为实时示例:C++14、C++11、C++98 和 Boost 和 C++98。
- 答案本身仅以 C++14 的形式呈现。在相关的地方,我表示各种语言版本不同的句法和库差异。
回答by TemplateRex
Algorithmic building blocks
算法构建块
We begin by assembling the algorithmic building blocks from the Standard Library:
我们首先从标准库中组装算法构建块:
#include <algorithm> // min_element, iter_swap,
// upper_bound, rotate,
// partition,
// inplace_merge,
// make_heap, sort_heap, push_heap, pop_heap,
// is_heap, is_sorted
#include <cassert> // assert
#include <functional> // less
#include <iterator> // distance, begin, end, next
- the iterator tools such as non-member
std::begin()
/std::end()
as well as withstd::next()
are only available as of C++11 and beyond. For C++98, one needs to write these himself. There are substitutes from Boost.Range inboost::begin()
/boost::end()
, and from Boost.Utility inboost::next()
. - the
std::is_sorted
algorithm is only available for C++11 and beyond. For C++98, this can be implemented in terms ofstd::adjacent_find
and a hand-written function object. Boost.Algorithm also provides aboost::algorithm::is_sorted
as a substitute. - the
std::is_heap
algorithm is only available for C++11 and beyond.
- 迭代器工具(例如非成员
std::begin()
/std::end()
以及 with)std::next()
仅在 C++11 及更高版本中可用。对于C++98,这些需要自己编写。在boost::begin()
/ 中有来自 Boost.Range 的替代品boost::end()
,在boost::next()
. - 该
std::is_sorted
算法仅适用于 C++11 及更高版本。对于 C++98,这可以通过std::adjacent_find
手写函数对象来实现。Boost.Algorithm 也提供了一个boost::algorithm::is_sorted
作为替代。 - 该
std::is_heap
算法仅适用于 C++11 及更高版本。
Syntactical goodies
语法好东西
C++14 provides transparent comparatorsof the form std::less<>
that act polymorphically on their arguments. This avoids having to provide an iterator's type. This can be used in combination with C++11's default function template argumentsto create a single overloadfor sorting algorithms that take <
as comparison and those that have a user-defined comparison function object.
C++14 提供了这种形式的透明比较器,std::less<>
可以对它们的参数进行多态操作。这避免了必须提供迭代器的类型。这可以与 C++11 的默认函数模板参数结合使用,以创建用于排序算法的单个重载<
,这些算法将作为比较的算法和具有用户定义的比较函数对象的算法。
template<class It, class Compare = std::less<>>
void xxx_sort(It first, It last, Compare cmp = Compare{});
In C++11, one can define a reusable template aliasto extract an iterator's value type which adds minor clutter to the sort algorithms' signatures:
在 C++11 中,可以定义一个可重用的模板别名来提取迭代器的值类型,这给排序算法的签名添加了轻微的混乱:
template<class It>
using value_type_t = typename std::iterator_traits<It>::value_type;
template<class It, class Compare = std::less<value_type_t<It>>>
void xxx_sort(It first, It last, Compare cmp = Compare{});
In C++98, one needs to write two overloads and use the verbose typename xxx<yyy>::type
syntax
在 C++98 中,需要编写两个重载并使用冗长的typename xxx<yyy>::type
语法
template<class It, class Compare>
void xxx_sort(It first, It last, Compare cmp); // general implementation
template<class It>
void xxx_sort(It first, It last)
{
xxx_sort(first, last, std::less<typename std::iterator_traits<It>::value_type>());
}
- Another syntactical nicety is that C++14 facilitates wrapping user-defined comparators through polymorphic lambdas(with
auto
parameters that are deduced like function template arguments). - C++11 only has monomorphic lambdas, that require the use of the above template alias
value_type_t
. - In C++98, one either needs to write a standalone function object or resort to the verbose
std::bind1st
/std::bind2nd
/std::not1
type of syntax. - Boost.Bind improves this with
boost::bind
and_1
/_2
placeholder syntax. - C++11 and beyond also have
std::find_if_not
, whereas C++98 needsstd::find_if
with astd::not1
around a function object.
- 另一个语法上的优点是 C++14 有助于通过多态 lambdas(具有
auto
像函数模板参数一样推导出来的参数)来包装用户定义的比较器。 - C++11 只有单态 lambda,需要使用上述模板 alias
value_type_t
。 - 在C ++ 98,一个或者需要编写一个独立的功能对象或诉诸冗长
std::bind1st
/std::bind2nd
/std::not1
类型语法。 - Boost.Bind 使用
boost::bind
和_1
/_2
占位符语法改进了这一点。 - C ++ 11及以后也有
std::find_if_not
,但是C ++ 98只需要std::find_if
与std::not1
周围的功能对象。
C++ Style
C++ 风格
There is no generally acceptable C++14 style yet. For better or for worse, I closely follow Scott Meyers's draft Effective Modern C++and Herb Sutter's revamped GotW. I use the following style recommendations:
目前还没有普遍接受的 C++14 风格。无论好坏,我都密切关注 Scott Meyers 的Effective Modern C++ 草案和 Herb Sutter改进后的 GotW。我使用以下风格建议:
- Herb Sutter's "Almost Always Auto"and Scott Meyers's "Prefer auto to specific type declarations"recommendation, for which the brevity is unsurpassed, although its clarity is sometimes disputed.
- Scott Meyers's "Distinguish
()
and{}
when creating objects"and consistently choose braced-initialization{}
instead of the good old parenthesized initialization()
(in order to side-step all most-vexing-parse issues in generic code). - Scott Meyers's "Prefer alias declarations to typedefs". For templates this is a must anyway, and using it everywhere instead of
typedef
saves time and adds consistency. - I use a
for (auto it = first; it != last; ++it)
pattern in some places, in order to allow for loop invariant checking for already sorted sub-ranges. In production code, the use ofwhile (first != last)
and a++first
somewhere inside the loop might be slightly better.
- Herb Sutter 的“Almost Always Auto”和 Scott Meyers 的“Prefer auto to specific type declarations”建议,其简洁性是无与伦比的,尽管有时对其清晰度有争议。
- Scott Meyers 的“区分
()
和{}
创建对象时”并始终选择花{}
括号初始化而不是旧的带括号的初始化()
(为了避开通用代码中所有最令人烦恼的解析问题)。 - Scott Meyers 的“Prefer alias declarations to typedefs”。对于模板,无论如何这是必须的,并且在任何地方使用它而不是
typedef
节省时间并增加一致性。 - 我
for (auto it = first; it != last; ++it)
在某些地方使用了一种模式,以允许对已经排序的子范围进行循环不变检查。在生产代码中,在循环内的某处使用while (first != last)
和 a++first
可能会稍微好一些。
Selection sort
选择排序
Selection sortdoes not adapt to the data in any way, so its runtime is always O(N2)
. However, selection sort has the property of minimizing the number of swaps. In applications where the cost of swapping items is high, selection sort very well may be the algorithm of choice.
选择排序不会以任何方式适应数据,因此其运行时始终为O(N2)
. 然而,选择排序具有最小化交换次数的特性。在交换项目成本很高的应用程序中,选择排序非常好可能是选择的算法。
To implement it using the Standard Library, repeatedly use std::min_element
to find the remaining minimum element, and iter_swap
to swap it into place:
要使用标准库实现它,请重复使用std::min_element
以查找剩余的最小元素,并将iter_swap
其交换到位:
template<class FwdIt, class Compare = std::less<>>
void selection_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
for (auto it = first; it != last; ++it) {
auto const selection = std::min_element(it, last, cmp);
std::iter_swap(selection, it);
assert(std::is_sorted(first, std::next(it), cmp));
}
}
Note that selection_sort
has the already processed range [first, it)
sorted as its loop invariant. The minimal requirements are forward iterators, compared to std::sort
's random access iterators.
请注意,selection_sort
将已处理的范围[first, it)
作为其循环不变量进行排序。与的随机访问迭代器相比,最低要求是前向迭代std::sort
器。
Details omitted:
细节省略:
- selection sort can be optimized with an early test
if (std::distance(first, last) <= 1) return;
(or for forward / bidirectional iterators:if (first == last || std::next(first) == last) return;
). - for bidirectional iterators, the above test can be combined with a loop over the interval
[first, std::prev(last))
, because the last element is guaranteed to be the minimal remaining element and doesn't require a swap.
- 选择排序可以通过早期测试进行优化
if (std::distance(first, last) <= 1) return;
(或用于前向/双向迭代器:)if (first == last || std::next(first) == last) return;
。 - 对于双向迭代器,上面的测试可以与间隔上的循环结合
[first, std::prev(last))
,因为最后一个元素保证是最小的剩余元素并且不需要交换。
Insertion sort
插入排序
Although it is one of the elementary sorting algorithms with O(N2)
worst-case time, insertion sortis the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead). For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divide-and-conquer sorting algorithms, such as merge sort or quick sort.
尽管它是具有O(N2)
最坏情况时间的基本排序算法之一,但插入排序是当数据接近排序(因为它是自适应的)或当问题规模很小(因为它的开销低)时选择的算法。由于这些原因,并且因为它也是稳定的,插入排序通常用作递归基本情况(当问题规模较小时)用于更高开销的分而治之排序算法,例如归并排序或快速排序。
To implement insertion_sort
with the Standard Library, repeatedly use std::upper_bound
to find the location where the current element needs to go, and use std::rotate
to shift the remaining elements upward in the input range:
为了实现insertion_sort
与标准库,可重复使用std::upper_bound
以找到当前元素需要到达的位置,并使用std::rotate
到剩余的元素在输入范围上移:
template<class FwdIt, class Compare = std::less<>>
void insertion_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
for (auto it = first; it != last; ++it) {
auto const insertion = std::upper_bound(first, it, *it, cmp);
std::rotate(insertion, it, std::next(it));
assert(std::is_sorted(first, std::next(it), cmp));
}
}
Note that insertion_sort
has the already processed range [first, it)
sorted as its loop invariant. Insertion sort also works with forward iterators.
请注意,insertion_sort
将已处理的范围[first, it)
作为其循环不变量进行排序。插入排序也适用于前向迭代器。
Details omitted:
细节省略:
- insertion sort can be optimized with an early test
if (std::distance(first, last) <= 1) return;
(or for forward / bidirectional iterators:if (first == last || std::next(first) == last) return;
) and a loop over the interval[std::next(first), last)
, because the first element is guaranteed to be in place and doesn't require a rotate. - for bidirectional iterators, the binary search to find the insertion point can be replaced with a reverse linear searchusing the Standard Library's
std::find_if_not
algorithm.
- 插入排序可以通过早期测试
if (std::distance(first, last) <= 1) return;
(或用于前向/双向迭代器:)if (first == last || std::next(first) == last) return;
和间隔上的循环进行优化[std::next(first), last)
,因为第一个元素保证就位并且不需要旋转。 - 对于双向迭代器,可以使用标准库的算法将查找插入点的二分搜索替换为反向线性搜索
std::find_if_not
。
Four Live Examples(C++14, C++11, C++98 and Boost, C++98) for the fragment below:
以下片段的四个实时示例(C++14、C++11、C++98 和 Boost、C++98):
using RevIt = std::reverse_iterator<BiDirIt>;
auto const insertion = std::find_if_not(RevIt(it), RevIt(first),
[=](auto const& elem){ return cmp(*it, elem); }
).base();
- For random inputs this gives
O(N2)
comparisons, but this improves toO(N)
comparisons for almost sorted inputs. The binary search always usesO(N log N)
comparisons. - For small input ranges, the better memory locality (cache, prefetching) of a linear search might also dominate a binary search (one should test this, of course).
- 对于随机输入,这提供了
O(N2)
比较,但这改进了O(N)
对几乎排序输入的比较。二分查找总是使用O(N log N)
比较。 - 对于较小的输入范围,线性搜索更好的内存局部性(缓存、预取)也可能主导二分搜索(当然,应该对此进行测试)。
Quick sort
快速排序
When carefully implemented, quick sortis robust and has O(N log N)
expected complexity, but with O(N2)
worst-case complexity that can be triggered with adversarially chosen input data. When a stable sort is not needed, quick sort is an excellent general-purpose sort.
如果仔细实施,快速排序是健壮的并且具有O(N log N)
预期的复杂性,但O(N2)
最坏情况的复杂性可以通过对抗性选择的输入数据触发。当不需要稳定排序时,快速排序是一种很好的通用排序。
Even for the simplest versions, quick sort is quite a bit more complicated to implement using the Standard Library than the other classic sorting algorithms. The approach below uses a few iterator utilities to locate the middle elementof the input range [first, last)
as the pivot, then use two calls to std::partition
(which are O(N)
) to three-way partition the input range into segments of elements that are smaller than, equal to, and larger than the selected pivot, respectively. Finally the two outer segments with elements smaller than and larger than the pivot are recursively sorted:
即使对于最简单的版本,使用标准库实现快速排序也比其他经典排序算法要复杂得多。下面的方法使用一些迭代器实用程序来定位输入范围的中间元素[first, last)
作为主元,然后使用两次调用std::partition
(它们是O(N)
)将输入范围三路划分为小于、等于、和大于选定的枢轴,分别。最后,对元素小于和大于主元的两个外部段进行递归排序:
template<class FwdIt, class Compare = std::less<>>
void quick_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
auto const N = std::distance(first, last);
if (N <= 1) return;
auto const pivot = *std::next(first, N / 2);
auto const middle1 = std::partition(first, last, [=](auto const& elem){
return cmp(elem, pivot);
});
auto const middle2 = std::partition(middle1, last, [=](auto const& elem){
return !cmp(pivot, elem);
});
quick_sort(first, middle1, cmp); // assert(std::is_sorted(first, middle1, cmp));
quick_sort(middle2, last, cmp); // assert(std::is_sorted(middle2, last, cmp));
}
However, quick sort is rather tricky to get correct and efficient, as each of the above steps has to be carefully checked and optimized for production level code. In particular, for O(N log N)
complexity, the pivot has to result into a balanced partition of the input data, which cannot be guaranteed in general for an O(1)
pivot, but which can be guaranteed if one sets the pivot as the O(N)
median of the input range.
然而,快速排序要获得正确和高效是相当棘手的,因为必须针对生产级代码仔细检查和优化上述每个步骤。特别是,对于O(N log N)
复杂性,pivot 必须导致输入数据的平衡分区,这对于O(1)
pivot 来说通常不能保证,但是如果将 pivot 设置为O(N)
输入范围的中值,则可以保证。
Details omitted:
细节省略:
- the above implementation is particularly vulnerable to special inputs, e.g. it has
O(N^2)
complexity for the "organ pipe" input1, 2, 3, ..., N/2, ... 3, 2, 1
(because the middle is always larger than all other elements). - median-of-3pivot selection from randomly chosen elementsfrom the input range guards against almost sorted inputs for which the complexity would otherwise deteriorate to
O(N^2)
. - 3-way partitioning(separating elements smaller than, equal to and larger than the pivot) as shown by the two calls to
std::partition
is not the most efficientO(N)
algorithm to achieve this result. - for random access iterators, a guaranteed
O(N log N)
complexity can be achieved through median pivot selectionusingstd::nth_element(first, middle, last)
, followed by recursive calls toquick_sort(first, middle, cmp)
andquick_sort(middle, last, cmp)
. - this guarantee comes at a cost, however, because the constant factor of the
O(N)
complexity ofstd::nth_element
can be more expensive than that of theO(1)
complexity of a median-of-3 pivot followed by anO(N)
call tostd::partition
(which is a cache-friendly single forward pass over the data).
- 上述实现特别容易受到特殊输入的影响,例如它
O(N^2)
对于“风琴管”输入具有复杂性1, 2, 3, ..., N/2, ... 3, 2, 1
(因为中间总是大于所有其他元素)。 - 从输入范围中随机选择的元素中的中值 3 主元选择可防止几乎排序的输入,否则其复杂性会恶化到
O(N^2)
。 - 两次调用所示的3 路分区(分隔小于、等于和大于主元的元素)
std::partition
并不是O(N)
实现此结果的最有效算法。 - 对于随机访问迭代器,
O(N log N)
可以通过使用中值枢轴选择std::nth_element(first, middle, last)
,然后递归调用quick_sort(first, middle, cmp)
和来实现有保证的复杂性quick_sort(middle, last, cmp)
。 - 然而,这种保证是有代价的,因为
O(N)
复杂度的常数因子std::nth_element
可能比O(1)
中位数为 3 的枢轴的复杂性更昂贵,然后O(N)
调用std::partition
(这是一个缓存友好的单前向传递数据)。
Merge sort
归并排序
If using O(N)
extra space is of no concern, then merge sortis an excellent choice: it is the only stableO(N log N)
sorting algorithm.
如果O(N)
不需要额外的空间,那么归并排序是一个很好的选择:它是唯一稳定的O(N log N)
排序算法。
It is simple to implement using Standard algorithms: use a few iterator utilities to locate the middle of the input range [first, last)
and combine two recursively sorted segments with a std::inplace_merge
:
使用标准算法很容易实现:使用一些迭代器实用程序来定位输入范围的中间位置[first, last)
,并将两个递归排序的段与 组合起来std::inplace_merge
:
template<class BiDirIt, class Compare = std::less<>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
{
auto const N = std::distance(first, last);
if (N <= 1) return;
auto const middle = std::next(first, N / 2);
merge_sort(first, middle, cmp); // assert(std::is_sorted(first, middle, cmp));
merge_sort(middle, last, cmp); // assert(std::is_sorted(middle, last, cmp));
std::inplace_merge(first, middle, last, cmp); // assert(std::is_sorted(first, last, cmp));
}
Merge sort requires bidirectional iterators, the bottleneck being the std::inplace_merge
. Note that when sorting linked lists, merge sort requires only O(log N)
extra space (for recursion). The latter algorithm is implemented by std::list<T>::sort
in the Standard Library.
归并排序需要双向迭代器,瓶颈是std::inplace_merge
. 请注意,在对链表进行排序时,归并排序只需要O(log N)
额外的空间(用于递归)。后一种算法由std::list<T>::sort
标准库实现。
Heap sort
堆排序
Heap sortis simple to implement, performs an O(N log N)
in-place sort, but is not stable.
堆排序实现简单,执行O(N log N)
就地排序,但不稳定。
The first loop, O(N)
"heapify" phase, puts the array into heap order. The second loop, the O(N log N
) "sortdown" phase, repeatedly extracts the maximum and restores heap order. The Standard Library makes this extremely straightforward:
第一个循环,O(N)
“heapify”阶段,将数组放入堆顺序。第二个循环,即O(N log N
)“排序”阶段,反复提取最大值并恢复堆顺序。标准库使这变得非常简单:
template<class RandomIt, class Compare = std::less<>>
void heap_sort(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
lib::make_heap(first, last, cmp); // assert(std::is_heap(first, last, cmp));
lib::sort_heap(first, last, cmp); // assert(std::is_sorted(first, last, cmp));
}
In case you consider it "cheating" to use std::make_heap
and std::sort_heap
, you can go one level deeper and write those functions yourself in terms of std::push_heap
and std::pop_heap
, respectively:
如果您认为使用std::make_heap
and是“作弊” std::sort_heap
,您可以更深入一层,分别根据std::push_heap
and自己编写这些函数std::pop_heap
:
namespace lib {
// NOTE: is O(N log N), not O(N) as std::make_heap
template<class RandomIt, class Compare = std::less<>>
void make_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
for (auto it = first; it != last;) {
std::push_heap(first, ++it, cmp);
assert(std::is_heap(first, it, cmp));
}
}
template<class RandomIt, class Compare = std::less<>>
void sort_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
for (auto it = last; it != first;) {
std::pop_heap(first, it--, cmp);
assert(std::is_heap(first, it, cmp));
}
}
} // namespace lib
The Standard Library specifies both push_heap
and pop_heap
as complexity O(log N)
. Note however that the outer loop over the range [first, last)
results in O(N log N)
complexity for make_heap
, whereas std::make_heap
has only O(N)
complexity. For the overall O(N log N)
complexity of heap_sort
it doesn't matter.
标准库将push_heap
和都指定pop_heap
为复杂性O(log N)
。但是请注意,范围内的外循环[first, last)
导致 的O(N log N)
复杂性make_heap
,而std::make_heap
只有O(N)
复杂性。对于它的整体O(N log N)
复杂性heap_sort
并不重要。
Details omitted: O(N)
implementation of make_heap
细节省略:O(N)
实现make_heap
Testing
测试
Here are four Live Examples(C++14, C++11, C++98 and Boost, C++98) testing all five algorithms on a variety of inputs (not meant to be exhaustive or rigorous). Just note the huge differences in the LOC: C++11/C++14 need around 130 LOC, C++98 and Boost 190 (+50%) and C++98 more than 270 (+100%).
这里有四个实时示例(C++14、C++11、C++98 和 Boost、C++98)在各种输入上测试所有五种算法(并非详尽或严格)。请注意 LOC 的巨大差异:C++11/C++14 需要大约 130 个 LOC,C++98 和 Boost 190 (+50%) 和 C++98 需要超过 270 (+100%)。
回答by Morwenn
Another small and rather elegant one originally found on code review. I thought it was worth sharing.
最初在代码中发现的另一个小而优雅的。我认为值得分享。
Counting sort
计数排序
While it is rather specialized, counting sortis a simple integer sorting algorithm and can often be really fast provided the values of the integers to sort are not too far apart. It's probably ideal if one ever needs to sort a collection of one million integers known to be between 0 and 100 for example.
虽然它相当专业,但计数排序是一种简单的整数排序算法,并且如果要排序的整数值相距不太远,通常可以非常快。例如,如果需要对已知介于 0 和 100 之间的一百万个整数的集合进行排序,这可能是理想的。
To implement a very simple counting sort that works with both signed and unsigned integers, one needs to find the smallest and greatest elements in the collection to sort; their difference will tell the size of the array of counts to allocate. Then, a second pass through the collection is done to count the number of occurrences of every element. Finally, we write back the required number of every integer back to the original collection.
要实现一种适用于有符号和无符号整数的非常简单的计数排序,需要找到集合中最小和最大的元素进行排序;它们的差异将告诉要分配的计数数组的大小。然后,对集合进行第二次遍历以计算每个元素的出现次数。最后,我们将每个整数的所需数量写回原始集合。
template<typename ForwardIterator>
void counting_sort(ForwardIterator first, ForwardIterator last)
{
if (first == last || std::next(first) == last) return;
auto minmax = std::minmax_element(first, last); // avoid if possible.
auto min = *minmax.first;
auto max = *minmax.second;
if (min == max) return;
using difference_type = typename std::iterator_traits<ForwardIterator>::difference_type;
std::vector<difference_type> counts(max - min + 1, 0);
for (auto it = first ; it != last ; ++it) {
++counts[*it - min];
}
for (auto count: counts) {
first = std::fill_n(first, count, min++);
}
}
While it is only useful when the range of the integers to sort is known to be small (generally not larger than the size of the collection to sort), making counting sort more generic would make it slower for its best cases. If the range is not known to be small, another algorithm such a radix sort, ska_sortor spreadsortcan be used instead.
虽然它仅在已知要排序的整数范围很小(通常不大于要排序的集合的大小)时才有用,但使计数排序更通用会使其在最佳情况下变慢。如果已知范围很小,则可以使用另一种算法,例如radix sort、ska_sort或spreadsort。
Details omitted:
细节省略:
We could have passed the bounds of the range of values accepted by the algorithm as parameters to totally get rid of the first
std::minmax_element
pass through the collection. This will make the algorithm even faster when a usefully-small range limit is known by other means. (It doesn't have to be exact; passing a constant 0 to 100 is still muchbetter than an extra pass over a million elements to find out that the true bounds are 1 to 95. Even 0 to 1000 would be worth it; the extra elements are written once with zero and read once).Growing
counts
on the fly is another way to avoid a separate first pass. Doubling thecounts
size each time it has to grow gives amortized O(1) time per sorted element (see hash table insertion cost analysis for the proof that exponential grown is the key). Growing at the end for a newmax
is easy withstd::vector::resize
to add new zeroed elements. Changingmin
on the fly and inserting new zeroed elements at the front can be done withstd::copy_backward
after growing the vector. Thenstd::fill
to zero the new elements.The
counts
increment loop is a histogram. If the data is likely to be highly repetitive, and the number of bins is small, it can be worth unrolling over multiple arraysto reduce the serializing data dependency bottleneck of store/reload to the same bin. This means more counts to zero at the start, and more to loop over at the end, but should be worth it on most CPUs for our example of millions of 0 to 100 numbers, especially if the input might already be (partially) sorted and have long runs of the same number.In the algorithm above, we use a
min == max
check to return early when every element has the same value (in which case the collection is sorted). It is actually possible to instead fully check whether the collection is already sorted while finding the extreme values of a collection with no additional time wasted (if the first pass is still memory bottlenecked with the extra work of updating min and max). However such an algorithm does not exist in the standard library and writing one would be more tedious than writing the rest of counting sort itself. It is left as an exercise for the reader.Since the algorithm only works with integer values, static assertions could be used to prevent users from making obvious type mistakes. In some contexts, a substitution failure with
std::enable_if_t
might be preferred.While modern C++ is cool, future C++ could be even cooler: structured bindingsand some parts of the Ranges TSwould make the algorithm even cleaner.
我们可以将算法接受的值范围的边界作为参数传递,以完全摆脱第
std::minmax_element
一次遍历集合。当通过其他方式知道有用的小范围限制时,这将使算法更快。(不一定要精确;传递一个常数 0 到 100 仍然比额外传递超过一百万个元素要好得多,以找出真正的边界是 1 到 95。即使是 0 到 1000 也是值得的;额外的元素用零写入一次并读取一次)。快速增长
counts
是避免单独的第一次通过的另一种方法。counts
每次必须增长时将大小加倍,每个排序元素的分摊时间为 O(1)(请参阅哈希表插入成本分析以证明指数增长是关键)。通过添加新的归零元素,最后为新的增长max
很容易std::vector::resize
。min
可以std::copy_backward
在增加向量后进行动态更改并在前面插入新的归零元素。然后std::fill
将新元素归零。该
counts
增量环是直方图。如果数据很可能是高度重复的,并且 bin 的数量很少,那么展开多个数组以减少存储/重新加载到同一 bin 的序列化数据依赖性瓶颈可能是值得的。这意味着在开始时更多的计数为零,并且在结束时更多地循环,但对于我们的数百万个 0 到 100 个数字的示例,在大多数 CPU 上应该是值得的,尤其是如果输入可能已经(部分)排序并且长跑相同的数字。在上面的算法中,
min == max
当每个元素具有相同的值时(在这种情况下集合已排序),我们使用检查来提前返回。实际上可以在不浪费额外时间的情况下完全检查集合是否已经排序,同时查找集合的极值(如果第一遍仍然是内存瓶颈,需要额外的更新 min 和 max 工作)。然而,标准库中不存在这样的算法,编写一个比编写计数排序本身的其余部分更乏味。它留给读者作为练习。由于该算法仅适用于整数值,因此可以使用静态断言来防止用户犯明显的类型错误。在某些情况下,替换失败
std::enable_if_t
可能是首选。虽然现代 C++ 很酷,但未来的 C++ 可能更酷:结构化绑定和Ranges TS 的某些部分将使算法更加清晰。