如何在现代 C++ 中实现经典的排序算法？

Question

提问by TemplateRex

The std::sortalgorithm (and its cousins std::partial_sortand std::nth_element) from the C++ Standard Library is in most implementations a complicated and hybrid amalgamation of more elementary sorting algorithms, such as selection sort, insertion sort, quick sort, merge sort, or heap sort.

来自 C++ 标准库的std::sort算法（及其表亲std::partial_sort和std::nth_element）在大多数实现中是更基本排序算法的复杂和混合合并，例如选择排序、插入排序、快速排序、归并排序或堆排序。

There are many questions here and on sister sites such as https://codereview.stackexchange.com/related to bugs, complexity and other aspects of implementations of these classic sorting algorithms. Most of the offered implementations consist of raw loops, use index manipulation and concrete types, and are generally non-trivial to analyse in terms of correctness and efficiency.

这里和姊妹网站（例如https://codereview.stackexchange.com/）上有许多与这些经典排序算法实现的错误、复杂性和其他方面相关的问题。大多数提供的实现由原始循环、使用索引操作和具体类型组成，并且在正确性和效率方面通常很难分析。

Question: how can the above mentioned classic sorting algorithms be implemented using modern C++?

问题：如何使用现代 C++ 实现上述经典排序算法？

no raw loops, but combining the Standard Library's algorithmic building blocks from <algorithm>
iterator interfaceand use of templatesinstead of index manipulation and concrete types
C++14 style, including the full Standard Library, as well as syntactic noise reducers such as auto, template aliases, transparent comparators and polymorphic lambdas.

没有原始循环，但结合了标准库的算法构建块<algorithm>
迭代器接口和使用模板而不是索引操作和具体类型
C++14 风格，包括完整的标准库，以及语法降噪器，如auto模板别名、透明比较器和多态 lambda。

Notes:

注意事项：

for further references on implementations of sorting algorithms see Wikipedia, Rosetta Codeor http://www.sorting-algorithms.com/
according to Sean Parent's conventions(slide 39), a raw loop is a for-loop longer than composition of two functions with an operator. So f(g(x));or f(x); g(x);or f(x) + g(x);are not raw loops, and neither are the loops in selection_sortand insertion_sortbelow.
I follow Scott Meyers's terminology to denote the current C++1y already as C++14, and to denote C++98 and C++03 both as C++98, so don't flame me for that.
As suggested in the comments by @Mehrdad, I provide four implementations as a Live Example at the end of the answer: C++14, C++11, C++98 and Boost and C++98.
The answer itself is presented in terms of C++14 only. Where relevant, I denote the syntactic and library differences where the various language versions differ.

有关排序算法实现的进一步参考，请参阅Wikipedia、Rosetta Code或http://www.sorting-algorithms.com/
根据Sean Parent 的约定（幻灯片 39），原始循环是一个for比两个函数与一个运算符的组合更长的循环。所以f(g(x));or f(x); g(x);orf(x) + g(x);不是原始循环，也不是循环中selection_sort和insertion_sort下方。
我遵循 Scott Meyers 的术语，将当前的 C++1y 表示为 C++14，并将 C++98 和 C++03 都表示为 C++98，所以不要为此向我发火。
正如@Mehrdad 的评论中所建议的那样，我在答案末尾提供了四个实现作为实时示例：C++14、C++11、C++98 和 Boost 和 C++98。
答案本身仅以 C++14 的形式呈现。在相关的地方，我表示各种语言版本不同的句法和库差异。

Answer 1

回答by TemplateRex

Algorithmic building blocks

算法构建块

We begin by assembling the algorithmic building blocks from the Standard Library:

我们首先从标准库中组装算法构建块：

#include <algorithm>    // min_element, iter_swap, 
                        // upper_bound, rotate, 
                        // partition, 
                        // inplace_merge,
                        // make_heap, sort_heap, push_heap, pop_heap,
                        // is_heap, is_sorted
#include <cassert>      // assert 
#include <functional>   // less
#include <iterator>     // distance, begin, end, next

the iterator tools such as non-member std::begin()/ std::end()as well as with std::next()are only available as of C++11 and beyond. For C++98, one needs to write these himself. There are substitutes from Boost.Range in boost::begin()/ boost::end(), and from Boost.Utility in boost::next().
the std::is_sortedalgorithm is only available for C++11 and beyond. For C++98, this can be implemented in terms of std::adjacent_findand a hand-written function object. Boost.Algorithm also provides a boost::algorithm::is_sortedas a substitute.
the std::is_heapalgorithm is only available for C++11 and beyond.

迭代器工具（例如非成员std::begin()/std::end()以及 with）std::next()仅在 C++11 及更高版本中可用。对于C++98，这些需要自己编写。在boost::begin()/ 中有来自 Boost.Range 的替代品boost::end()，在boost::next().
该std::is_sorted算法仅适用于 C++11 及更高版本。对于 C++98，这可以通过std::adjacent_find手写函数对象来实现。Boost.Algorithm 也提供了一个boost::algorithm::is_sorted作为替代。
该std::is_heap算法仅适用于 C++11 及更高版本。

Syntactical goodies

语法好东西

C++14 provides transparent comparatorsof the form std::less<>that act polymorphically on their arguments. This avoids having to provide an iterator's type. This can be used in combination with C++11's default function template argumentsto create a single overloadfor sorting algorithms that take <as comparison and those that have a user-defined comparison function object.

C++14 提供了这种形式的透明比较器，std::less<>可以对它们的参数进行多态操作。这避免了必须提供迭代器的类型。这可以与 C++11 的默认函数模板参数结合使用，以创建用于排序算法的单个重载<，这些算法将作为比较的算法和具有用户定义的比较函数对象的算法。

template<class It, class Compare = std::less<>>
void xxx_sort(It first, It last, Compare cmp = Compare{});

In C++11, one can define a reusable template aliasto extract an iterator's value type which adds minor clutter to the sort algorithms' signatures:

在 C++11 中，可以定义一个可重用的模板别名来提取迭代器的值类型，这给排序算法的签名添加了轻微的混乱：

template<class It>
using value_type_t = typename std::iterator_traits<It>::value_type;

template<class It, class Compare = std::less<value_type_t<It>>>
void xxx_sort(It first, It last, Compare cmp = Compare{});

In C++98, one needs to write two overloads and use the verbose typename xxx<yyy>::typesyntax

在 C++98 中，需要编写两个重载并使用冗长的typename xxx<yyy>::type语法

template<class It, class Compare>
void xxx_sort(It first, It last, Compare cmp); // general implementation

template<class It>
void xxx_sort(It first, It last)
{
    xxx_sort(first, last, std::less<typename std::iterator_traits<It>::value_type>());
}

Another syntactical nicety is that C++14 facilitates wrapping user-defined comparators through polymorphic lambdas(with autoparameters that are deduced like function template arguments).
C++11 only has monomorphic lambdas, that require the use of the above template alias value_type_t.
In C++98, one either needs to write a standalone function object or resort to the verbose std::bind1st/ std::bind2nd/ std::not1type of syntax.
Boost.Bind improves this with boost::bindand _1/ _2placeholder syntax.
C++11 and beyond also have std::find_if_not, whereas C++98 needs std::find_ifwith a std::not1around a function object.

另一个语法上的优点是 C++14 有助于通过多态 lambdas（具有auto像函数模板参数一样推导出来的参数）来包装用户定义的比较器。
C++11 只有单态 lambda，需要使用上述模板 alias value_type_t。
在C ++ 98，一个或者需要编写一个独立的功能对象或诉诸冗长std::bind1st/ std::bind2nd/std::not1类型语法。
Boost.Bind 使用boost::bind和_1/_2占位符语法改进了这一点。
C ++ 11及以后也有std::find_if_not，但是C ++ 98只需要std::find_if与std::not1周围的功能对象。

C++ Style

C++ 风格

There is no generally acceptable C++14 style yet. For better or for worse, I closely follow Scott Meyers's draft Effective Modern C++and Herb Sutter's revamped GotW. I use the following style recommendations:

目前还没有普遍接受的 C++14 风格。无论好坏，我都密切关注 Scott Meyers 的Effective Modern C++ 草案和 Herb Sutter改进后的 GotW。我使用以下风格建议：

Herb Sutter's "Almost Always Auto"and Scott Meyers's "Prefer auto to specific type declarations"recommendation, for which the brevity is unsurpassed, although its clarity is sometimes disputed.
Scott Meyers's "Distinguish ()and {}when creating objects"and consistently choose braced-initialization {}instead of the good old parenthesized initialization ()(in order to side-step all most-vexing-parse issues in generic code).
Scott Meyers's "Prefer alias declarations to typedefs". For templates this is a must anyway, and using it everywhere instead of typedefsaves time and adds consistency.
I use a for (auto it = first; it != last; ++it)pattern in some places, in order to allow for loop invariant checking for already sorted sub-ranges. In production code, the use of while (first != last)and a ++firstsomewhere inside the loop might be slightly better.

Herb Sutter 的“Almost Always Auto”和 Scott Meyers 的“Prefer auto to specific type declarations”建议，其简洁性是无与伦比的，尽管有时对其清晰度有争议。
Scott Meyers 的“区分()和{}创建对象时”并始终选择花{}括号初始化而不是旧的带括号的初始化()（为了避开通用代码中所有最令人烦恼的解析问题）。
Scott Meyers 的“Prefer alias declarations to typedefs”。对于模板，无论如何这是必须的，并且在任何地方使用它而不是typedef节省时间并增加一致性。
我for (auto it = first; it != last; ++it)在某些地方使用了一种模式，以允许对已经排序的子范围进行循环不变检查。在生产代码中，在循环内的某处使用while (first != last)和 a++first可能会稍微好一些。

Selection sort

选择排序

Selection sortdoes not adapt to the data in any way, so its runtime is always O(N2). However, selection sort has the property of minimizing the number of swaps. In applications where the cost of swapping items is high, selection sort very well may be the algorithm of choice.

选择排序不会以任何方式适应数据，因此其运行时始终为O(N2). 然而，选择排序具有最小化交换次数的特性。在交换项目成本很高的应用程序中，选择排序非常好可能是选择的算法。

To implement it using the Standard Library, repeatedly use std::min_elementto find the remaining minimum element, and iter_swapto swap it into place:

要使用标准库实现它，请重复使用std::min_element以查找剩余的最小元素，并将iter_swap其交换到位：

template<class FwdIt, class Compare = std::less<>>
void selection_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        auto const selection = std::min_element(it, last, cmp);
        std::iter_swap(selection, it); 
        assert(std::is_sorted(first, std::next(it), cmp));
    }
}

Note that selection_sorthas the already processed range [first, it)sorted as its loop invariant. The minimal requirements are forward iterators, compared to std::sort's random access iterators.

请注意，selection_sort将已处理的范围[first, it)作为其循环不变量进行排序。与的随机访问迭代器相比，最低要求是前向迭代std::sort器。

Details omitted:

细节省略：

selection sort can be optimized with an early test if (std::distance(first, last) <= 1) return;(or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;).
for bidirectional iterators, the above test can be combined with a loop over the interval [first, std::prev(last)), because the last element is guaranteed to be the minimal remaining element and doesn't require a swap.

选择排序可以通过早期测试进行优化if (std::distance(first, last) <= 1) return;（或用于前向/双向迭代器：）if (first == last || std::next(first) == last) return;。
对于双向迭代器，上面的测试可以与间隔上的循环结合[first, std::prev(last))，因为最后一个元素保证是最小的剩余元素并且不需要交换。

Insertion sort

插入排序

Although it is one of the elementary sorting algorithms with O(N2)worst-case time, insertion sortis the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead). For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divide-and-conquer sorting algorithms, such as merge sort or quick sort.

尽管它是具有O(N2)最坏情况时间的基本排序算法之一，但插入排序是当数据接近排序（因为它是自适应的）或当问题规模很小（因为它的开销低）时选择的算法。由于这些原因，并且因为它也是稳定的，插入排序通常用作递归基本情况（当问题规模较小时）用于更高开销的分而治之排序算法，例如归并排序或快速排序。

To implement insertion_sortwith the Standard Library, repeatedly use std::upper_boundto find the location where the current element needs to go, and use std::rotateto shift the remaining elements upward in the input range:

为了实现insertion_sort与标准库，可重复使用std::upper_bound以找到当前元素需要到达的位置，并使用std::rotate到剩余的元素在输入范围上移：

template<class FwdIt, class Compare = std::less<>>
void insertion_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        auto const insertion = std::upper_bound(first, it, *it, cmp);
        std::rotate(insertion, it, std::next(it)); 
        assert(std::is_sorted(first, std::next(it), cmp));
    }
}

Note that insertion_sorthas the already processed range [first, it)sorted as its loop invariant. Insertion sort also works with forward iterators.

请注意，insertion_sort将已处理的范围[first, it)作为其循环不变量进行排序。插入排序也适用于前向迭代器。

Details omitted:

细节省略：

insertion sort can be optimized with an early test if (std::distance(first, last) <= 1) return;(or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;) and a loop over the interval [std::next(first), last), because the first element is guaranteed to be in place and doesn't require a rotate.
for bidirectional iterators, the binary search to find the insertion point can be replaced with a reverse linear searchusing the Standard Library's std::find_if_notalgorithm.

插入排序可以通过早期测试if (std::distance(first, last) <= 1) return;（或用于前向/双向迭代器：）if (first == last || std::next(first) == last) return;和间隔上的循环进行优化[std::next(first), last)，因为第一个元素保证就位并且不需要旋转。
对于双向迭代器，可以使用标准库的算法将查找插入点的二分搜索替换为反向线性搜索std::find_if_not。

Four Live Examples(C++14, C++11, C++98 and Boost, C++98) for the fragment below:

以下片段的四个实时示例（C++14、C++11、C++98 和 Boost、C++98）：

using RevIt = std::reverse_iterator<BiDirIt>;
auto const insertion = std::find_if_not(RevIt(it), RevIt(first), 
    [=](auto const& elem){ return cmp(*it, elem); }
).base();

For random inputs this gives O(N2)comparisons, but this improves to O(N)comparisons for almost sorted inputs. The binary search always uses O(N log N)comparisons.
For small input ranges, the better memory locality (cache, prefetching) of a linear search might also dominate a binary search (one should test this, of course).

对于随机输入，这提供了O(N2)比较，但这改进了O(N)对几乎排序输入的比较。二分查找总是使用O(N log N)比较。
对于较小的输入范围，线性搜索更好的内存局部性（缓存、预取）也可能主导二分搜索（当然，应该对此进行测试）。

Quick sort

快速排序

When carefully implemented, quick sortis robust and has O(N log N)expected complexity, but with O(N2)worst-case complexity that can be triggered with adversarially chosen input data. When a stable sort is not needed, quick sort is an excellent general-purpose sort.

如果仔细实施，快速排序是健壮的并且具有O(N log N)预期的复杂性，但O(N2)最坏情况的复杂性可以通过对抗性选择的输入数据触发。当不需要稳定排序时，快速排序是一种很好的通用排序。

Even for the simplest versions, quick sort is quite a bit more complicated to implement using the Standard Library than the other classic sorting algorithms. The approach below uses a few iterator utilities to locate the middle elementof the input range [first, last)as the pivot, then use two calls to std::partition(which are O(N)) to three-way partition the input range into segments of elements that are smaller than, equal to, and larger than the selected pivot, respectively. Finally the two outer segments with elements smaller than and larger than the pivot are recursively sorted:

即使对于最简单的版本，使用标准库实现快速排序也比其他经典排序算法要复杂得多。下面的方法使用一些迭代器实用程序来定位输入范围的中间元素[first, last)作为主元，然后使用两次调用std::partition（它们是O(N)）将输入范围三路划分为小于、等于、和大于选定的枢轴，分别。最后，对元素小于和大于主元的两个外部段进行递归排序：

template<class FwdIt, class Compare = std::less<>>
void quick_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const pivot = *std::next(first, N / 2);
    auto const middle1 = std::partition(first, last, [=](auto const& elem){ 
        return cmp(elem, pivot); 
    });
    auto const middle2 = std::partition(middle1, last, [=](auto const& elem){ 
        return !cmp(pivot, elem);
    });
    quick_sort(first, middle1, cmp); // assert(std::is_sorted(first, middle1, cmp));
    quick_sort(middle2, last, cmp);  // assert(std::is_sorted(middle2, last, cmp));
}

However, quick sort is rather tricky to get correct and efficient, as each of the above steps has to be carefully checked and optimized for production level code. In particular, for O(N log N)complexity, the pivot has to result into a balanced partition of the input data, which cannot be guaranteed in general for an O(1)pivot, but which can be guaranteed if one sets the pivot as the O(N)median of the input range.

然而，快速排序要获得正确和高效是相当棘手的，因为必须针对生产级代码仔细检查和优化上述每个步骤。特别是，对于O(N log N)复杂性，pivot 必须导致输入数据的平衡分区，这对于O(1)pivot 来说通常不能保证，但是如果将 pivot 设置为O(N)输入范围的中值，则可以保证。

Details omitted:

细节省略：

the above implementation is particularly vulnerable to special inputs, e.g. it has O(N^2)complexity for the "organ pipe" input 1, 2, 3, ..., N/2, ... 3, 2, 1(because the middle is always larger than all other elements).
median-of-3pivot selection from randomly chosen elementsfrom the input range guards against almost sorted inputs for which the complexity would otherwise deteriorate to O(N^2).
3-way partitioning(separating elements smaller than, equal to and larger than the pivot) as shown by the two calls to std::partitionis not the most efficient O(N)algorithm to achieve this result.
for random access iterators, a guaranteed O(N log N)complexity can be achieved through median pivot selectionusing std::nth_element(first, middle, last), followed by recursive calls to quick_sort(first, middle, cmp)and quick_sort(middle, last, cmp).
this guarantee comes at a cost, however, because the constant factor of the O(N)complexity of std::nth_elementcan be more expensive than that of the O(1)complexity of a median-of-3 pivot followed by an O(N)call to std::partition(which is a cache-friendly single forward pass over the data).

上述实现特别容易受到特殊输入的影响，例如它O(N^2)对于“风琴管”输入具有复杂性1, 2, 3, ..., N/2, ... 3, 2, 1（因为中间总是大于所有其他元素）。
从输入范围中随机选择的元素中的中值 3 主元选择可防止几乎排序的输入，否则其复杂性会恶化到O(N^2)。
两次调用所示的3 路分区（分隔小于、等于和大于主元的元素）std::partition并不是O(N)实现此结果的最有效算法。
对于随机访问迭代器，O(N log N)可以通过使用中值枢轴选择std::nth_element(first, middle, last)，然后递归调用quick_sort(first, middle, cmp)和来实现有保证的复杂性quick_sort(middle, last, cmp)。
然而，这种保证是有代价的，因为O(N)复杂度的常数因子std::nth_element可能比O(1)中位数为 3 的枢轴的复杂性更昂贵，然后O(N)调用std::partition（这是一个缓存友好的单前向传递数据）。

Merge sort

归并排序

If using O(N)extra space is of no concern, then merge sortis an excellent choice: it is the only stableO(N log N)sorting algorithm.

如果O(N)不需要额外的空间，那么归并排序是一个很好的选择：它是唯一稳定的O(N log N)排序算法。

It is simple to implement using Standard algorithms: use a few iterator utilities to locate the middle of the input range [first, last)and combine two recursively sorted segments with a std::inplace_merge:

使用标准算法很容易实现：使用一些迭代器实用程序来定位输入范围的中间位置[first, last)，并将两个递归排序的段与组合起来std::inplace_merge：

template<class BiDirIt, class Compare = std::less<>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;                   
    auto const middle = std::next(first, N / 2);
    merge_sort(first, middle, cmp); // assert(std::is_sorted(first, middle, cmp));
    merge_sort(middle, last, cmp);  // assert(std::is_sorted(middle, last, cmp));
    std::inplace_merge(first, middle, last, cmp); // assert(std::is_sorted(first, last, cmp));
}

Merge sort requires bidirectional iterators, the bottleneck being the std::inplace_merge. Note that when sorting linked lists, merge sort requires only O(log N)extra space (for recursion). The latter algorithm is implemented by std::list<T>::sortin the Standard Library.

归并排序需要双向迭代器，瓶颈是std::inplace_merge. 请注意，在对链表进行排序时，归并排序只需要O(log N)额外的空间（用于递归）。后一种算法由std::list<T>::sort标准库实现。

Heap sort

堆排序

Heap sortis simple to implement, performs an O(N log N)in-place sort, but is not stable.

堆排序实现简单，执行O(N log N)就地排序，但不稳定。

The first loop, O(N)"heapify" phase, puts the array into heap order. The second loop, the O(N log N) "sortdown" phase, repeatedly extracts the maximum and restores heap order. The Standard Library makes this extremely straightforward:

第一个循环，O(N)“heapify”阶段，将数组放入堆顺序。第二个循环，即O(N log N)“排序”阶段，反复提取最大值并恢复堆顺序。标准库使这变得非常简单：

template<class RandomIt, class Compare = std::less<>>
void heap_sort(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    lib::make_heap(first, last, cmp); // assert(std::is_heap(first, last, cmp));
    lib::sort_heap(first, last, cmp); // assert(std::is_sorted(first, last, cmp));
}

In case you consider it "cheating" to use std::make_heapand std::sort_heap, you can go one level deeper and write those functions yourself in terms of std::push_heapand std::pop_heap, respectively:

如果您认为使用std::make_heapand是“作弊” std::sort_heap，您可以更深入一层，分别根据std::push_heapand自己编写这些函数std::pop_heap：

namespace lib {

// NOTE: is O(N log N), not O(N) as std::make_heap
template<class RandomIt, class Compare = std::less<>>
void make_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last;) {
        std::push_heap(first, ++it, cmp); 
        assert(std::is_heap(first, it, cmp));           
    }
}

template<class RandomIt, class Compare = std::less<>>
void sort_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = last; it != first;) {
        std::pop_heap(first, it--, cmp);
        assert(std::is_heap(first, it, cmp));           
    } 
}

}   // namespace lib

The Standard Library specifies both push_heapand pop_heapas complexity O(log N). Note however that the outer loop over the range [first, last)results in O(N log N)complexity for make_heap, whereas std::make_heaphas only O(N)complexity. For the overall O(N log N)complexity of heap_sortit doesn't matter.

标准库将push_heap和都指定pop_heap为复杂性O(log N)。但是请注意，范围内的外循环[first, last)导致的O(N log N)复杂性make_heap，而std::make_heap只有O(N)复杂性。对于它的整体O(N log N)复杂性heap_sort并不重要。

Details omitted: O(N)implementation of make_heap

细节省略：O(N)实现make_heap

Testing

测试

Here are four Live Examples(C++14, C++11, C++98 and Boost, C++98) testing all five algorithms on a variety of inputs (not meant to be exhaustive or rigorous). Just note the huge differences in the LOC: C++11/C++14 need around 130 LOC, C++98 and Boost 190 (+50%) and C++98 more than 270 (+100%).

这里有四个实时示例（C++14、C++11、C++98 和 Boost、C++98）在各种输入上测试所有五种算法（并非详尽或严格）。请注意 LOC 的巨大差异：C++11/C++14 需要大约 130 个 LOC，C++98 和 Boost 190 (+50%) 和 C++98 需要超过 270 (+100%)。

Answer 2

回答by Morwenn

Another small and rather elegant one originally found on code review. I thought it was worth sharing.

最初在代码中发现的另一个小而优雅的。我认为值得分享。

Counting sort

计数排序

While it is rather specialized, counting sortis a simple integer sorting algorithm and can often be really fast provided the values of the integers to sort are not too far apart. It's probably ideal if one ever needs to sort a collection of one million integers known to be between 0 and 100 for example.

虽然它相当专业，但计数排序是一种简单的整数排序算法，并且如果要排序的整数值相距不太远，通常可以非常快。例如，如果需要对已知介于 0 和 100 之间的一百万个整数的集合进行排序，这可能是理想的。

To implement a very simple counting sort that works with both signed and unsigned integers, one needs to find the smallest and greatest elements in the collection to sort; their difference will tell the size of the array of counts to allocate. Then, a second pass through the collection is done to count the number of occurrences of every element. Finally, we write back the required number of every integer back to the original collection.

要实现一种适用于有符号和无符号整数的非常简单的计数排序，需要找到集合中最小和最大的元素进行排序；它们的差异将告诉要分配的计数数组的大小。然后，对集合进行第二次遍历以计算每个元素的出现次数。最后，我们将每个整数的所需数量写回原始集合。

template<typename ForwardIterator>
void counting_sort(ForwardIterator first, ForwardIterator last)
{
    if (first == last || std::next(first) == last) return;

    auto minmax = std::minmax_element(first, last);  // avoid if possible.
    auto min = *minmax.first;
    auto max = *minmax.second;
    if (min == max) return;

    using difference_type = typename std::iterator_traits<ForwardIterator>::difference_type;
    std::vector<difference_type> counts(max - min + 1, 0);

    for (auto it = first ; it != last ; ++it) {
        ++counts[*it - min];
    }

    for (auto count: counts) {
        first = std::fill_n(first, count, min++);
    }
}

While it is only useful when the range of the integers to sort is known to be small (generally not larger than the size of the collection to sort), making counting sort more generic would make it slower for its best cases. If the range is not known to be small, another algorithm such a radix sort, ska_sortor spreadsortcan be used instead.

虽然它仅在已知要排序的整数范围很小（通常不大于要排序的集合的大小）时才有用，但使计数排序更通用会使其在最佳情况下变慢。如果已知范围很小，则可以使用另一种算法，例如radix sort、ska_sort或spreadsort。

Details omitted:

细节省略：

We could have passed the bounds of the range of values accepted by the algorithm as parameters to totally get rid of the first std::minmax_elementpass through the collection. This will make the algorithm even faster when a usefully-small range limit is known by other means. (It doesn't have to be exact; passing a constant 0 to 100 is still muchbetter than an extra pass over a million elements to find out that the true bounds are 1 to 95. Even 0 to 1000 would be worth it; the extra elements are written once with zero and read once).
Growing countson the fly is another way to avoid a separate first pass. Doubling the countssize each time it has to grow gives amortized O(1) time per sorted element (see hash table insertion cost analysis for the proof that exponential grown is the key). Growing at the end for a new maxis easy with std::vector::resizeto add new zeroed elements. Changing minon the fly and inserting new zeroed elements at the front can be done with std::copy_backwardafter growing the vector. Then std::fillto zero the new elements.
The countsincrement loop is a histogram. If the data is likely to be highly repetitive, and the number of bins is small, it can be worth unrolling over multiple arraysto reduce the serializing data dependency bottleneck of store/reload to the same bin. This means more counts to zero at the start, and more to loop over at the end, but should be worth it on most CPUs for our example of millions of 0 to 100 numbers, especially if the input might already be (partially) sorted and have long runs of the same number.
In the algorithm above, we use a min == maxcheck to return early when every element has the same value (in which case the collection is sorted). It is actually possible to instead fully check whether the collection is already sorted while finding the extreme values of a collection with no additional time wasted (if the first pass is still memory bottlenecked with the extra work of updating min and max). However such an algorithm does not exist in the standard library and writing one would be more tedious than writing the rest of counting sort itself. It is left as an exercise for the reader.
Since the algorithm only works with integer values, static assertions could be used to prevent users from making obvious type mistakes. In some contexts, a substitution failure with std::enable_if_tmight be preferred.
While modern C++ is cool, future C++ could be even cooler: structured bindingsand some parts of the Ranges TSwould make the algorithm even cleaner.

我们可以将算法接受的值范围的边界作为参数传递，以完全摆脱第std::minmax_element一次遍历集合。当通过其他方式知道有用的小范围限制时，这将使算法更快。（不一定要精确；传递一个常数 0 到 100 仍然比额外传递超过一百万个元素要好得多，以找出真正的边界是 1 到 95。即使是 0 到 1000 也是值得的；额外的元素用零写入一次并读取一次）。
快速增长counts是避免单独的第一次通过的另一种方法。counts每次必须增长时将大小加倍，每个排序元素的分摊时间为 O(1)（请参阅哈希表插入成本分析以证明指数增长是关键）。通过添加新的归零元素，最后为新的增长max很容易std::vector::resize。min可以std::copy_backward在增加向量后进行动态更改并在前面插入新的归零元素。然后std::fill将新元素归零。
该counts增量环是直方图。如果数据很可能是高度重复的，并且 bin 的数量很少，那么展开多个数组以减少存储/重新加载到同一 bin 的序列化数据依赖性瓶颈可能是值得的。这意味着在开始时更多的计数为零，并且在结束时更多地循环，但对于我们的数百万个 0 到 100 个数字的示例，在大多数 CPU 上应该是值得的，尤其是如果输入可能已经（部分）排序并且长跑相同的数字。
在上面的算法中，min == max当每个元素具有相同的值时（在这种情况下集合已排序），我们使用检查来提前返回。实际上可以在不浪费额外时间的情况下完全检查集合是否已经排序，同时查找集合的极值（如果第一遍仍然是内存瓶颈，需要额外的更新 min 和 max 工作）。然而，标准库中不存在这样的算法，编写一个比编写计数排序本身的其余部分更乏味。它留给读者作为练习。
由于该算法仅适用于整数值，因此可以使用静态断言来防止用户犯明显的类型错误。在某些情况下，替换失败std::enable_if_t可能是首选。
虽然现代 C++ 很酷，但未来的 C++ 可能更酷：结构化绑定和Ranges TS 的某些部分将使算法更加清晰。

如何在现代 C++ 中实现经典的排序算法？

提问by TemplateRex

回答by TemplateRex

Algorithmic building blocks

算法构建块

Syntactical goodies

语法好东西

C++ Style

C++ 风格

Selection sort

选择排序

Insertion sort

插入排序

Quick sort

快速排序

Merge sort

归并排序

Heap sort

堆排序

Testing

测试

回答by Morwenn

Counting sort

计数排序

相关推荐

最近更新

标签

如何在现代 C++ 中实现经典的排序算法？

提问by TemplateRex

回答by TemplateRex

Algorithmic building blocks

算法构建块

Syntactical goodies

语法好东西

C++ Style

C++ 风格

Selection sort

选择排序

Insertion sort

插入排序

Quick sort

快速排序

Merge sort

归并排序

Heap sort

堆排序

Testing

测试

回答by Morwenn

Counting sort

计数排序

相关推荐

C++ 函数指针的解除引用是如何发生的？

使用字符数组反转字符串 C++

C++ 如何让 CMake 在 Mac OS X 上使用 GCC 而不是 Clang？

C++ 从字符串中给定结束索引的字符串中复制子字符串

相关推荐

最近更新

标签