C++ qsort 与 std::sort 的性能?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4708105/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Performance of qsort vs std::sort?
提问by Chan
According Scott Meyers, in his Effective STL book - item 46. He claimed that std::sort
is about 670% faster than std::qsort
due to the fact of inline. I tested myself, and I saw that qsort is faster :( ! Could anyone help me to explain this strange behavior?
根据 Scott Meyers 的说法,在他的 Effective STL 书中 - 第 46 项。他声称这std::sort
比std::qsort
由于内联的事实快约 670% 。我测试了自己,我发现 qsort 更快 :( !谁能帮我解释一下这种奇怪的行为?
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <cstdio>
const size_t LARGE_SIZE = 100000;
struct rnd {
int operator()() {
return rand() % LARGE_SIZE;
}
};
int comp( const void* a, const void* b ) {
return ( *( int* )a - *( int* )b );
}
int main() {
int ary[LARGE_SIZE];
int ary_copy[LARGE_SIZE];
// generate random data
std::generate( ary, ary + LARGE_SIZE, rnd() );
std::copy( ary, ary + LARGE_SIZE, ary_copy );
// get time
std::time_t start = std::clock();
// perform quick sort C using function pointer
std::qsort( ary, LARGE_SIZE, sizeof( int ), comp );
std::cout << "C quick-sort time elapsed: " << static_cast<double>( clock() - start ) / CLOCKS_PER_SEC << "\n";
// get time again
start = std::clock();
// perform quick sort C++ using function object
std::sort( ary_copy, ary_copy + LARGE_SIZE );
std::cout << "C++ quick-sort time elapsed: " << static_cast<double>( clock() - start ) / CLOCKS_PER_SEC << "\n";
}
This is my result:
这是我的结果:
C quick-sort time elapsed: 0.061
C++ quick-sort time elapsed: 0.086
Press any key to continue . . .
Update
更新
Effective STL 3rd Edition ( 2001 )
Chapter 7 Programming with STL
Item 46: Consider function objects instead of functions as algorithm parameters.
Effective STL 第 3 版 (2001)
第 7 章使用 STL 编程
第 46 项:将函数对象而不是函数视为算法参数。
Best regards,
此致,
回答by Puppy
std::clock() is not a viable timing clock. You should use a platform-specific higher resolution timer, like the Windows High Performance Timer. More than that, the way that you call clock() is that first, text is output to the console, which is included in the time. This definitely invalidates the test. In addition, make sure that you compiled with all optimizations.
std::clock() 不是一个可行的计时时钟。您应该使用特定于平台的更高分辨率计时器,例如 Windows 高性能计时器。更重要的是,您调用 clock() 的方式是首先将文本输出到控制台,该控制台包含在时间中。这肯定会使测试无效。此外,请确保您使用所有优化进行编译。
Finally, I copied and pasted your code, and got 0.016 for qsort and 0.008 for std::sort.
最后,我复制并粘贴了您的代码,qsort 为 0.016,std::sort 为 0.008。
回答by rasmus
I am surprised that no one mentions caches.
我很惊讶没有人提到缓存。
In your code, you start by touching aryand *ary_copy* so they are resident in the cache at the time of qsort. During qsort, *ary_copy* might get evicted. At the time of std::sort, the elements would have to be fetched from memory or a larger (read slower) cache level. This will of course depend on your cache sizes.
在您的代码中,您首先接触ary和 *ary_copy*,以便它们在qsort时驻留在缓存中。在qsort期间, *ary_copy* 可能会被驱逐。在std::sort 时,必须从内存或更大(读取较慢)缓存级别获取元素。这当然取决于您的缓存大小。
Try to reverse the test, i.e., start by running std::sort.
尝试反转测试,即从运行std::sort 开始。
As some people have pointed out; making the array larger will make the test more fair. The reason is that a large array is less likely to fit in cache.
正如一些人指出的那样;使数组更大将使测试更公平。原因是大数组不太可能放入缓存。
回答by templatetypedef
The two sorting algorithms, without optimizations enabled, should have comparable performance. The reason that the C++ sort
tends to appreciably beat qsort
is that the compiler can inline the comparisons being made, since the compiler has type information about what function is being used to perform the comparison. Did you run these tests with optimization enabled? If not, try turning it on and running this test again.
这两种排序算法,在没有启用优化的情况下,应该具有相当的性能。C++sort
明显优于C++ 的原因qsort
是编译器可以内联进行的比较,因为编译器具有关于正在使用什么函数来执行比较的类型信息。您是否在启用优化的情况下运行这些测试?如果没有,请尝试将其打开并再次运行此测试。
回答by Zan Lynx
Another reason that qsort may perform much better than expected is that newer compilers can inline and optimize through the function pointer.
qsort 性能可能比预期好得多的另一个原因是,较新的编译器可以通过函数指针进行内联和优化。
If the C header defines an inline implementation of qsort instead of implementing it inside of a library and the compiler supports indirect function inlining, then qsort can be just as fast as std::sort.
如果 C 头文件定义了 qsort 的内联实现而不是在库内部实现它,并且编译器支持间接函数内联,那么 qsort 可以与 std::sort 一样快。
回答by 6502
On my machine adding some meat (making the array 10 million elements and moving it in the data section) and compiling with
在我的机器上添加一些肉(使数组有 1000 万个元素并将其移动到数据部分)并编译
g++ -Wall -O2 -osortspeed sortspeed.cpp
I get as result
我得到结果
C quick-sort time elapsed: 3.48
C++ quick-sort time elapsed: 1.26
Be also careful about modern "green" CPUs that may be configured to run at a variable speed depending on the load of the system. When benchmarking, this kind of behavior can drive you crazy (on my machine I've a small script that fixes CPU clock that I use when making speed tests).
还要注意现代“绿色”CPU,它们可能被配置为根据系统负载以可变速度运行。在进行基准测试时,这种行为可能会让您发疯(在我的机器上,我有一个小脚本来修复我在进行速度测试时使用的 CPU 时钟)。
回答by Mr Bingley
Writing accurate benchmarks is difficult, so let's get Noniusto do it for us! Let's test qsort
, std::sort
with no inlining, and std::sort
with inlining (the default) on a vector of a million random integers.
编写准确的基准测试很困难,所以让Nonius为我们做吧!让我们在一个包含一百万个随机整数的向量上测试qsort
,std::sort
没有内联和std::sort
内联(默认)。
// sort.cpp
#define NONIUS_RUNNER
#include <nonius.h++>
#include <random>
#include <algorithm>
// qsort
int comp(const void* a, const void* b) {
const int arg1 = *static_cast<const int*>(a);
const int arg2 = *static_cast<const int*>(b);
// we can't simply return a - b, because that might under/overflow
return (arg1 > arg2) - (arg1 < arg2);
}
// std::sort with no inlining
struct compare_noinline {
__attribute__((noinline)) bool operator()(const int a, const int b) {
return a < b;
}
};
// std::sort with inlining
struct compare {
// the compiler will automatically inline this
bool operator()(const int a, const int b) {
return a < b;
}
};
std::vector<int> gen_random_vector(const size_t size) {
std::random_device seed;
std::default_random_engine engine{seed()};
std::uniform_int_distribution<int> dist{std::numeric_limits<int>::min(), std::numeric_limits<int>::max()};
std::vector<int> vec;
for (size_t i = 0; i < size; i += 1) {
const int rand_int = dist(engine);
vec.push_back(rand_int);
}
return vec;
}
// generate a vector of a million random integers
constexpr size_t size = 1'000'000;
static const std::vector<int> rand_vec = gen_random_vector(size);
NONIUS_BENCHMARK("qsort", [](nonius::chronometer meter) {
// Nonius does multiple runs of the benchmark, and each one needs a new
// copy of the original vector, otherwise we'd just be sorting the same
// one over and over
const size_t runs = static_cast<size_t>(meter.runs());
std::vector<std::vector<int>> vectors{runs};
std::fill(vectors.begin(), vectors.end(), rand_vec);
meter.measure([&](const size_t run) {
std::vector<int>& current_vec = vectors[run];
std::qsort(current_vec.data(), current_vec.size(), sizeof(int), comp);
return current_vec;
});
});
NONIUS_BENCHMARK("std::sort noinline", [](nonius::chronometer meter) {
const size_t runs = static_cast<size_t>(meter.runs());
std::vector<std::vector<int>> vectors{runs};
std::fill(vectors.begin(), vectors.end(), rand_vec);
meter.measure([&](const size_t run) {
std::vector<int>& current_vec = vectors[run];
std::sort(current_vec.begin(), current_vec.end(), compare_noinline{});
return current_vec;
});
});
NONIUS_BENCHMARK("std::sort inline", [](nonius::chronometer meter) {
const size_t runs = static_cast<size_t>(meter.runs());
std::vector<std::vector<int>> vectors{runs};
std::fill(vectors.begin(), vectors.end(), rand_vec);
meter.measure([&](const size_t run) {
std::vector<int>& current_vec = vectors[run];
std::sort(current_vec.begin(), current_vec.end(), compare{});
return current_vec;
});
});
Compiling with Apple Clang 7.3.0,
使用 Apple Clang 7.3.0 编译,
$ clang++ -std=c++14 -stdlib=libc++ -O3 -march=native sort.cpp -o sort
$ ./sort
and running it on my 1.7 GHz i5 Macbook Air, we get
并在我的 1.7 GHz i5 Macbook Air 上运行它,我们得到
qsort 211 ms +/- 6 ms
std::sort noinline 127 ms +/- 5 ms
std::sort inline 87 ms +/- 4 ms
So std::sort
with no inlining is about 1.7x faster than qsort
(perhaps due to different sorting algorithms), and inlining bumps that up to about 2.4x faster. Certainly an impressive speedup, but much less than 670%.
因此std::sort
,没有内联的速度大约比qsort
(可能是由于不同的排序算法)快 1.7 倍,而内联的颠簸速度大约快 2.4 倍。当然是令人印象深刻的加速,但远低于 670%。
回答by Dennis Beier
Don′t know how std::sort was implemented years ago. But std::sort can be much faster, because std::sort is quicksort with a heap sort fallback. Heapsort is a linearhitmic alghoritm, meaning if you have twice the sorting data, the sorting time doubles. Actually it more than doubles because it is not exactly linear, but however, qsort can be quadratic, so needing exponential more time for sorting twice the input.
不知道几年前 std::sort 是如何实现的。但是 std::sort 可以快得多,因为 std::sort 是带有堆排序回退的快速排序。Heapsort 是一个线性hitmic alghoritm,这意味着如果你有两倍的排序数据,排序时间就会加倍。实际上它增加了一倍多,因为它不是完全线性的,但是,qsort 可以是二次的,因此需要更多的指数时间来对两倍的输入进行排序。