Why is std::vector::operator[] 5 to 10 times faster than std::vector::at()?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/3269809/
Asked by Poni
During program optimization, while trying to optimize a loop that iterates through a vector, I found the following: ::std::vector::at() is EXTREMELY slow compared to operator[]!
The operator[] is 5 to 10 times faster than at(), both in release & debug builds (VS2008 x86).
Reading a bit on the web got me to realize that at() has boundary checking. Ok, but, slowing the operation by up to 10 times?!
Is there any reason for that? I mean, boundary checking is a simple number comparison, or am I missing something?
The question is what is the real reason for this performance hit?
Furthermore, is there any way to make it even faster?
I'm certainly going to swap all my at() calls with [] in other code parts (in which I already have custom boundary check!).
Proof of concept:
#define _WIN32_WINNT 0x0400
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <conio.h>
#include <stdio.h>   // printf
#include <vector>

#define ELEMENTS_IN_VECTOR 1000000

int main()
{
    __int64 freq, start, end, diff_Result;

    // freq is the counter frequency in ticks per second.
    if(!::QueryPerformanceFrequency((LARGE_INTEGER*)&freq))
        throw "Not supported!";

    ::std::vector<int> vec;
    vec.reserve(ELEMENTS_IN_VECTOR);
    for(int i = 0; i < ELEMENTS_IN_VECTOR; i++)
        vec.push_back(i);

    int xyz = 0;

    printf("Press any key to start!");
    _getch();
    printf(" Running speed test..\n");

    { // at()
        ::QueryPerformanceCounter((LARGE_INTEGER*)&start);
        for(int i = 0; i < ELEMENTS_IN_VECTOR; i++)
            xyz += vec.at(i);
        ::QueryPerformanceCounter((LARGE_INTEGER*)&end);
        // Convert ticks to microseconds without truncating freq first.
        diff_Result = (end - start) * 1000000 / freq;
    }
    // %I64d is the MSVC format specifier for a signed 64-bit integer.
    printf("Result\t\t: %I64d\n\n", diff_Result);

    printf("Press any key to start!");
    _getch();
    printf(" Running speed test..\n");

    { // operator []
        ::QueryPerformanceCounter((LARGE_INTEGER*)&start);
        for(int i = 0; i < ELEMENTS_IN_VECTOR; i++)
            xyz -= vec[i];
        ::QueryPerformanceCounter((LARGE_INTEGER*)&end);
        diff_Result = (end - start) * 1000000 / freq;
    }
    printf("Result\t\t: %I64d\n", diff_Result);

    _getch();
    return xyz; // use xyz so the loops are not dead code
}
Edit:
Now the value is being assigned to "xyz", so the compiler will not "wipe" it out.
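One common way to keep a measured loop from being discarded is to publish its result through a volatile object. The following is only a minimal sketch of that idea, not the asker's code: the names g_sink, sum_with_at, sum_with_subscript and demo_sink are made up for illustration, and the optimizer may still transform the loop itself (for example by vectorizing it).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sink: a store to a volatile object is observable behaviour,
// so the optimizer has to keep it together with the value that feeds it.
volatile long long g_sink = 0;

// Sum every element through the bounds-checked accessor.
inline long long sum_with_at(const std::vector<int>& vec) {
    long long total = 0;
    for (std::size_t i = 0; i < vec.size(); ++i)
        total += vec.at(i);
    return total;
}

// Sum every element through the unchecked accessor.
inline long long sum_with_subscript(const std::vector<int>& vec) {
    long long total = 0;
    for (std::size_t i = 0; i < vec.size(); ++i)
        total += vec[i];
    return total;
}

// Usage sketch: publish each result so neither loop is dead code.
inline void demo_sink() {
    std::vector<int> vec(1000, 1);
    g_sink = sum_with_at(vec);
    assert(g_sink == 1000);
    g_sink = sum_with_subscript(vec);
    assert(g_sink == 1000);
}
```

Returning the value from main(), as the question does with xyz, serves the same purpose; the sink is just easier to reuse across several measurements.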
Answered by Mike Seymour
The reason is that an unchecked access can probably be done with a single processor instruction. A checked access will also have to load the size from memory, compare it with the index, and (assuming it's in range) skip over a conditional branch to the error handler. There may be more faffing around to handle the possibility of throwing an exception. This will be many times slower, and this is precisely why you have both options.
If you can prove that the index is within range without a runtime check then use operator[]. Otherwise, use at(), or add your own check before access. operator[] should be more or less as fast as possible, but will explode messily if the index is invalid.
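To make the extra work concrete, here is a rough sketch of the two access paths described in this answer. It is not how any particular standard library implements them: unchecked_get, checked_get, at_throws and demo_checked are hypothetical helpers written only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: roughly what operator[] boils down to.
// No check at all; an out-of-range index is undefined behaviour.
inline int unchecked_get(const std::vector<int>& v, std::size_t i) {
    return v[i];                 // address computation plus one load
}

// Hypothetical helper: roughly the extra steps at() has to perform.
inline int checked_get(const std::vector<int>& v, std::size_t i) {
    if (i >= v.size())           // load the size and compare with the index
        throw std::out_of_range("checked_get: index out of range");
    return v[i];                 // same load as the unchecked path
}

// Returns true if v.at(i) throws std::out_of_range.
inline bool at_throws(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// Usage sketch: both paths agree for valid indices; only at() reports bad ones.
inline void demo_checked() {
    std::vector<int> v{10, 20, 30};
    assert(unchecked_get(v, 1) == 20);
    assert(checked_get(v, 1) == 20);
    assert(at_throws(v, 3));     // index == size() is already out of range
}
```

The branch is normally well predicted, so how much this costs in practice depends on the compiler and on what else the loop does.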
Answered by James McNellis
I ran your test code on my machine:
In an unoptimized debug build, the difference between the two loops is insignificant.
In an optimized release build, the second for loop is optimized out entirely (the call to operator[] is likely inlined and the optimizer can see that the loop does nothing and can remove the whole loop).
If I change the body of the loops to do some actual work, e.g., vec.at(i)++; and vec[i]++;, respectively, the difference between the two loops is insignificant.
I don't see this five to tenfold performance difference that you see.
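The modified loop bodies mentioned in this answer could look roughly like the sketch below. The function names bump_with_at, bump_with_subscript and demo_bump are assumptions added for illustration; the answer only gives the two statements vec.at(i)++ and vec[i]++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Increment every element through the bounds-checked accessor.
inline void bump_with_at(std::vector<int>& vec) {
    for (std::size_t i = 0; i < vec.size(); ++i)
        vec.at(i)++;
}

// Increment every element through the unchecked accessor.
inline void bump_with_subscript(std::vector<int>& vec) {
    for (std::size_t i = 0; i < vec.size(); ++i)
        vec[i]++;
}

// Usage sketch: both loops must leave the vector in the same state.
inline void demo_bump() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{1, 2, 3};
    bump_with_at(a);
    bump_with_subscript(b);
    assert(a == b);
}
```

Because each loop writes back into the vector, the work stays visible to the caller and the optimizer cannot simply drop it.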
Answered by Ben Voigt
You don't do anything with the return value, so if the compiler inlines these functions it can optimize them away completely. Or perhaps it can optimize away the subscript ([]) version completely. Running without optimizations is useless from a performance measurement perspective; what you need is some simple but useful program to exercise the functions so they don't just get optimized away. For example, you could shuffle the vector (randomly swap 50000 pairs of elements).
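A shuffle workload along these lines might be sketched as follows. This is an illustration under stated assumptions rather than the answerer's code: shuffle_with_at, shuffle_with_subscript, time_us and demo_shuffle are hypothetical names, the fixed seed exists only to make both runs perform identical swaps, and std::chrono::steady_clock stands in for the Windows-specific timer used in the question.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Swap `pairs` randomly chosen pairs of elements using at().
inline void shuffle_with_at(std::vector<int>& vec, std::size_t pairs, unsigned seed) {
    if (vec.empty()) return;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, vec.size() - 1);
    for (std::size_t n = 0; n < pairs; ++n) {
        std::size_t a = pick(rng);   // drawn in separate statements so the
        std::size_t b = pick(rng);   // order of the two draws is fixed
        std::swap(vec.at(a), vec.at(b));
    }
}

// Same workload using operator[].
inline void shuffle_with_subscript(std::vector<int>& vec, std::size_t pairs, unsigned seed) {
    if (vec.empty()) return;
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, vec.size() - 1);
    for (std::size_t n = 0; n < pairs; ++n) {
        std::size_t a = pick(rng);
        std::size_t b = pick(rng);
        std::swap(vec[a], vec[b]);
    }
}

// Run `work` once and return the elapsed wall time in microseconds.
template <typename F>
long long time_us(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Usage sketch: time both variants on identical input.
inline void demo_shuffle() {
    std::vector<int> a(1000);
    std::iota(a.begin(), a.end(), 0);
    std::vector<int> b = a;
    long long t_at  = time_us([&] { shuffle_with_at(a, 50000, 42u); });
    long long t_sub = time_us([&] { shuffle_with_subscript(b, 50000, 42u); });
    assert(a == b);                  // same seed, so the same swaps
    assert(t_at >= 0 && t_sub >= 0);
}
```

In a workload like this the random number generation is likely to dominate the run time, so any difference between the two accessors may be hard to see; a single run is also too noisy to draw conclusions from.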