Original URL: http://stackoverflow.com/questions/22295665/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
How much is the overhead of smart pointers compared to normal pointers in C++?
Asked by Venemo
How much is the overhead of smart pointers compared to normal pointers in C++11? In other words, is my code going to be slower if I use smart pointers, and if so, how much slower?
Specifically, I'm asking about the C++11 std::shared_ptr and std::unique_ptr.
Obviously, the stuff pushed down the stack is going to be larger (at least I think so), because a smart pointer also needs to store its internal state (reference count, etc.). The question really is: how much is this going to affect my performance, if at all?
For example, I return a smart pointer from a function instead of a normal pointer:
std::shared_ptr<const Value> getValue();
// versus
const Value *getValue();
Or, for example, when one of my functions accepts a smart pointer as a parameter instead of a normal pointer:
void setValue(std::shared_ptr<const Value> val);
// versus
void setValue(const Value *val);
Answered by lisyarus
std::unique_ptr has memory overhead only if you provide it with some non-trivial deleter.
std::shared_ptr always has memory overhead for the reference counter, though it is very small.
std::unique_ptr has time overhead only during the constructor (if it has to copy the provided deleter and/or null-initialize the pointer) and during the destructor (to destroy the owned object).
std::shared_ptr has time overhead in the constructor (to create the reference counter), in the destructor (to decrement the reference counter and possibly destroy the object) and in the assignment operator (to increment the reference counter). Due to the thread-safety guarantees of std::shared_ptr, these increments/decrements are atomic, thus adding some more overhead.
Note that none of them has time overhead in dereferencing (in getting the reference to the owned object), while this operation seems to be the most common for pointers.
To sum up, there is some overhead, but it shouldn't make the code slow unless you continuously create and destroy smart pointers.
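To make these points concrete, here is a minimal sketch (the Value type is a placeholder, and the sizes mentioned in the comments are typical 64-bit results, not guarantees of the standard):

#include <iostream>
#include <memory>

struct Value { int x = 42; };

int main() {
    // Typical 64-bit results: a raw pointer and a unique_ptr with the default
    // deleter are 8 bytes each; a shared_ptr is 16 bytes (one pointer to the
    // object plus one to the control block).
    std::cout << sizeof(Value*) << ' '
              << sizeof(std::unique_ptr<Value>) << ' '
              << sizeof(std::shared_ptr<Value>) << '\n';

    auto a = std::make_shared<Value>();
    auto b = a;                          // copy: one atomic increment of the count
    std::cout << a.use_count() << '\n';  // prints 2

    // Dereferencing either smart pointer is just a plain pointer dereference.
    std::cout << a->x << ' ' << b->x << '\n';
}   // destructors: atomic decrements; the Value dies with the last owner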
Answered by Cheers and hth. - Alf
As with all code performance, the only really reliable means to obtain hard information is to measure and/or inspect machine code.
That said, simple reasoning says that
- You can expect some overhead in debug builds, since e.g. operator-> must be executed as a function call so that you can step into it (this is in turn due to general lack of support for marking classes and functions as non-debug).
- For shared_ptr you can expect some overhead in initial creation, since that involves dynamic allocation of a control block, and dynamic allocation is very much slower than any other basic operation in C++ (do use make_shared when practically possible, to minimize that overhead; see the sketch after this list).
- Also for shared_ptr there is some minimal overhead in maintaining a reference count, e.g. when passing a shared_ptr by value, but there's no such overhead for unique_ptr.
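As a minimal sketch of the make_shared point above (Widget is just a placeholder type): constructing a shared_ptr from a raw new pointer pays for two allocations, while make_shared folds the object and its control block into one.

#include <memory>

struct Widget { int x = 0; };

int main() {
    // Two dynamic allocations: one for the Widget, one for the control block.
    std::shared_ptr<Widget> a(new Widget);

    // One combined allocation holding both the Widget and its control block,
    // which is typically cheaper and more cache-friendly.
    auto b = std::make_shared<Widget>();
}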
Keeping the first point above in mind, when you measure, do that both for debug and release builds.
The international C++ standardization committee has published a technical report on performance, but this was in 2006, before unique_ptr and shared_ptr were added to the standard library. Still, smart pointers were old hat at that point, so the report also considered that. Quoting the relevant part:
“if accessing a value through a trivial smart pointer is significantly slower than accessing it through an ordinary pointer, the compiler is inefficiently handling the abstraction. In the past, most compilers had significant abstraction penalties and several current compilers still do. However, at least two compilers have been reported to have abstraction penalties below 1% and another a penalty of 3%, so eliminating this kind of overhead is well within the state of the art”
As an informed guess, the “well within the state of the art” has been achieved with the most popular compilers today, as of early 2014.
Answered by Lothar
My answer is different from the others and I really wonder if they ever profiled code.
shared_ptr has a significant overhead for creation because of its memory allocation for the control block (which keeps the ref counter and a pointer list to all weak references). It also has a huge memory overhead because of this and the fact that std::shared_ptr is always a two-pointer tuple (one to the object, one to the control block).
If you pass a shared_ptr to a function as a value parameter, then it will be at least 10 times slower than a normal call and create a lot of code in the code segment for the stack unwinding. If you pass it by reference, you get an additional indirection, which can also be pretty bad in terms of performance.
That's why you should not do this unless the function is really involved in ownership management. Otherwise use "shared_ptr.get()". It is not designed to make sure your object isn't killed during a normal function call.
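A minimal sketch of that guideline (Value, process and store are illustrative names, not from the original code): pass a raw pointer or reference when the callee merely uses the object, and pass the shared_ptr by value only when the callee really shares ownership.

#include <iostream>
#include <memory>

struct Value { int x = 42; };

// The callee only uses the object: a raw pointer avoids any reference-count traffic.
void process(const Value* v) { std::cout << v->x << '\n'; }

// The callee takes part in ownership: this by-value parameter is where one
// atomic increment (and, later, one decrement) is paid.
void store(std::shared_ptr<const Value>) { /* would keep the pointer somewhere */ }

int main() {
    std::shared_ptr<const Value> sp = std::make_shared<Value>();
    process(sp.get());   // no reference-count change
    store(sp);           // copy into the parameter: atomic increment
}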
If you go mad and use shared_ptr on small objects like an abstract syntax tree in a compiler or on small nodes in any other graph structure, you will see a huge performance drop and a huge memory increase. I have seen a parser system which was rewritten soon after C++14 hit the market and before the programmer learned to use smart pointers correctly. The rewrite was an order of magnitude slower than the old code.
It is not a silver bullet, and raw pointers aren't bad by definition either. Bad programmers are bad and bad design is bad. Design with care, design with clear ownership in mind, and try to use shared_ptr mostly on the subsystem API boundary.
If you want to learn more you can watch Nicolai M. Josuttis's good talk "The Real Price of Shared Pointers in C++": https://vimeo.com/131189627
It goes deep into the implementation details and CPU architecture of write barriers, atomic locks, etc. Once you have listened to it, you will never again call this feature cheap. If you just want proof of the slowdown, skip the first 48 minutes and watch him running example code which runs up to 180 times slower (compiled with -O3) when using shared pointers everywhere.
Answered by Claudiordgz
In other words, is my code going to be slower if I use smart pointers, and if so, how much slower?
Slower? Most likely not, unless you are creating a huge index using shared_ptrs and you do not have enough memory, to the point that your computer starts wrinkling like an old lady being plummeted to the ground by an unbearable force from afar.
What would make your code slower is sluggish searches, unnecessary loop processing, huge copies of data, and a lot of write operations to disk (like hundreds).
The advantages of a smart pointer are all related to management. But is the overhead necessary? This depends on your implementation. Let's say you are iterating over an array of 3 phases, and each phase has an array of 1024 elements. Creating a smart_ptr for this process might be overkill, since once the iteration is done you'll know you have to erase it. So you could gain extra memory from not using a smart_ptr...
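A minimal sketch of that idea, using the sizes from the example above (the variable names are placeholders): when the buffer's lifetime is confined to one scope anyway, a plain local array already guarantees cleanup, so the smart pointer mostly adds an allocation.

#include <array>
#include <cstddef>
#include <memory>

int main() {
    constexpr std::size_t phases = 3, elems = 1024;

    // Heap buffer owned by a unique_ptr: correct, but arguably overkill for
    // something that only lives inside this scope.
    std::unique_ptr<int[]> heap_buf(new int[elems]);
    for (std::size_t p = 0; p < phases; ++p)
        for (std::size_t i = 0; i < elems; ++i) heap_buf[i] = static_cast<int>(p);

    // A plain local array: no dynamic allocation, no deleter, and it is
    // destroyed at the end of the scope anyway.
    std::array<int, elems> local_buf{};
    for (std::size_t p = 0; p < phases; ++p)
        for (std::size_t i = 0; i < elems; ++i) local_buf[i] = static_cast<int>(p);
}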
A single memory leak could make your product have a point of failure in time (let's say your program leaks 4 megabytes each hour; it would take months to break a computer, but nevertheless it will break, and you know it will, because the leak is there).
It is like saying "your software is guaranteed for 3 months; after that, call me for service."
So in the end it really is a matter of... can you handle this risk? Is using a raw pointer to handle your indexing over hundreds of different objects worth losing control of the memory?
If the answer is yes, then use a raw pointer.
If you don't even want to consider it, a smart_ptr is a good, viable, and awesome solution.
Answered by liqg3
Just for a glimpse, and just for the [] operator, it is ~5x slower than the raw pointer, as demonstrated in the following code, which was compiled using gcc -lstdc++ -std=c++14 -O0 and output this result:
malloc []: 414252610
unique [] is: 2062494135
uq get [] is: 238801500
uq.get()[] is: 1505169542
new is: 241049490
I'm beginning to learn C++, and this is what I have in mind: you always need to know what you are doing, and take more time to learn what others have done in your C++.
EDIT
As mentioned by @Mohan Kumar, I have provided more details. The gcc version is 7.4.0 (Ubuntu 7.4.0-1ubuntu1~14.04~ppa1). The above result was obtained when -O0 was used; however, when I use the '-O2' flag, I get this:
malloc []: 223
unique [] is: 105586217
uq get [] is: 71129461
uq.get()[] is: 69246502
new is: 9683
Then I shifted to clang version 3.9.0; with -O0 the result was:
malloc []: 409765889
unique [] is: 1351714189
uq get [] is: 256090843
uq.get()[] is: 1026846852
new is: 255421307
With -O2 it was:
malloc []: 150
unique [] is: 124
uq get [] is: 83
uq.get()[] is: 83
new is: 54
The result of clang -O2 is amazing.
#include <memory>
#include <iostream>
#include <chrono>
#include <thread>
#include <cstdint>
#include <cstdlib>

uint32_t n = 100000000;

void t_m(void){
    auto a = (char*) malloc(n*sizeof(char));
    for(uint32_t i=0; i<n; i++) a[i] = 'A';
}

void t_u(void){
    auto a = std::unique_ptr<char[]>(new char[n]);
    for(uint32_t i=0; i<n; i++) a[i] = 'A';
}

void t_u2(void){
    auto a = std::unique_ptr<char[]>(new char[n]);
    auto tmp = a.get();
    for(uint32_t i=0; i<n; i++) tmp[i] = 'A';
}

void t_u3(void){
    auto a = std::unique_ptr<char[]>(new char[n]);
    for(uint32_t i=0; i<n; i++) a.get()[i] = 'A';
}

void t_new(void){
    auto a = new char[n];
    for(uint32_t i=0; i<n; i++) a[i] = 'A';
}

int main(){
    auto start = std::chrono::high_resolution_clock::now();
    t_m();
    auto end1 = std::chrono::high_resolution_clock::now();
    t_u();
    auto end2 = std::chrono::high_resolution_clock::now();
    t_u2();
    auto end3 = std::chrono::high_resolution_clock::now();
    t_u3();
    auto end4 = std::chrono::high_resolution_clock::now();
    t_new();
    auto end5 = std::chrono::high_resolution_clock::now();

    std::cout << "malloc []: " << (end1 - start).count() << std::endl;
    std::cout << "unique [] is: " << (end2 - end1).count() << std::endl;
    std::cout << "uq get [] is: " << (end3 - end2).count() << std::endl;
    std::cout << "uq.get()[] is: " << (end4 - end3).count() << std::endl;
    std::cout << "new is: " << (end5 - end4).count() << std::endl;
}