C++ shared_ptr: horrible speed
Original question: http://stackoverflow.com/questions/3628081/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
shared_ptr: horrible speed
Asked by Ian
When comparing two variants of pointers, classic vs. shared_ptr, I was surprised by a significant increase in the running time of the program. For testing, a 2D Delaunay incremental insertion algorithm was used.
Compiler settings:
VS 2010 (release) /O2 /MD /GL, W7 Prof, CPU 3.GHZ DualCore
Results:
shared_ptr (C++0x):
N [points]   t [sec]
100 000      6
200 000      11
300 000      16
900 000      36
Pointers:
N [points]   t [sec]
100 000      0.5
200 000      1
300 000      2
900 000      4
The running time of the shared_ptr version is approximately 10 times longer. Is this caused by the compiler settings, or is the C++0x shared_ptr implementation really that slow?
VS2010 Profiler: For raw pointers, about 60% of the time is spent in the heuristic search for the triangle containing the inserted point (that is OK, it is a well-known fact). But for the shared_ptr version, approximately 58% of the time is spent in shared_ptr.reset() and only 10% in the heuristic search.
Testing code with raw pointers:
void DT2D::DT ( Node2DList *nl, HalfEdgesList *half_edges_dt, bool print )
{
    // Create 2D Delaunay triangulation using incremental insertion method
    unsigned int nodes_count_before = nl->size();
    // Remove duplicate points
    nl->removeDuplicitPoints();
    // Get nodes count after deletion of duplicated points
    unsigned int nodes_count_after = nl->size();
    // Print info
    std::cout << "> Starting DT, please wait... ";
    std::cout << nodes_count_after << " points, " << ( nodes_count_before - nodes_count_after ) << " removed.";
    // Are there at least three points in the triangulation?
    try
    {
        // There are at least 3 points
        if ( nodes_count_after > 2 )
        {
            // Create simplex triangle
            createSimplexTriangle ( nl, half_edges_dt );
            // Increment nodes count
            nodes_count_after += 3;
            // Starting half edge used for searching
            HalfEdge *e_heuristic = ( *half_edges_dt ) [0];
            // Insert all points into triangulation using incremental method
            for ( unsigned int i = 3; i < nodes_count_after; i++ ) // Jump over simplex
            {
                DTInsertPoint ( ( *nl ) [i], &e_heuristic, half_edges_dt );
            }
            // Correct boundary triangles (swap edges in triangles adjacent to simplex triangles).
            // They are legal due to DT, but do not create the convex hull
            correctBoundaryTriangles ( nl, half_edges_dt );
            // Remove triangles having simplex points
            removeSimplexTriangles ( nl, half_edges_dt );
        }
        // Print results
        std::cout << " Completed." << std::endl;
    }
Insert point procedure:
void DT2D::DTInsertPoint ( Point2D *p, HalfEdge **e1, HalfEdgesList *half_edges_dt )
{
    // One step of the Delaunay triangulation, incremental insertion by de Berg (2001)
    short status = -1;
    // Pointers
    HalfEdge *e31 = NULL;
    HalfEdge *e21 = NULL;
    HalfEdge *e12 = NULL;
    HalfEdge *e32 = NULL;
    HalfEdge *e23 = NULL;
    HalfEdge *e13 = NULL;
    HalfEdge *e53 = NULL;
    HalfEdge *e44 = NULL;
    HalfEdge *e63 = NULL;
    try
    {
        // Test, if point lies inside triangle
        *e1 = LawsonOrientedWalk::findTriangleWalk ( p, &status, *e1, 0 );
        // Check the edge returned by the walk (the original tested e1 itself, which cannot be NULL at this point)
        if ( *e1 != NULL )
        {
            // Edges of the triangle in which the point lies
            HalfEdge *e2 = ( *e1 )->getNextEdge();
            HalfEdge *e3 = e2->getNextEdge();
            // Point lies inside the triangle
            if ( status == 1 )
            {
                // Create first new triangle T1, twin edges set after creation
                e31 = new HalfEdge ( p, *e1, NULL );
                e21 = new HalfEdge ( e2->getPoint(), e31, NULL );
                ( *e1 )->setNextEdge ( e21 );
                // Create second new triangle T2, twin edges set after creation
                e12 = new HalfEdge ( p, e2, NULL );
                e32 = new HalfEdge ( e3->getPoint(), e12, NULL );
                e2->setNextEdge ( e32 );
                // Create third new triangle T3, twin edges set after creation
                e23 = new HalfEdge ( p, e3, NULL );
                e13 = new HalfEdge ( ( *e1 )->getPoint(), e23, NULL );
                e3->setNextEdge ( e13 );
                // Set twin edges in T1, T2, T3
                e12->setTwinEdge ( e21 );
                e21->setTwinEdge ( e12 );
                e13->setTwinEdge ( e31 );
                e31->setTwinEdge ( e13 );
                e23->setTwinEdge ( e32 );
                e32->setTwinEdge ( e23 );
                // Add new edges into list
                half_edges_dt->push_back ( e21 );
                half_edges_dt->push_back ( e12 );
                half_edges_dt->push_back ( e31 );
                half_edges_dt->push_back ( e13 );
                half_edges_dt->push_back ( e32 );
                half_edges_dt->push_back ( e23 );
                // Legalize triangle T1
                if ( ( *e1 )->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, *e1 );
                }
                // Legalize triangle T2
                if ( e2->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e2 );
                }
                // Legalize triangle T3
                if ( e3->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e3 );
                }
            }
            // Point lies on the edge of the triangle
            else if ( status == 2 )
            {
                // Find adjacent triangle
                HalfEdge *e4 = ( *e1 )->getTwinEdge();
                HalfEdge *e5 = e4->getNextEdge();
                HalfEdge *e6 = e5->getNextEdge();
                // Create first new triangle T1, twin edges set after creation
                e21 = new HalfEdge ( p, e3, NULL );
                ( *e1 )->setNextEdge ( e21 );
                // Create second new triangle T2, OK
                e12 = new HalfEdge ( p, e2, e4 );
                e32 = new HalfEdge ( e3->getPoint(), e12, e21 );
                e2->setNextEdge ( e32 );
                // Create third new triangle T3, twin edges set after creation
                e53 = new HalfEdge ( p, e6, NULL );
                e4->setNextEdge ( e53 );
                // Create fourth new triangle T4, OK
                e44 = new HalfEdge ( p, e5, *e1 );
                e63 = new HalfEdge ( e6->getPoint(), e44, e53 );
                e5->setNextEdge ( e63 );
                // Set twin edges in T1, T3
                e21->setTwinEdge ( e32 );
                ( *e1 )->setTwinEdge ( e44 );
                e53->setTwinEdge ( e63 );
                e4->setTwinEdge ( e12 );
                // Add new edges into list
                half_edges_dt->push_back ( e21 );
                half_edges_dt->push_back ( e12 );
                half_edges_dt->push_back ( e32 );
                half_edges_dt->push_back ( e53 );
                half_edges_dt->push_back ( e63 );
                half_edges_dt->push_back ( e44 );
                // Legalize triangle T1
                if ( e3->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e3 );
                }
                // Legalize triangle T4
                if ( e5->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e5 );
                }
                // Legalize triangle T3
                if ( e6->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e6 );
                }
                // Legalize triangle T2
                if ( e2->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e2 );
                }
            }
        }
    }
    // Throw exception
    catch ( std::bad_alloc &e )
    {
        // Free memory
        if ( e31 != NULL ) delete e31;
        if ( e21 != NULL ) delete e21;
        if ( e12 != NULL ) delete e12;
        if ( e32 != NULL ) delete e32;
        if ( e23 != NULL ) delete e23;
        if ( e13 != NULL ) delete e13;
        if ( e53 != NULL ) delete e53;
        if ( e44 != NULL ) delete e44;
        if ( e63 != NULL ) delete e63;
        // Throw exception
        throw ErrorBadAlloc ( "EErrorBadAlloc: ", "Delaunay triangulation: Can not create new triangles for inserted point p." );
    }
    // Throw exception
    catch ( ErrorMathZeroDevision &e )
    {
        // Free memory
        if ( e31 != NULL ) delete e31;
        if ( e21 != NULL ) delete e21;
        if ( e12 != NULL ) delete e12;
        if ( e32 != NULL ) delete e32;
        if ( e23 != NULL ) delete e23;
        if ( e13 != NULL ) delete e13;
        if ( e53 != NULL ) delete e53;
        if ( e44 != NULL ) delete e44;
        if ( e63 != NULL ) delete e63;
        // Throw exception
        throw ErrorBadAlloc ( "EErrorMathZeroDevision: ", "Delaunay triangulation: Can not create new triangles for inserted point p." );
    }
}
Testing code with shared_ptr:
The code was rewritten without any optimization...
void DT2D::DTInsertPoint ( std::shared_ptr <Point2D> p, std::shared_ptr <HalfEdge> *e1, HalfEdgesList * half_edges_dt )
{
    // One step of the Delaunay triangulation, incremental insertion by de Berg (2001)
    short status = -1;
    // Pointers
    std::shared_ptr <HalfEdge> e31;
    std::shared_ptr <HalfEdge> e21;
    std::shared_ptr <HalfEdge> e12;
    std::shared_ptr <HalfEdge> e32;
    std::shared_ptr <HalfEdge> e23;
    std::shared_ptr <HalfEdge> e13;
    std::shared_ptr <HalfEdge> e53;
    std::shared_ptr <HalfEdge> e44;
    std::shared_ptr <HalfEdge> e63;
    try
    {
        // Test, if point lies inside triangle
        *e1 = LawsonOrientedWalk::findTriangleWalk ( p, &status, *e1, 0 );
        // Check the edge returned by the walk (the original tested e1 itself, which cannot be NULL at this point)
        if ( *e1 != NULL )
        {
            // Edges of the triangle in which the point lies
            std::shared_ptr <HalfEdge> e2 ( ( *e1 )->getNextEdge() );
            std::shared_ptr <HalfEdge> e3 ( e2->getNextEdge() );
            // Point lies inside the triangle
            if ( status == 1 )
            {
                // Create first new triangle T1, twin edges set after creation
                e31.reset ( new HalfEdge ( p, *e1, NULL ) );
                e21.reset ( new HalfEdge ( e2->getPoint(), e31, NULL ) );
                ( *e1 )->setNextEdge ( e21 );
                // Create second new triangle T2, twin edges set after creation
                e12.reset ( new HalfEdge ( p, e2, NULL ) );
                e32.reset ( new HalfEdge ( e3->getPoint(), e12, NULL ) );
                e2->setNextEdge ( e32 );
                // Create third new triangle T3, twin edges set after creation
                e23.reset ( new HalfEdge ( p, e3, NULL ) );
                e13.reset ( new HalfEdge ( ( *e1 )->getPoint(), e23, NULL ) );
                e3->setNextEdge ( e13 );
                // Set twin edges in T1, T2, T3
                e12->setTwinEdge ( e21 );
                e21->setTwinEdge ( e12 );
                e13->setTwinEdge ( e31 );
                e31->setTwinEdge ( e13 );
                e23->setTwinEdge ( e32 );
                e32->setTwinEdge ( e23 );
                // Add new edges into list
                half_edges_dt->push_back ( e21 );
                half_edges_dt->push_back ( e12 );
                half_edges_dt->push_back ( e31 );
                half_edges_dt->push_back ( e13 );
                half_edges_dt->push_back ( e32 );
                half_edges_dt->push_back ( e23 );
                // Legalize triangle T1
                if ( ( *e1 )->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, *e1 );
                }
                // Legalize triangle T2
                if ( e2->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e2 );
                }
                // Legalize triangle T3
                if ( e3->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e3 );
                }
            }
            // Point lies on the edge of the triangle
            else if ( status == 2 )
            {
                // Find adjacent triangle
                std::shared_ptr <HalfEdge> e4 = ( *e1 )->getTwinEdge();
                std::shared_ptr <HalfEdge> e5 = e4->getNextEdge();
                std::shared_ptr <HalfEdge> e6 = e5->getNextEdge();
                // Create first new triangle T1, twin edges set after creation
                e21.reset ( new HalfEdge ( p, e3, NULL ) );
                ( *e1 )->setNextEdge ( e21 );
                // Create second new triangle T2, OK
                e12.reset ( new HalfEdge ( p, e2, e4 ) );
                e32.reset ( new HalfEdge ( e3->getPoint(), e12, e21 ) );
                e2->setNextEdge ( e32 );
                // Create third new triangle T3, twin edges set after creation
                e53.reset ( new HalfEdge ( p, e6, NULL ) );
                e4->setNextEdge ( e53 );
                // Create fourth new triangle T4, OK
                e44.reset ( new HalfEdge ( p, e5, *e1 ) );
                e63.reset ( new HalfEdge ( e6->getPoint(), e44, e53 ) );
                e5->setNextEdge ( e63 );
                // Set twin edges in T1, T3
                e21->setTwinEdge ( e32 );
                ( *e1 )->setTwinEdge ( e44 );
                e53->setTwinEdge ( e63 );
                e4->setTwinEdge ( e12 );
                // Add new edges into list
                half_edges_dt->push_back ( e21 );
                half_edges_dt->push_back ( e12 );
                half_edges_dt->push_back ( e32 );
                half_edges_dt->push_back ( e53 );
                half_edges_dt->push_back ( e63 );
                half_edges_dt->push_back ( e44 );
                // Legalize triangle T1
                if ( e3->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e3 );
                }
                // Legalize triangle T4
                if ( e5->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e5 );
                }
                // Legalize triangle T3
                if ( e6->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e6 );
                }
                // Legalize triangle T2
                if ( e2->getTwinEdge() != NULL )
                {
                    legalizeTriangle ( p, e2 );
                }
            }
        }
    }
    // Throw exception
    catch ( std::bad_alloc &e )
    {
        /*
        //Free memory
        if ( e31 != NULL ) delete e31;
        if ( e21 != NULL ) delete e21;
        if ( e12 != NULL ) delete e12;
        if ( e32 != NULL ) delete e32;
        if ( e23 != NULL ) delete e23;
        if ( e13 != NULL ) delete e13;
        if ( e53 != NULL ) delete e53;
        if ( e44 != NULL ) delete e44;
        if ( e63 != NULL ) delete e63;
        */
        // Throw exception
        throw ErrorBadAlloc ( "EErrorBadAlloc: ", "Delaunay triangulation: Can not create new triangles for inserted point p." );
    }
    // Throw exception
    catch ( ErrorMathZeroDevision &e )
    {
        /*
        //Free memory
        if ( e31 != NULL ) delete e31;
        if ( e21 != NULL ) delete e21;
        if ( e12 != NULL ) delete e12;
        if ( e32 != NULL ) delete e32;
        if ( e23 != NULL ) delete e23;
        if ( e13 != NULL ) delete e13;
        if ( e53 != NULL ) delete e53;
        if ( e44 != NULL ) delete e44;
        if ( e63 != NULL ) delete e63;
        */
        // Throw exception
        throw ErrorBadAlloc ( "EErrorMathZeroDevision: ", "Delaunay triangulation: Can not create new triangles for inserted point p." );
    }
}
Thanks for your help...
Edit
I replaced pass-by-value of all objects with pass-by-reference (&). Copy constructors are now called less frequently than before.
Updated tables for shared_ptr:
shared_ptr (C++0x), old version:
N [points]   t [sec]
100 000      6
200 000      11
300 000      16
900 000      36
shared_ptr (C++0x), new version:
N [points]   t [sec]
100 000      2
200 000      5
300 000      9
900 000      24
There is a considerable improvement, but the shared_ptr version is still at least 4 times slower than the raw pointer one. I am afraid that the running speed of the program cannot be significantly increased.
Answer by Matthieu M.
shared_ptr is the most complicated type of pointer ever:
- Ref counting takes time
- Multiple allocation (there are 3 parts: the object, the counter, the deleter)
- A number of virtual methods (in the counter and the deleter) for type erasure
- Works among multiple threads (thus synchronization)
There are 2 ways to make them faster:
- use make_shared to allocate them, because (unfortunately) the normal constructor allocates two different blocks: one for the object and one for the counter and deleter.
- don't copy them if you don't need to: methods should accept shared_ptr<T> const&
But there are also many ways NOT to use them.
Looking at your code, it looks like you're doing a LOT of memory allocation, and I can't help but wonder if you couldn't find a better strategy. I must admit I didn't get the full picture, so I may be heading straight into a wall but...
Usually code is much simpler if you have an owner for each of the objects. Therefore, shared_ptr should be a last-resort measure, employed when you can't get a single owner.
Anyway, we're comparing apples and oranges here: the original code is buggy. You take care of deleting the memory (good), but you forgot that these objects were also referenced from other points in the program, e.g. e1->setNextEdge(e21), which now hold pointers to destructed objects (in a freed memory zone). Therefore I guess that in case of an exception you just wipe out the entire list? (Or somehow bet on undefined behavior to play nice.)
So it's hard to judge performance, since the former doesn't recover well from exceptions while the latter does.
Finally: have you thought about using intrusive_ptr? It could give you some boost (hehe) if you don't synchronize them (single thread), and you would avoid a lot of the work performed by shared_ptr as well as gain on locality of reference.
Answer by Just another metaprogrammer
I always recommend using std::shared_ptr<> instead of relying on manual memory lifetime management. Automatic lifetime management does cost something, but the cost is usually not significant.
In your case you noticed that the cost of shared_ptr<> is significant, and as some said, you should make sure that you don't unnecessarily copy a shared pointer, as that forces an addref/release.
But there's another question in the background: do you really need to rely on new/delete in the first place? new/delete typically use malloc/free, which are not tuned for allocations of small objects.
A library that has helped me a lot before is boost::object_pool.
At some stage I wanted to create graphs very fast. Nodes and edges are naturally dynamically allocated, and I incur two costs from doing that:
- malloc/free
- Memory lifetime management
boost::object_pool helps reduce both of these costs, at the cost of not being as general as malloc/free.
So as an example, let's say we have a simple node like this:
struct node
{
    node * left;
    node * right;
};
So instead of allocating nodes with new, I use boost::object_pool. boost::object_pool also tracks all instances allocated with it, so at the end of my calculation I destroyed the object_pool and didn't need to track each node, thus simplifying my code and improving the performance.
I did some performance testing (I wrote my own pool class just for fun, but boost::object_pool should give the same performance or better).
10,000,000 nodes created and destroyed
- Plain new/delete: 2.5 secs
- shared_ptr: 5 secs
- boost::object_pool: 0.15 secs
So if boost::object_pool works for you it might help reduce the memory allocation overhead significantly.
Answer by David Rodríguez - dribeas
By default, if you create your shared pointers the naïve way (i.e. shared_ptr<type> p( new type )) you incur two memory allocations, one for the actual object and an extra allocation for the reference count. You can avoid the extra allocation by making use of the make_shared template, which performs a single allocation for both the object and the reference count and then constructs the object in place.
The rest of the extra costs, like incrementing and decrementing the count (both atomic operations) and testing for deletion, are quite small compared with doubling the calls to malloc. If you can provide some code showing how you are using the pointers/shared pointers, you might get better insight into what is actually going on in the code.
Answer by Ken Simon
Try it in "release" mode and see if you get closer benchmarks. Debug mode tends to turn on lots of assertions in the STL which slow lots of things down.
Answer by jalf
shared_ptr are noticeably slower than raw pointers. That's why they should only be used if you actually need shared ownership semantics.
Otherwise, there are several other smart pointer types available. scoped_ptr and auto_ptr (C++03) or unique_ptr (C++0x) all have their uses. And often, the best solution is not to use a pointer of any kind, and just write your own RAII class instead.
A shared_ptr has to increment/decrement/read the reference counter, and depending on the implementation and how it is instantiated, the ref counter may be allocated separately, causing potential cache misses. And it has to access the ref counter atomically, which adds additional overhead.
Answer by Steve Townsend
It's impossible to answer this without more data. Have you profiled the code to accurately identify the source of the slowdown in the shared_ptr version? Using the container will certainly add overhead but I'd be surprised if it makes it 10x slower.
VSTS has nice perf tools that will attribute the CPU usage exactly if you can run this for 30 secs or so. If you don't have access to the VS Performance Tools or other profiling toolset, then run the shared_ptr code in the debugger and break into it 10 or 15 times to get a brute force sample of where it's spending all its time. This is surprisingly and counter-intuitively effective, I have found.
[EDIT] Do not pass your shared_ptr by value in that variant of the code; use a reference to const. If this function is called a lot, passing by value will have a measurable negative impact.
Answer by dev1223
It's slow because it uses atomic instructions for the reference inc/dec operations, thus it's horribly slow. If you really need GC in C++, don't use naive RC GC; use some more developed RC strategy, or a tracing GC. http://www.hboehm.info/gc/ is nice for non-speed-critical tasks (but a lot better than the naive RC of "smart pointers").