Performance of dynamic_cast in C++?
Original question: http://stackoverflow.com/questions/4050901/
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow (not to this page).
Asked by MOnsDaR
Before reading the question:
This question is not about how useful it is to use dynamic_cast. It's just about its performance.
I've recently developed a design where dynamic_cast is used a lot.
When discussing it with co-workers, almost everyone says that dynamic_cast shouldn't be used because of its bad performance. (These co-workers have different backgrounds and in some cases do not know each other; I work in a huge company.)
I decided to test the performance of this method instead of just believing them.
The following code was used:
ptime firstValue( microsec_clock::local_time() );
ChildObject* castedObject = dynamic_cast<ChildObject*>(parentObject);
ptime secondValue( microsec_clock::local_time() );
time_duration diff = secondValue - firstValue;
std::cout << "Cast1 lasts:\t" << diff.fractional_seconds() << " microsec" << std::endl;
The above code uses methods from boost::date_time on Linux to get usable values.
I've done 3 dynamic_casts in one execution; the code for measuring each of them is the same.
The results of 1 execution were the following:
Cast1 lasts: 74 microsec
Cast2 lasts: 2 microsec
Cast3 lasts: 1 microsec
The first cast always took 74-111 microsec; the following casts in the same execution took 1-3 microsec.
So finally my questions:
Is dynamic_cast really performing badly? According to the test results it is not. Is my test code correct?
Why do so many developers think that it is slow if it isn't?
Answer by Oliver Charlesworth
Firstly, you need to measure the performance over a lot more than just a few iterations, as your results will be dominated by the resolution of the timer. Try e.g. 1 million+, in order to build up a representative picture. Also, this result is meaningless unless you compare it against something, i.e. doing the equivalent but without the dynamic casting.
Secondly, you need to ensure the compiler isn't giving you false results by optimising away multiple dynamic casts on the same pointer (so use a loop, but use a different input pointer each time).
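As a rough illustration of the two points above, here is a minimal sketch of such a measuring loop, using std::chrono rather than boost::date_time. The classes Parent, ChildObject and OtherChild and the helpers make_objects and time_dynamic_cast are hypothetical names invented for this example; this is not the asker's code, just one way the advice could be followed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical hierarchy, chosen only for this sketch.
struct Parent { virtual ~Parent() = default; };
struct ChildObject : Parent { int payload = 1; };
struct OtherChild : Parent {};

// Builds a mixed set of objects so that each iteration casts a different pointer.
std::vector<std::unique_ptr<Parent>> make_objects(std::size_t count) {
    std::vector<std::unique_ptr<Parent>> objects;
    objects.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        if (i % 2 == 0) objects.push_back(std::make_unique<ChildObject>());
        else            objects.push_back(std::make_unique<OtherChild>());
    }
    return objects;
}

struct BenchResult {
    std::size_t hits;     // number of successful casts
    double ns_per_cast;   // average wall-clock time per loop iteration
};

// Runs `rounds` passes over all objects and times the whole loop at once,
// so the timer resolution is spread over many casts. Returning the hit
// count makes the cast results observable, which keeps the compiler from
// simply discarding the casts.
BenchResult time_dynamic_cast(const std::vector<std::unique_ptr<Parent>>& objects,
                              std::size_t rounds) {
    assert(rounds > 0 && !objects.empty());
    using clock = std::chrono::steady_clock;
    std::size_t hits = 0;
    const auto start = clock::now();
    for (std::size_t r = 0; r < rounds; ++r) {
        for (const auto& obj : objects) {
            if (dynamic_cast<ChildObject*>(obj.get()) != nullptr) ++hits;
        }
    }
    const auto stop = clock::now();
    const double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    const double casts =
        static_cast<double>(objects.size()) * static_cast<double>(rounds);
    return {hits, total_ns / casts};
}
```

Calling time_dynamic_cast(make_objects(1000), 1000) performs one million casts. The resulting number only becomes meaningful once an equivalent loop without the dynamic_cast is timed the same way and the two are compared.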
Dynamic casting will be slower, because it needs to access the RTTI (run-time type information) table for the object, and check that the cast is valid. Then, in order to use it properly, you will need to add error-handling code that checks whether the returned pointer is NULL. All of this takes up cycles.
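The null check mentioned above might look like the following minimal sketch. Shape, Circle, Square and try_scale_circle are hypothetical names used only for illustration.

```cpp
#include <cassert>

// Hypothetical hierarchy for this sketch.
struct Shape { virtual ~Shape() = default; };
struct Circle : Shape { double radius = 2.0; };
struct Square : Shape { double side = 3.0; };

// Scales the radius if `shape` really is a Circle.
// Returns false when the cast fails (wrong dynamic type or null input).
bool try_scale_circle(Shape* shape, double factor) {
    assert(factor > 0.0);
    Circle* circle = dynamic_cast<Circle*>(shape);
    if (circle == nullptr) {
        return false;   // the error-handling path the caller has to pay for
    }
    circle->radius *= factor;
    return true;
}
```

A dynamic_cast on a null pointer also yields a null pointer, so the single check covers both failure cases.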
I know you didn't want to talk about this, but "a design where dynamic_cast is used a lot" is probably an indicator that you're doing something wrong...
Answer by MOnsDaR
Performance is meaningless without comparing equivalent functionality. Most people say dynamic_cast is slow without comparing it to equivalent behavior. Call them out on this. Put another way:
If 'works' isn't a requirement, I can write code that fails faster than yours.
There are various ways to implement dynamic_cast, and some are faster than others. Stroustrup published a paper about using primes to improve dynamic_cast, for example. Unfortunately it's unusual to control how your compiler implements the cast, but if performance really matters to you, then you do have control over which compiler you use.
However, not using dynamic_cast will always be faster than using it. If you don't actually need dynamic_cast, then don't use it! If you do need dynamic lookup, then there will be some overhead, and you can then compare various strategies.
Answer by VladV
Here are a few benchmarks:
http://tinodidriksen.com/2010/04/14/cpp-dynamic-cast-performance/
http://www.nerdblog.com/2006/12/how-slow-is-dynamiccast.html
According to them, dynamic_cast is 5-30 times slower than reinterpret_cast, and the best alternative performs almost the same as reinterpret_cast.
I'll quote the conclusion from the first article:
- dynamic_cast is slow for anything but casting to the base type; that particular cast is optimized out
- the inheritance level has a big impact on dynamic_cast
- member variable + reinterpret_cast is the fastest reliable way to determine type; however, that has a lot higher maintenance overhead when coding
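To make the third point concrete, here is a minimal sketch of the "member variable" approach. It is an assumption about what such code could look like, not the benchmark's actual code: NodeKind, Node, Leaf, Branch and as_leaf are hypothetical names, and the sketch uses static_cast rather than reinterpret_cast because static_cast is the conventional way to downcast inside a class hierarchy.

```cpp
#include <cassert>

// Hypothetical type tag; every concrete class needs its own entry.
enum class NodeKind { Leaf, Branch };

struct Node {
    explicit Node(NodeKind k) : kind(k) {}
    virtual ~Node() = default;
    const NodeKind kind;   // set once by each concrete class's constructor
};

struct Leaf : Node {
    Leaf() : Node(NodeKind::Leaf) {}
    int value = 7;
};

struct Branch : Node {
    Branch() : Node(NodeKind::Branch) {}
};

// Tag check plus static_cast. Returns nullptr on a mismatch, mirroring
// what dynamic_cast<Leaf*> would return.
Leaf* as_leaf(Node* node) {
    if (node == nullptr || node->kind != NodeKind::Leaf) return nullptr;
    // Debug-only cross-check that the hand-maintained tag matches the real type.
    assert(dynamic_cast<Leaf*>(node) != nullptr);
    return static_cast<Leaf*>(node);
}
```

The maintenance overhead the quote mentions shows up here: each new class has to extend the enum and pass the right tag, and a class derived from Leaf would need extra handling that dynamic_cast provides for free.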
Absolute numbers are on the order of 100 ns for a single cast. Values like 74 microseconds don't seem close to reality.
Answer by Eugene Smith
Sorry to say this, but your test is virtually useless for determining whether the cast is slow or not. Microsecond resolution is nowhere near good enough. We're talking about an operation that, even in the worst-case scenario, shouldn't take more than, say, 100 clock ticks, i.e. less than 50 nanoseconds on a typical PC.
There's no doubt that the dynamic cast will be slower than a static cast or a reinterpret cast, because, on the assembly level, the latter two will amount to an assignment (really fast, order of 1 clock tick), and the dynamic cast requires the code to go and inspect the object to determine its real type.
I can't say off-hand how slow it really is. That would probably vary from compiler to compiler, and I'd need to see the assembly code generated for that line of code. But, like I said, 50 nanoseconds per call is the upper limit of what I'd expect to be reasonable.
Answer by greggo
Your mileage may vary, to understate the situation.
The performance of dynamic_cast depends a great deal on what you are doing, and can even depend on the names of the classes. (Comparing its time against reinterpret_cast seems odd, since in most cases that cast takes zero instructions for practical purposes, as does, e.g., a cast from unsigned to int.)
I've been looking into how it works in clang/g++. Assuming that you are dynamic_casting from a B* to a D*, where B is a (direct or indirect) base of D, and disregarding multiple-base-class complications, it seems to work by calling a library function which does something like this:
// for dynamic_cast<D*>( p ) where p is B*
type_info const * curr_typ = &typeid( *p );
while(1) {
    if( *curr_typ == typeid(D)) { return static_cast<D*>(p); } // success
    if( *curr_typ == typeid(B)) return nullptr; // failed
    curr_typ = get_direct_base_type_of(*curr_typ); // magic internal operation
}
So, yes, it's pretty fast when *p is actually a D: just one successful type_info compare.
The worst case is when the cast fails and there are a lot of steps from D to B; in this case there are a lot of failed type comparisons.
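The best and worst cases can be made visible with a small simulation. This sketch does not touch the real RTTI structures: TypeNode is a hand-rolled, hypothetical stand-in for the compiler's per-class type records, and walk_finds_target only mimics the single-inheritance walk described above while counting how many records it visits.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a per-class type record (single inheritance only).
struct TypeNode {
    const char* name;
    const TypeNode* base;   // direct base, or nullptr at the root
};

// Walks from the object's dynamic type toward `source` (the static type B).
// Returns true if `target` (the requested type D) is met on the way, and
// reports the number of records visited through `steps`.
bool walk_finds_target(const TypeNode* dynamic_type, const TypeNode* target,
                       const TypeNode* source, std::size_t& steps) {
    assert(dynamic_type != nullptr && target != nullptr && source != nullptr);
    steps = 0;
    for (const TypeNode* curr = dynamic_type; curr != nullptr; curr = curr->base) {
        ++steps;
        if (curr == target) return true;    // success
        if (curr == source) return false;   // reached B without meeting D
    }
    return false;
}
```

With a chain B <- M <- D and a sibling chain B <- X1 <- X2, an object whose dynamic type is D is resolved in one step, while an object of type X2 needs three steps before the cast to D is rejected.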
How long does type comparison take? It does this, on clang/g++:
bool compare_eq( type_info const & a, type_info const & b ){
    if( &a == &b) return true; // same object
    return strcmp( a.name(), b.name())==0; // otherwise compare the mangled names
}
The strcmp is needed since it's possible to have two different type_info objects representing the same type (although I'm pretty sure this only happens when one is in a shared library and the other is not in that library). But in most cases, when types are actually equal, they reference the same type_info; thus most successful type comparisons are very fast.
The name() method just returns a pointer to a fixed string containing the mangled name of the class.
So there's another factor: if many of the classes on the way from D to B have names starting with MyAppNameSpace::AbstractSyntaxNode<, then the failing compares are going to take longer than usual; the strcmp won't fail until it reaches a difference in the mangled type names.
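The comparison and the prefix effect can be sketched with the standard std::type_info interface. same_type mirrors the compare_eq logic above using only public calls, and common_prefix_length is a hypothetical helper that shows how many characters a strcmp has to scan before two names differ. Both are illustrations, not the library's internal code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <typeinfo>

// Pointer identity first; the name comparison is only the fallback path.
bool same_type(const std::type_info& a, const std::type_info& b) {
    if (&a == &b) return true;
    return std::strcmp(a.name(), b.name()) == 0;
}

// Number of leading characters two names share. A failing strcmp has to
// scan at least this far before it finds the difference.
std::size_t common_prefix_length(const char* lhs, const char* rhs) {
    assert(lhs != nullptr && rhs != nullptr);
    std::size_t n = 0;
    while (lhs[n] != '\0' && lhs[n] == rhs[n]) ++n;
    return n;
}
```

For two names such as "MyApp::NodeA" and "MyApp::NodeB" the shared prefix is 11 characters, so every failed comparison between them costs 11 matching character compares before the mismatch is found. The strings returned by name() are implementation-defined, so real prefixes depend on the compiler's mangling scheme.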
And, of course, since the operation as a whole is traversing a bunch of linked data structures representing the type hierarchy, the time will depend on whether those things are fresh in the cache or not. So the same cast done repeatedly is likely to show an average time which doesn't necessarily represent the typical performance for that cast.