Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/667634/
What is the performance cost of having a virtual method in a C++ class?
Asked by MiniQuark
Having at least one virtual method in a C++ class (or any of its parent classes) means that the class will have a virtual table, and every instance will have a virtual pointer.
So the memory cost is quite clear. The most important part is the memory cost on the instances, especially if the instances are small, for example if they are just meant to contain an integer: in this case having a virtual pointer in every instance might double the size of the instances. As for the memory space used up by the virtual tables, I guess it is usually negligible compared to the space used up by the actual method code.
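The per-instance cost described above can be checked with sizeof. The following is only a minimal sketch: PlainInt, VirtualInt and per_instance_overhead are hypothetical names, not from the question, and the exact sizes are implementation-defined. On common 64-bit implementations the virtual pointer plus alignment padding typically gives 16 bytes versus 4, so "double" can even be an underestimate.

```cpp
#include <cstddef>

// Hypothetical types, named here only for illustration.
struct PlainInt {
    int value;
};

struct VirtualInt {
    int value;
    virtual ~VirtualInt() {}   // one virtual function is enough to add a vptr
};

// Extra bytes per instance caused by making the type polymorphic.
// The result depends on pointer size and padding, so it is not portable.
inline std::size_t per_instance_overhead() {
    return sizeof(VirtualInt) - sizeof(PlainInt);
}
```

Printing sizeof(PlainInt), sizeof(VirtualInt) and per_instance_overhead() on the target platform shows the actual numbers for that compiler.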
This brings me to my question: is there a measurable performance cost (i.e. speed impact) for making a method virtual? There will be a lookup in the virtual table at runtime, upon every method call, so if there are very frequent calls to this method, and if this method is very short, then there might be a measurable performance hit? I guess it depends on the platform, but has anyone run some benchmarks?
The reason I am asking is that I came across a bug that happened to be due to a programmer forgetting to declare a method virtual. This is not the first time I have seen this kind of mistake. And I thought: why do we add the virtual keyword when needed instead of removing the virtual keyword when we are absolutely sure that it is not needed? If the performance cost is low, I think I will simply recommend the following in my team: make every method virtual by default, including the destructor, in every class, and only remove it when you need to. Does that sound crazy to you?
Accepted answer by Crashworks
I ran some timings on a 3 GHz in-order PowerPC processor. On that architecture, a virtual function call costs 7 nanoseconds longer than a direct (non-virtual) function call.
So, not really worth worrying about the cost unless the function is something like a trivial Get()/Set() accessor, in which anything other than inline is kind of wasteful. A 7ns overhead on a function that inlines to 0.5ns is severe; a 7ns overhead on a function that takes 500ms to execute is meaningless.
The big cost of virtual functions isn't really the lookup of a function pointer in the vtable (that's usually just a single cycle), but that the indirect jump usually cannot be branch-predicted. This can cause a large pipeline bubble as the processor cannot fetch any instructions until the indirect jump (the call through the function pointer) has retired and a new instruction pointer computed. So, the cost of a virtual function call is much bigger than it might seem from looking at the assembly... but still only 7 nanoseconds.
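To make the "lookup plus indirect jump" concrete, here is a hand-rolled model of virtual dispatch. It is a sketch under assumptions: the C++ standard does not mandate vtables, all names (Shape, ShapeVTable, call_sides and so on) are hypothetical, and real compilers generate this machinery for you. It only illustrates why the call target is unknown until a pointer has been loaded.

```cpp
// Hand-rolled model of virtual dispatch; every name here is hypothetical.
struct Shape;

struct ShapeVTable {
    int (*sides)(const Shape*);   // one slot per "virtual function"
};

struct Shape {
    const ShapeVTable* vptr;      // the per-instance "virtual pointer"
};

inline int triangle_sides(const Shape*) { return 3; }
inline int square_sides(const Shape*) { return 4; }

// One table per "class", shared by all of its instances.
const ShapeVTable triangle_vtable = { triangle_sides };
const ShapeVTable square_vtable = { square_sides };

inline int call_sides(const Shape* s) {
    // Two dependent loads (vptr, then the slot) followed by an indirect
    // call: the destination is only known once the slot has been read.
    return s->vptr->sides(s);
}
```

A direct call, by contrast, has its destination encoded in the instruction itself, which is what lets the processor fetch ahead.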
Edit: Andrew, Not Sure, and others also raise the very good point that a virtual function call may cause an instruction cache miss: if you jump to a code address that is not in cache then the whole program comes to a dead halt while the instructions are fetched from main memory. This is always a significant stall: on Xenon, about 650 cycles (by my tests).
However, this isn't a problem specific to virtual functions, because even a direct function call will cause a miss if you jump to instructions that aren't in cache. What matters is whether the function has been run recently (making it more likely to be in cache), and whether your architecture can predict static (not virtual) branches and fetch those instructions into cache ahead of time. My PPC does not, but maybe Intel's most recent hardware does.
My timings control for the influence of icache misses on execution (deliberately, since I was trying to examine the CPU pipeline in isolation), so they discount that cost.
Answer by Andrew Grant
There is definitely measurable overhead when calling a virtual function: the call must use the vtable to resolve the address of the function for that type of object. The extra instructions are the least of your worries. Not only do vtables prevent many potential compiler optimizations (since the type is polymorphic, the compiler often cannot know at compile time which function will be called), they can also thrash your I-Cache.
Of course whether these penalties are significant or not depends on your application, how often those code paths are executed, and your inheritance patterns.
In my opinion though, having everything as virtual by default is a blanket solution to a problem you could solve in other ways.
Perhaps you could look at how classes are designed/documented/written. Generally the header for a class should make quite clear which functions can be overridden by derived classes and how they are called. Having programmers write this documentation is helpful in ensuring they are marked correctly as virtual.
I would also say that declaring every function as virtual could lead to more bugs than just forgetting to mark something as virtual. If all functions are virtual, everything can be replaced by derived classes - public, protected, private - everything becomes fair game. By accident or intention, subclasses could then change the behavior of functions, which then causes problems when they are used in the base implementation.
Answer by jalf
It depends. :) (Had you expected anything else?)
Once a class gets a virtual function, it can no longer be a POD datatype, (it may not have been one before either, in which case this won't make a difference) and that makes a whole range of optimizations impossible.
std::copy() on plain POD types can resort to a simple memcpy routine, but non-POD types have to be handled more carefully.
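The POD point can be checked at compile time. This sketch assumes a C++11 compiler (the type traits below are newer than the answer) and uses hypothetical types PodPoint and VirtualPoint. Whether std::copy really becomes a memcpy/memmove is up to the library implementation; the traits only show that the optimization is permitted for one type and not the other.

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>

// Hypothetical types for illustration.
struct PodPoint {
    int x;
    int y;
};

struct VirtualPoint {
    int x;
    int y;
    virtual ~VirtualPoint() {}
};

// A type with a virtual function is no longer trivially copyable, so a
// library may not replace element-wise copying with a raw byte copy.
static_assert(std::is_trivially_copyable<PodPoint>::value,
              "PodPoint can be copied byte-wise");
static_assert(!std::is_trivially_copyable<VirtualPoint>::value,
              "VirtualPoint must be copied element by element");

// For PodPoint, implementations are allowed to lower this to memmove.
inline void copy_points(const PodPoint* src, PodPoint* dst, std::size_t n) {
    std::copy(src, src + n, dst);
}
```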
Construction becomes a lot slower because the vtable has to be initialized. In the worst case, the difference in performance between POD and non-POD datatypes can be significant.
In the worst case, you may see 5x slower execution. (That number is taken from a university project I did recently to reimplement a few standard library classes. Our container took roughly 5x as long to construct as soon as the data type it stored got a vtable.)
Of course, in most cases you're unlikely to see any measurable performance difference. This is simply to point out that in some border cases, it can be costly.
However, performance shouldn't be your primary consideration here. Making everything virtual is not a perfect solution for other reasons.
Allowing everything to be overridden in derived classes makes it much harder to maintain class invariants. How does a class guarantee that it stays in a consistent state when any one of its methods could be redefined at any time?
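A small illustration of the invariant problem, as a sketch only: Account and Overdraft are hypothetical classes, and the intended invariant is that balance() never goes negative. With every method virtual, even a private helper can be overridden, and the base class's own logic then runs the derived version.

```cpp
// Hypothetical class. Intended invariant: balance() is never negative.
class Account {
public:
    Account() : balance_(0) {}
    virtual ~Account() {}

    virtual void deposit(int amount) {
        if (amount > 0) balance_ += amount;
    }
    virtual void withdraw(int amount) {
        // The base class relies on its own check to protect the invariant.
        if (can_withdraw(amount)) balance_ -= amount;
    }
    virtual int balance() const { return balance_; }

private:
    // Private, but still overridable because it is virtual.
    virtual bool can_withdraw(int amount) const {
        return amount > 0 && amount <= balance_;
    }
    int balance_;
};

class Overdraft : public Account {
private:
    // Replaces the check; Account::withdraw now calls this version.
    virtual bool can_withdraw(int) const { return true; }
};
```

Here Overdraft never touches balance_ directly, yet withdrawing 50 after depositing 10 leaves a balance of -40.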
Making everything virtual may eliminate a few potential bugs, but it also introduces new ones.
Answer by jalf
If you need the functionality of virtual dispatch, you have to pay the price. The advantage of C++ is that you can use a very efficient implementation of virtual dispatch provided by the compiler, rather than a possibly inefficient version you implement yourself.
However, lumbering yourself with the overhead if you don't need it is possibly going a bit too far. And most classes are not designed to be inherited from - creating a good base class requires more than making its functions virtual.
Answer by Tony Delroy
Virtual dispatch is an order of magnitude slower than some alternatives - not due to indirection so much as the prevention of inlining. Below, I illustrate that by contrasting virtual dispatch with an implementation embedding a "type(-identifying) number" in the objects and using a switch statement to select the type-specific code. This avoids function call overhead completely - just doing a local jump. There is a potential cost to maintainability, recompilation dependencies etc through the forced localisation (in the switch) of the type-specific functionality.
IMPLEMENTATION
#include <time.h>     // clock_gettime, CLOCK_MONOTONIC (POSIX)
#include <iostream>
#include <vector>

// virtual dispatch model...
struct Base
{
    virtual int f() const { return 1; }
};

struct Derived : Base
{
    virtual int f() const { return 2; }
};

// alternative: member variable encodes runtime type...
struct Type
{
    Type(int type) : type_(type) { }
    int type_;
};

struct A : Type
{
    A() : Type(1) { }
    int f() const { return 1; }
};

struct B : Type
{
    B() : Type(2) { }
    int f() const { return 2; }
};

// wall-clock timer: starts when constructed
struct Timer
{
    Timer() { clock_gettime(CLOCK_MONOTONIC, &from); }
    struct timespec from;
    double elapsed() const
    {
        struct timespec to;
        clock_gettime(CLOCK_MONOTONIC, &to);
        return to.tv_sec - from.tv_sec + 1E-9 * (to.tv_nsec - from.tv_nsec);
    }
};

// note: the objects are deliberately never deleted; this is a throwaway benchmark
int main()
{
    for (int j = 0; j < 3; ++j)
    {
        typedef std::vector<Base*> V;
        V v;
        for (int i = 0; i < 1000; ++i)
            v.push_back(i % 2 ? new Base : (Base*)new Derived);

        // 1) call f() through the vtable
        int total = 0;
        Timer tv;
        for (int i = 0; i < 100000; ++i)
            for (V::const_iterator it = v.begin(); it != v.end(); ++it)
                total += (*it)->f();
        double tve = tv.elapsed();
        std::cout << "virtual dispatch: " << total << ' ' << tve << '\n';

        // ----------------------------
        typedef std::vector<Type*> W;
        W w;
        for (int i = 0; i < 1000; ++i)
            w.push_back(i % 2 ? (Type*)new A : (Type*)new B);

        // 2) branch on the type number and call the non-virtual f()
        total = 0;
        Timer tw;
        for (int i = 0; i < 100000; ++i)
            for (W::const_iterator it = w.begin(); it != w.end(); ++it)
            {
                if ((*it)->type_ == 1)
                    total += ((A*)(*it))->f();
                else
                    total += ((B*)(*it))->f();
            }
        double twe = tw.elapsed();
        std::cout << "switched: " << total << ' ' << twe << '\n';

        // ----------------------------
        // 3) loop and memory-access overhead only, no dispatch at all
        total = 0;
        Timer tw2;
        for (int i = 0; i < 100000; ++i)
            for (W::const_iterator it = w.begin(); it != w.end(); ++it)
                total += (*it)->type_;
        double tw2e = tw2.elapsed();
        std::cout << "overhead: " << total << ' ' << tw2e << '\n';
    }
}
PERFORMANCE RESULTS
On my Linux system:
~/dev g++ -O2 -o vdt vdt.cc -lrt
~/dev ./vdt
virtual dispatch: 150000000 1.28025
switched: 150000000 0.344314
overhead: 150000000 0.229018
virtual dispatch: 150000000 1.285
switched: 150000000 0.345367
overhead: 150000000 0.231051
virtual dispatch: 150000000 1.28969
switched: 150000000 0.345876
overhead: 150000000 0.230726
This suggests an inline type-number-switched approach is about (1.28 - 0.23) / (0.344 - 0.23) = 9.2 times as fast. Of course, that's specific to the exact system tested / compiler flags & version etc., but generally indicative.
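The arithmetic above can be written as a small helper, shown here only as a sketch (dispatch_ratio is a hypothetical name). It subtracts the measured loop overhead from both timings before dividing. Since each timed loop makes 100000 x 1000 = 100 million calls, the same figures work out to very roughly 10.5 ns per virtual call versus about 1.2 ns per switched call on that machine.

```cpp
// Ratio of dispatch costs once the shared loop overhead is removed.
// All three arguments are elapsed times in seconds for the same call count.
inline double dispatch_ratio(double virtual_s, double switched_s,
                             double overhead_s) {
    return (virtual_s - overhead_s) / (switched_s - overhead_s);
}
```

Feeding in the first run's unrounded timings, dispatch_ratio(1.28025, 0.344314, 0.229018), gives a value a little above 9, consistent with the rounded estimate of 9.2.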
COMMENTS RE VIRTUAL DISPATCH
It must be said though that virtual function call overheads are rarely significant, and then only for oft-called trivial functions (like getters and setters). Even then, you might be able to provide a single function to get and set a whole lot of things at once, minimising the cost. People worry about virtual dispatch way too much, so do the profiling before resorting to awkward alternatives. The main issue with virtual functions is that they perform an out-of-line function call, though they also delocalise the code executed, which changes the cache utilisation patterns (for better or, more often, worse).
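One way to read "a single function to get a whole lot of things at once" is sketched below. The interface is hypothetical (Sensor, State, FixedSensor are illustrative names): fetching three values through fine-grained accessors costs three virtual dispatches, while a coarse-grained accessor costs one.

```cpp
// Hypothetical interface for illustration.
struct State {
    int x;
    int y;
    int z;
};

class Sensor {
public:
    virtual ~Sensor() {}
    // Fine-grained accessors: one virtual call per value.
    virtual int x() const = 0;
    virtual int y() const = 0;
    virtual int z() const = 0;
    // Coarse-grained accessor: one virtual call returns everything.
    virtual State read() const = 0;
};

class FixedSensor : public Sensor {
public:
    virtual int x() const { return 1; }
    virtual int y() const { return 2; }
    virtual int z() const { return 3; }
    virtual State read() const {
        State s = { 1, 2, 3 };
        return s;
    }
};

// Three virtual dispatches.
inline int sum_fine(const Sensor& s) { return s.x() + s.y() + s.z(); }

// One virtual dispatch, then plain member access.
inline int sum_coarse(const Sensor& s) {
    State st = s.read();
    return st.x + st.y + st.z;
}
```

Whether the difference matters is exactly what profiling should decide.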
Answer by peterchen
The extra cost is virtually nothing in most scenarios. (pardon the pun). ejac has already posted sensible relative measures.
The biggest thing you give up is possible optimizations due to inlining. They can be especially good if the function is called with constant parameters. This rarely makes a real difference, but in a few cases, this can be huge.
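A sketch of the constant-parameter case, with hypothetical names (scale_direct, Scaler). When the callee is visible and non-virtual, the compiler is free to inline it and fold the constant, so the multiplication by 2 may turn into a single add or shift. Through a virtual call, the compiler generally has to emit a real call unless it can prove the dynamic type; what actually happens depends on the compiler and flags.

```cpp
// Non-virtual: the body is visible at the call site and can be inlined.
inline int scale_direct(int value, int factor) { return value * factor; }

// Virtual: the callee is chosen at run time.
struct Scaler {
    virtual ~Scaler() {}
    virtual int scale(int value, int factor) const { return value * factor; }
};

// The constant 2 can be propagated into the inlined body.
inline int direct_times_two(int v) { return scale_direct(v, 2); }

// The constant is passed as an ordinary argument to an opaque callee,
// unless the optimizer manages to devirtualize the call.
inline int virtual_times_two(const Scaler& s, int v) { return s.scale(v, 2); }
```

Both functions compute the same result; only the generated code differs, which can be inspected in the compiler's assembly output.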
Regarding optimizations:
It is important to know and consider the relative cost of the constructs of your language. Big O notation is only half of the story: how your application scales. The other half is the constant factor in front of it.
As a rule of thumb, I wouldn't go out of my way to avoid virtual functions unless there are clear and specific indications that they are a bottleneck. A clean design always comes first, but it is only one stakeholder, and it should not unduly hurt the others.
Contrived Example: An empty virtual destructor on an array of one million small elements may plow through at least 4MB of data, thrashing your cache. If that destructor can be inlined away, the data won't be touched.
When writing library code, such considerations are far from premature. You never know how many loops will be put around your function.
Answer by Tommy Hui
While everyone else is correct about the performance of virtual methods and such, I think the real problem is whether the team knows about the definition of the virtual keyword in C++.
Consider this code, what is the output?
#include <stdio.h>

class A
{
public:
    void Foo()
    {
        printf("A::Foo()\n");
    }
};

class B : public A
{
public:
    void Foo()
    {
        printf("B::Foo()\n");
    }
};

int main(int argc, char** argv)
{
    A* a = new A();
    a->Foo();
    B* b = new B();
    b->Foo();
    A* a2 = new B();
    a2->Foo();
    return 0;
}
Nothing surprising here:
A::Foo()
B::Foo()
A::Foo()
This is because nothing is virtual. If the virtual keyword is added to the front of Foo in both the A and B classes, we get this output:
A::Foo()
B::Foo()
B::Foo()
Pretty much what everyone expects.
Now, you mentioned that there are bugs because someone forgot to add a virtual keyword. So consider this code (where the virtual keyword is added to A, but not B class). What is the output then?
#include <stdio.h>

class A
{
public:
    virtual void Foo()
    {
        printf("A::Foo()\n");
    }
};

class B : public A
{
public:
    void Foo()
    {
        printf("B::Foo()\n");
    }
};

int main(int argc, char** argv)
{
    A* a = new A();
    a->Foo();
    B* b = new B();
    b->Foo();
    A* a2 = new B();
    a2->Foo();
    return 0;
}
Answer: the same as if the virtual keyword were added to B. The reason is that the signature of B::Foo exactly matches that of A::Foo, and because A's Foo is virtual, so is B's.
Now consider the case where B's Foo is virtual and A's is not. What is the output then? In this case, the output is
A::Foo()
B::Foo()
A::Foo()
The virtual keyword works downwards in the hierarchy, not upwards. It never makes the base class methods virtual. The first time a virtual method is encountered in the hierarchy is when the polymorphism begins. There isn't a way for later classes to make previous classes have virtual methods.
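The "downwards, not upwards" rule can be sketched with a three-level hierarchy. This is an illustration only: the class C and the helpers call_via_A and call_via_B are additions for this sketch, and Foo returns a string instead of printing so the result is easy to inspect.

```cpp
#include <string>

struct A {
    // Not virtual: calls through an A* always bind to A::Foo.
    std::string Foo() { return "A::Foo()"; }
};

struct B : A {
    virtual ~B() {}
    // Polymorphism begins here, for B and everything derived from it.
    virtual std::string Foo() { return "B::Foo()"; }
};

struct C : B {
    // Overrides B::Foo; virtual whether or not the keyword is repeated.
    virtual std::string Foo() { return "C::Foo()"; }
};

// Static binding: the pointer type decides.
inline std::string call_via_A(A* a) { return a->Foo(); }

// Dynamic binding: the object's actual type decides.
inline std::string call_via_B(B* b) { return b->Foo(); }
```

For a C object, call_via_A yields "A::Foo()" while call_via_B yields "C::Foo()", which is the same behaviour as the last output listing above, extended by one level.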
Don't forget that virtual methods mean that this class is giving future classes the ability to override/change some of its behaviors.
So if you have a rule to remove the virtual keyword, it may not have the intended effect.
The virtual keyword in C++ is a powerful concept. You should make sure each member of the team really knows this concept so that it can be used as designed.
Answer by Dan Olson
Depending on your platform, the overhead of a virtual call can be very undesirable. By declaring every function virtual you're essentially calling them all through a function pointer. At the very least this is an extra dereference, but on some PPC platforms it will use microcoded or otherwise slow instructions to accomplish this.
I'd recommend against your suggestion for this reason, but if it helps you prevent bugs then it may be worth the trade off. I can't help but think that there must be some middle ground that is worth finding, though.
Answer by alex2k8
It will require just a couple of extra asm instructions to call a virtual method.
But I don't think you worry that fun(int a, int b) has a couple of extra 'push' instructions compared to fun(). So don't worry about virtuals either, until you are in a special situation and see that it really leads to problems.
P.S. If you have a virtual method, make sure you have a virtual destructor. This way you'll avoid possible problems.
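The "possible problems" mostly come from deleting a derived object through a base pointer, which is undefined behaviour when the base destructor is not virtual. A minimal sketch follows; the names are hypothetical and the counter exists only to make the effect observable.

```cpp
// Counts how many Derived objects have been destroyed (illustration only).
static int derived_destroyed = 0;

struct Base {
    virtual ~Base() {}                 // virtual destructor
    virtual int f() const { return 1; }
};

struct Derived : Base {
    ~Derived() { ++derived_destroyed; }
    virtual int f() const { return 2; }
};

inline int destroy_via_base() {
    Base* p = new Derived;
    delete p;   // well-defined: ~Derived runs first, then ~Base
    return derived_destroyed;
}
```

With ~Base left non-virtual, the same delete would have undefined behaviour, and in practice ~Derived is often silently skipped.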
In response to the comments from 'xtofl' and 'Tom', I did some small tests with 3 functions:
- Virtual
- Normal
- Normal with 3 int parameters
My test was a simple iteration:
for (int it = 0; it < 100000000; ++it) {
    test.Method();
}
And here are the results (in the same order as the functions listed above):
- Virtual: 3.913 sec
- Normal: 3.873 sec
- Normal with 3 int parameters: 3.970 sec
It was compiled by VC++ in debug mode. I did only 5 tests per method and computed the mean value (so the results may be pretty inaccurate)... Anyway, the values are almost equal over 100 million calls. And the method with 3 extra push/pop was slower.
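A test along these lines might be sketched as follows. This is not the code used for the numbers above: the class Test, its members and the time_calls helper are hypothetical, and it assumes C++11 for std::chrono and lambdas. In an optimized build the compiler may devirtualize or inline the calls when it can see the object's type, so results from such a loop should be treated with caution.

```cpp
#include <chrono>

// Hypothetical class with one virtual and one non-virtual method.
struct Test {
    virtual ~Test() {}
    virtual int VirtualMethod() { return ++counter; }
    int NormalMethod() { return ++counter; }
    int counter = 0;
};

// Runs `call` the given number of times and returns elapsed seconds.
template <typename F>
double time_calls(long iterations, F call) {
    auto start = std::chrono::steady_clock::now();
    for (long it = 0; it < iterations; ++it) {
        call();
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

A call such as time_calls(100000000, [&t] { t.VirtualMethod(); }) on a Test object t then gives one sample; repeating it several times and comparing against NormalMethod gives figures comparable in spirit to those above.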
The main point is that if you don't like the analogy with push/pop, think of an extra if/else in your code. Do you think about the CPU pipeline when you add an extra if/else? ;-) Also, you never know what CPU the code will be running on... A typical compiler can generate code that is more optimal for one CPU and less optimal for another (Intel C++ Compiler).