Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5226803/

Time: 2020-09-02 07:59:28  Source: igfitidea

Inline function v. Macro in C -- What's the Overhead (Memory/Speed)?

Tags: c, performance, optimization, macros, inline

Asked by Jason R. Mick

I searched Stack Overflow for the pros/cons of function-like macros v. inline functions.

I found the following discussion: Pros and Cons of Different macro function / inline methods in C

...but it didn't answer my primary burning question.

Namely, what is the overhead in C of using a macro function (with variables, possibly other function calls) v. an inline function, in terms of memory usage and execution speed?

Are there any compiler-dependent differences in overhead? I have both icc and gcc at my disposal.

The code snippet I'm modularizing is:

double AttractiveTerm = pow(SigmaSquared/RadialDistanceSquared,3);
double RepulsiveTerm = AttractiveTerm * AttractiveTerm;
EnergyContribution += 
   4 * Epsilon * (RepulsiveTerm - AttractiveTerm);

My reason for turning it into an inline function/macro is so I can drop it into a C file and then conditionally compile other similar, but slightly different functions/macros.

e.g.:

double AttractiveTerm = pow(SigmaSquared/RadialDistanceSquared,3);
double RepulsiveTerm = pow(SigmaSquared/RadialDistanceSquared,9);
EnergyContribution += 
   4 * Epsilon * (RepulsiveTerm - AttractiveTerm);

(note the difference in the second line...)

This function is central to my code: it gets called thousands of times per step, and my program performs millions of steps. Thus I want the LEAST overhead possible, which is why I'm spending time worrying about the overhead of inlining v. transforming the code into a macro.

Based on the prior discussion I already realize other pros/cons of macros (type independence and the errors that result from it)... but what I want to know most, and don't currently know, is the PERFORMANCE.

I know some of you C veterans will have some great insight for me!!

Accepted answer by Stephen Canon

Calling an inline function may or may not generate a function call, which typically incurs a very small amount of overhead. The exact situations under which an inline function actually gets inlined vary depending on the compiler; most make a good-faith effort to inline small functions (at least when optimization is enabled), but there is no requirement that they do so (C99, §6.7.4):

Making a function an inline function suggests that calls to the function be as fast as possible. The extent to which such suggestions are effective is implementation-defined.

A macro is less likely to incur such overhead (though again, there is little to prevent a compiler from somehow doing something; the standard doesn't define what machine code programs must expand to, only the observable behavior of a compiled program).

Use whatever is cleaner. Profile. If it matters, do something different.

Also, what fizzer said: calls to pow (and division) are both typically more expensive than function-call overhead. Minimizing those is a good start:

double ratio = SigmaSquared/RadialDistanceSquared;
double AttractiveTerm = ratio*ratio*ratio;
EnergyContribution += 4 * Epsilon * AttractiveTerm * (AttractiveTerm - 1.0);

Is EnergyContribution made up only of terms that look like this? If so, pull the 4 * Epsilon out, and save two multiplies per iteration:

double ratio = SigmaSquared/RadialDistanceSquared;
double AttractiveTerm = ratio*ratio*ratio;
EnergyContribution += AttractiveTerm * (AttractiveTerm - 1.0);
// later, once you've done all of those terms...
EnergyContribution *= 4 * Epsilon;
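
For illustration, the two suggestions above could be combined into a single loop. This is only a sketch under the assumption that the squared distances live in a plain array; the names `lj_energy`, `r2`, and `n` are made up here and do not come from the question.

```c
#include <stddef.h>

/* Hypothetical helper: sums the pair terms for n squared distances.
   Uses plain multiplies instead of pow(), and applies the common
   factor 4 * epsilon once, after the loop. */
double lj_energy(const double *r2, size_t n,
                 double sigma_squared, double epsilon)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double ratio = sigma_squared / r2[i];
        double attractive = ratio * ratio * ratio;
        sum += attractive * (attractive - 1.0);
    }
    return 4.0 * epsilon * sum;
}
```

Whether hoisting the factor is valid depends on every term sharing the same Epsilon, as the answer points out.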

Answer by Hassan Syed

A macro is not really a function. Whatever you define as a macro is pasted verbatim into your code by the preprocessor, before the compiler gets to see it. The preprocessor is just a software engineer's tool that enables various abstractions to better structure your code.
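
One visible consequence of that textual pasting is that a macro argument can be evaluated more than once. The sketch below counts evaluations; `SQUARE_MACRO`, `square_fn`, and `next_value` are illustrative names, not anything from the question.

```c
#define SQUARE_MACRO(x) ((x) * (x))   /* illustrative name */

static inline double square_fn(double x) { return x * x; }

static int calls = 0;                 /* counts evaluations of next_value() */

static double next_value(void)
{
    ++calls;
    return 3.0;
}

/* Returns how many times the argument expression was evaluated. */
int evaluations_with_macro(void)
{
    calls = 0;
    /* The preprocessor turns this into ((next_value()) * (next_value())). */
    double r = SQUARE_MACRO(next_value());
    (void)r;
    return calls;
}

int evaluations_with_function(void)
{
    calls = 0;
    double r = square_fn(next_value()); /* argument evaluated once */
    (void)r;
    return calls;
}
```

Here the macro version evaluates its argument twice and the function version once, which matters as soon as the argument is expensive or has side effects.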

The compiler does know about a function, inline or otherwise, and can make decisions on what to do with it. A user-supplied inline keyword is just a suggestion and the compiler may override it. It is this overriding that in most cases results in better code.

Another side effect of the compiler being aware of the functions is that you could potentially force the compiler to take certain decisions, for example disabling inlining of your code, which could enable you to better debug or profile your code. There are probably many other use cases that inline functions enable vs. macros.
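
As one example of forcing such a decision, GCC and Clang accept a `noinline` function attribute. This is a compiler extension rather than standard C, and the `NOINLINE` wrapper and `cube_noinline` below are hypothetical names; check your own compiler's documentation (icc included) for the equivalent spelling.

```c
/* GCC/Clang extension; falls back to nothing on other compilers. */
#if defined(__GNUC__)
#  define NOINLINE __attribute__((noinline))
#else
#  define NOINLINE
#endif

/* Kept as a real call so it shows up as its own frame in a
   debugger or profiler. */
NOINLINE double cube_noinline(double x)
{
    return x * x * x;
}
```

Nothing comparable is possible with a macro, because the compiler never sees the macro as a separate entity.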

Macros are extremely powerful though, and to back this up I would cite Google Test and Google Mock. There are many reasons to use macros :D.

Simple mathematical operations that are chained together using functions are often inlined by the compiler, especially if the function is only called once in the translation unit. So I wouldn't be surprised if the compiler makes inlining decisions for you, regardless of whether the keyword is supplied or not.

However, if the compiler doesn't, you can manually flatten out segments of your code. If you do flatten it out, perhaps macros will serve as a good abstraction; after all, they present similar semantics to a "real" function.

The Crux

So, do you want the compiler to be aware of certain logical boundaries so it can produce better physical code, or do you want to force decisions on the compiler by flattening the code out manually or by using macros? The industry leans towards the former.

I would lean towards using macros in this case, just because it's quick and dirty, without having to learn much more. However, as macros are a software engineering abstraction, and because you are concerned with the code the compiler generates, if the problem were to become slightly more advanced I would use C++ templates, as they were designed for the concerns you are pondering.

Answer by fizzer

It's the calls to pow() you want to eliminate. This function takes general floating-point exponents and is inefficient for raising to integral exponents. Replacing these calls with e.g.

inline double cube(double x)
{
    return x * x * x;
}

is the only thing which will make a significant difference to your performance here.
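
The same idea extends to the ninth power used in the question's second variant. A minimal sketch, where `ninth_power` is a name invented for this example:

```c
static inline double cube(double x)
{
    return x * x * x;
}

/* x^9 as two nested cubes: four multiplies and no pow() call. */
static inline double ninth_power(double x)
{
    return cube(cube(x));
}
```

The result can differ from pow(x, 9) in the last bits, since the rounding happens at different points.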

Answer by Muhammed Abdul Galeil

Please review the CERT Secure Coding Standard, which discusses macros and inline functions in terms of security and bug-proneness. I do not encourage using function-like macros, because:
- Less profiling
- Less traceable
- Harder to debug
- Could lead to severe bugs

Answer by John Bode

Macros, including function-like macros, are simple text substitutions, and as such can bite you in the ass if you're not really careful with your parameters. For example, the ever-so-popular SQUARE macro:

#define SQUARE(x) ((x)*(x))

can be a disaster waiting to happen if you call it as SQUARE(i++). Also, function-like macros have no concept of scope, and don't support local variables; the most popular hack is something like

#define MACRO(S,R,E,C)                                     \
do                                                         \
{                                                          \
  double AttractiveTerm = pow((S)/(R),3);                  \
  double RepulsiveTerm = AttractiveTerm * AttractiveTerm;  \
  (C) = 4 * (E) * (RepulsiveTerm - AttractiveTerm);        \
} while(0)

which, of course, makes it hard to assign a result like x = MACRO(a,b);.

The best bet from a correctness and maintainability standpoint is to make it a function and specify inline. Macros are not functions, and should not be confused with them.
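
A function version of the macro above might look like the following. The name `pair_energy` and its parameter names are assumptions, and pow() has been replaced with multiplies as the other answers suggest.

```c
/* Hypothetical inline replacement for MACRO(S,R,E,C): it returns the
   value instead of writing into a fourth argument, so the result can
   be used directly in an expression. */
static inline double pair_energy(double sigma_squared,
                                 double r_squared,
                                 double epsilon)
{
    double ratio = sigma_squared / r_squared;
    double attractive = ratio * ratio * ratio;
    double repulsive = attractive * attractive;
    return 4.0 * epsilon * (repulsive - attractive);
}
```

With this, the question's accumulation could be written as `EnergyContribution += pair_energy(SigmaSquared, RadialDistanceSquared, Epsilon);`, and the locals are properly scoped.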

Once you've done that, measure the performance and find where any actual bottleneck is before hacking at it (the call to pow would certainly be a candidate for streamlining).

Answer by Mike Dunlavey

If you random-pause this, what you're probably going to see is that 100% (minus epsilon) of the time is inside the pow function, so how it got there makes basically no difference.

Assuming you find that, the first thing to do is get rid of the calls to pow that you found on the stack. (In general, what it does is take the log of the first argument, multiply it by the second argument, and exp of that, or something that does the same thing. The log and exp could well be done by some kind of series involving a lot of arithmetic. It looks for special cases, of course, but it's still going to take longer than you would.) That alone should give you around an order of magnitude speedup.

Then do the random-pausing again. Now you're going to see something else taking a lot of the time. I can't guess what it will be, and neither can anyone else, but you can probably reduce that too. Just keep doing it until you can't any more.

It may happen along the way that you choose to use a macro, and it might be slightly faster than an inline function. That's for you to judge when you get there.

Answer by Eric Melski

The best way to answer your question is to benchmark both approaches to see which is actually faster in your application, using your test data. Predictions about performance are notoriously unreliable except at the coarsest levels.
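
A very small timing harness along these lines could look like the sketch below. Every name in it is hypothetical, `clock()` measures CPU time at coarse resolution, and a real comparison would use your own kernel, a large data set, and repeated runs.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical macro and inline versions of the same term. */
#define RATIO_CUBED_MACRO(s, r) (((s) / (r)) * ((s) / (r)) * ((s) / (r)))

static inline double ratio_cubed_inline(double s, double r)
{
    double ratio = s / r;
    return ratio * ratio * ratio;
}

double sum_macro(const double *r2, size_t n, double s)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += RATIO_CUBED_MACRO(s, r2[i]);
    return sum;
}

double sum_inline(const double *r2, size_t n, double s)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += ratio_cubed_inline(s, r2[i]);
    return sum;
}

/* Runs fn once and returns the CPU time it took, in seconds.
   The sum is stored through out so the work is not discarded. */
double cpu_seconds(double (*fn)(const double *, size_t, double),
                   const double *r2, size_t n, double s, double *out)
{
    clock_t start = clock();
    *out = fn(r2, n, s);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call `cpu_seconds` once with `sum_macro` and once with `sum_inline` on the same data, at the optimization level you actually ship with, and compare both the timings and the returned sums.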

That said, I would expect there to be no significant difference between a macro and a truly inlined function call. In both cases, you should end up with the same assembly code under the hood.

Answer by Keith Nicholas

As others have said, it mostly depends on the compiler.

I bet "pow" costs you more than any inlining or macro will save you :)

I think it's cleaner if it's an inline function rather than a macro.

Caching and pipelining are really where you are going to get good gains if you are running this on a modern processor, i.e. removing branching statements like 'if' can make an enormous difference (this can be done by a number of tricks).
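
One such trick, shown purely as an illustration: a cutoff test can be folded into the arithmetic instead of an `if`. The cutoff itself and the name `masked_term` are assumptions for this sketch, not something from the question, and whether the compiler emits branch-free machine code for it is not guaranteed.

```c
/* Keeps term when r_squared is inside the cutoff and zeroes it
   otherwise, without an if: a comparison in C yields 0 or 1,
   which then scales the term. */
static inline double masked_term(double term,
                                 double r_squared,
                                 double cutoff_squared)
{
    return term * (double)(r_squared < cutoff_squared);
}
```

Note that this still computes the term for every pair, and that an infinite or NaN term multiplied by zero stays NaN, so it is only a win when the term is cheap and finite. Measure before adopting it.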

Answer by Tavison

As I understand it from some guys who write compilers, once you call another function from inside your function, it is not very likely your code will be inlined anyway. But that is why you should not use a macro. Macros remove information and leave the compiler with far fewer options to optimize. With multi-pass compilers and whole-program optimization, the compiler will know whether inlining your code will cause a failed branch prediction, a cache miss, or trouble with the other black magic modern CPUs use to go fast. I think everyone is right to point out that the code above is not optimal anyway, so that is where the focus should be.