visual-studio: When should I use __forceinline instead of inline?

Question

Asked by Michael Labbé

Visual Studio includes support for __forceinline. The Microsoft Visual Studio 2005 documentation states:

The __forceinline keyword overrides the cost/benefit analysis and relies on the judgment of the programmer instead.

This raises the question: When is the compiler's cost/benefit analysis wrong? And, how am I supposed to know that it's wrong?

In what scenario is it assumed that I know better than my compiler on this issue?

Answer 1

Accepted answer by SmacL

The compiler makes its decisions based on static code analysis, whereas if you profile, as Don says, you are carrying out a dynamic analysis that can be much further-reaching. The number of calls to a specific piece of code is often largely determined by the context in which it is used, e.g. the data, and profiling a typical set of use cases will reveal this. Personally, I gather this information by enabling profiling on my automated regression tests. In addition to forcing inlines, I have unrolled loops and carried out other manual optimizations on the basis of such data, to good effect. It is also imperative to profile again afterwards, as sometimes your best efforts can actually lead to decreased performance. Again, automation makes this a lot less painful.

More often than not, though, in my experience, tweaking algorithms gives much better results than straight code optimization.

Answer 2

Answered by Don Neufeld

You know better than the compiler only when your profiling data tells you so.

Answer 3

Answered by peterchen

The one place I am using it is licence verification.

One important factor in protecting against easy* cracking is to verify the licence in multiple places rather than only one, and you don't want all of these places to be the same function call.

*) Please don't turn this into a discussion about how everything can be cracked; I know. Also, this alone does not help much.
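
As a rough illustration of the idea, here is a minimal sketch. The macro `FORCE_INLINE`, the function `license_ok`, the two call sites, and the trivial key check are all hypothetical, invented for this example; a real check would be far less simple. The point is only that, once the check is inlined, each call site carries its own copy of the test rather than a call to one shared function.

```cpp
#include <cstdint>

// Portable spelling of "force inline" (the macro name is hypothetical).
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline __attribute__((always_inline))
#endif

// Hypothetical check: compares a transformed key against an expected value.
FORCE_INLINE bool license_ok(std::uint32_t key) {
    return (key ^ 0x5A5A5A5Au) == 0x12345678u;
}

// Two unrelated call sites. When the check is inlined, each function
// contains its own copy of the comparison instead of a shared call.
int open_document(std::uint32_t key) { return license_ok(key) ? 1 : -1; }
int export_report(std::uint32_t key) { return license_ok(key) ? 2 : -1; }
```

Whether the copies really end up in each caller is best confirmed by inspecting the generated assembly.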

Answer 4

Answered by Johann Gerell

I've developed software for limited-resource devices for 9 years or so, and the only time I've ever seen the need to use __forceinline was in a tight loop where a camera driver needed to copy pixel data from a capture buffer to the device screen. There we could clearly see that the cost of a specific function call really hurt the overlay drawing performance.

Answer 5

Answered by Greg Hewgill

The only way to be sure is to measure performance with and without it. Unless you are writing highly performance-critical code, this will usually be unnecessary.
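
A with-and-without measurement could be sketched roughly as follows. The names `FORCE_INLINE`, `NO_INLINE`, `mix_inlined`, `mix_called`, `Timing`, `time_loop`, and `report` are all assumptions made for this example. It uses `__declspec(noinline)` on MSVC and the corresponding GCC/Clang attributes to pin down both variants, and it is only a crude wall-clock comparison, not a substitute for a real profiler.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Portable spellings of the inlining hints (macro names are hypothetical).
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#  define NO_INLINE    __declspec(noinline)
#else
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE    __attribute__((noinline))
#endif

// Two copies of the same tiny function, differing only in the inlining hint.
FORCE_INLINE std::uint32_t mix_inlined(std::uint32_t x) { return x * 2654435761u + 1u; }
NO_INLINE    std::uint32_t mix_called(std::uint32_t x)  { return x * 2654435761u + 1u; }

struct Timing {
    std::uint32_t checksum;  // result of the loop, so the work cannot be discarded
    double        millis;    // wall-clock time spent in the loop
};

// Runs f in a dependent chain `iterations` times and times the whole loop.
template <typename F>
Timing time_loop(F f, std::uint32_t iterations) {
    const auto start = std::chrono::steady_clock::now();
    std::uint32_t acc = 0;
    for (std::uint32_t i = 0; i < iterations; ++i) acc = f(acc ^ i);
    const auto stop = std::chrono::steady_clock::now();
    return { acc, std::chrono::duration<double, std::milli>(stop - start).count() };
}

// Prints both timings; the checksums should match because the work is identical.
void report(std::uint32_t iterations) {
    const Timing a = time_loop([](std::uint32_t x) { return mix_inlined(x); }, iterations);
    const Timing b = time_loop([](std::uint32_t x) { return mix_called(x); }, iterations);
    std::printf("inlined: %.3f ms, called: %.3f ms (checksums %u / %u)\n",
                a.millis, b.millis, a.checksum, b.checksum);
}
```

Calling `report` with a large iteration count in an optimized build, several times over, gives a first impression; numbers from a single run or a debug build say very little.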

Answer 6

Answered by Greg Hewgill

The inline directive is of no use at all when applied to functions that are:

recursive, long, composed of loops,

If you want to force this decision using __forceinline

Answer 7

Answered by rxantos

Actually, even with the __forceinline keyword, Visual C++ sometimes chooses not to inline the code. (Source: the resulting assembly code.)

Always look at the resulting assembly code where speed is important (such as tight inner loops that need to run on each frame).

Sometimes using #define instead of inline will do the trick. (Of course, you lose a lot of checking by using #define, so use it only when and where it really matters.)
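
To make the lost checking concrete, here is a small sketch comparing the two approaches. `SQUARE_MACRO`, `square_inline`, and `Counter` are hypothetical names made up for this illustration. The macro is always expanded in place, but it substitutes its argument textually, so an argument with side effects is evaluated twice.

```cpp
#include <cstdio>

// Macro version: pure text substitution, so the argument appears twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Function version: the argument is evaluated once and is type-checked.
inline int square_inline(int x) { return x * x; }

// Hypothetical helper with a visible side effect: counts its own calls.
struct Counter {
    int calls = 0;
    int next() { return ++calls; }
};

// Shows how often the argument expression runs in each case.
void demo() {
    Counter a;
    const int m = SQUARE_MACRO(a.next());   // expands to ((a.next()) * (a.next()))
    Counter b;
    const int f = square_inline(b.next());  // b.next() runs exactly once
    std::printf("macro: value %d after %d calls; function: value %d after %d calls\n",
                m, a.calls, f, b.calls);
}
```

With the macro, `next()` runs twice and the result is the product of two different values; with the function it runs once, which is usually what the caller meant.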

Answer 8

Answered by Soonts

For SIMD code.

SIMD code often uses constants/magic numbers. In a regular function, every const __m128 c = _mm_setr_ps(1,2,3,4); becomes a memory reference.

With __forceinline, the compiler can load it once and reuse the value, unless your code exhausts the registers (usually 16).

CPU caches are great but registers are still faster.

P.S. Just got a 12% performance improvement from __forceinline alone.
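
The pattern described above might look something like the following sketch. It assumes an x86/x64 target with SSE available; `FORCE_INLINE`, `scale_lanes`, and `scale_buffer` are hypothetical names chosen for the example. Whether the constant is actually kept in a register depends on the compiler and the surrounding code, so the generated assembly is the place to confirm it.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)
#include <cstddef>

// Portable spelling of "force inline" (the macro name is hypothetical).
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline __attribute__((always_inline))
#endif

// Hypothetical helper: multiplies four floats by per-lane constants.
// Once inlined into a loop, the compiler is free to load the constant
// a single time and keep it in a register across iterations.
FORCE_INLINE __m128 scale_lanes(__m128 v) {
    const __m128 c = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    return _mm_mul_ps(v, c);
}

// Applies scale_lanes to each complete group of four floats in the buffer;
// any trailing elements beyond a multiple of four are left untouched.
void scale_buffer(float* data, std::size_t count) {
    for (std::size_t i = 0; i + 4 <= count; i += 4) {
        const __m128 v = _mm_loadu_ps(data + i);  // unaligned load
        _mm_storeu_ps(data + i, scale_lanes(v));  // unaligned store
    }
}
```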

Answer 9

Answered by Cookie

Actually, Boost is loaded with it.

For example:

 BOOST_CONTAINER_FORCEINLINE flat_tree&  operator=(BOOST_RV_REF(flat_tree) x)
    BOOST_NOEXCEPT_IF( (allocator_traits_type::propagate_on_container_move_assignment::value ||
                        allocator_traits_type::is_always_equal::value) &&
                         boost::container::container_detail::is_nothrow_move_assignable<Compare>::value)
 {  m_data = boost::move(x.m_data); return *this;  }

 BOOST_CONTAINER_FORCEINLINE const value_compare &priv_value_comp() const
 { return static_cast<const value_compare &>(this->m_data); }

 BOOST_CONTAINER_FORCEINLINE value_compare &priv_value_comp()
 { return static_cast<value_compare &>(this->m_data); }

 BOOST_CONTAINER_FORCEINLINE const key_compare &priv_key_comp() const
 { return this->priv_value_comp().get_comp(); }

 BOOST_CONTAINER_FORCEINLINE key_compare &priv_key_comp()
 { return this->priv_value_comp().get_comp(); }

 public:
 // accessors:
 BOOST_CONTAINER_FORCEINLINE Compare key_comp() const
 { return this->m_data.get_comp(); }

 BOOST_CONTAINER_FORCEINLINE value_compare value_comp() const
 { return this->m_data; }

 BOOST_CONTAINER_FORCEINLINE allocator_type get_allocator() const
 { return this->m_data.m_vect.get_allocator(); }

 BOOST_CONTAINER_FORCEINLINE const stored_allocator_type &get_stored_allocator() const
 {  return this->m_data.m_vect.get_stored_allocator(); }

 BOOST_CONTAINER_FORCEINLINE stored_allocator_type &get_stored_allocator()
 {  return this->m_data.m_vect.get_stored_allocator(); }

 BOOST_CONTAINER_FORCEINLINE iterator begin()
 { return this->m_data.m_vect.begin(); }

 BOOST_CONTAINER_FORCEINLINE const_iterator begin() const
 { return this->cbegin(); }

 BOOST_CONTAINER_FORCEINLINE const_iterator cbegin() const
 { return this->m_data.m_vect.begin(); }

Answer 10

Answered by Gandalf458

When you know that the function is going to be called in one place several times for a complicated calculation, then it is a good idea to use __forceinline. For instance, a matrix multiplication for animation may need to be called so many times that the calls to the function will start to be noticed by your profiler. As said by the others, the compiler can't really know about that, especially in a dynamic situation where the execution of the code is unknown at compile time.
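
The matrix case could be sketched like this. `Mat4`, `multiply`, `compose_all`, and the `FORCE_INLINE` macro are names assumed for the example, not taken from any particular engine. The helper is small and sits in a loop that may run thousands of times per frame, which is the kind of call site a profiler would point at.

```cpp
#include <array>
#include <cstddef>

// Portable spelling of "force inline" (the macro name is hypothetical).
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline __attribute__((always_inline))
#endif

using Mat4 = std::array<float, 16>;  // 4x4 matrix stored row-major

// Plain 4x4 matrix product, marked for forced inlining into its callers.
FORCE_INLINE Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized accumulator
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t col = 0; col < 4; ++col)
            for (std::size_t k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

// Hot loop: pre-multiplies every transform in the array by the parent's.
void compose_all(const Mat4& parent, Mat4* bones, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i)
        bones[i] = multiply(parent, bones[i]);
}
```

As the other answers stress, the hint is only worth keeping if a measurement before and after shows a difference.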
