Understanding the overhead of lambda functions in C++11

Question

Asked by mcbulba

This was already touched on in "Why C++ lambda is slower than ordinary function when called multiple times?" and "C++0x Lambda overhead", but I think my example is a bit different from the discussion in the former and contradicts the result in the latter.

While searching for a bottleneck in my code I found a recursive template function that processes a variadic argument list with a given processor function, for example one that copies each value into a buffer.

template <typename T>
void ProcessArguments(std::function<void(const T &)> process)
{}

template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}

I compared the runtime of a program that calls this code with a lambda function against one that calls it with a global function. Both copy the arguments into a global buffer through a moving pointer. The lambda version:

int buffer[10];
int main(int argc, char **argv)
{
  int *p = buffer;

  for (unsigned long int i = 0; i < 10E6; ++i)
  {
    p = buffer;
    ProcessArguments<int>([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  }
}

Compiled with g++ 4.6 and -O3 and measured with the time tool, this takes more than 6 seconds on my machine, while

int buffer[10];
int *p = buffer;
void CopyIntoBuffer(const int &value)
{
  *p++ = value;
}

int main(int argc, char **argv)
{
  // p is the global pointer declared above, the one CopyIntoBuffer advances.

  for (unsigned long int i = 0; i < 10E6; ++i)
  {
    p = buffer;
    ProcessArguments<int>(CopyIntoBuffer, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  }

  return 0;
}

takes about 1.4 seconds.

I do not understand what is going on behind the scenes to cause this overhead, and I am wondering whether I can change something so that I can use lambda functions without paying for it at runtime.

Answer 1

Answered by Artem Tokmakov

The problem here is your use of std::function. You pass it by value, so its contents are copied, and that happens again at every level of the recursion as the parameter pack is unwound.

For a pointer to function, the contents are just that pointer. For the lambda, the contents are at least a pointer to function plus the reference you captured, which is twice as much to copy. On top of that, because of std::function's type erasure, copying any stored data will most likely be slower, since the copy is not inlined.
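
To make the copying visible, here is a minimal sketch. It assumes a hypothetical functor, CountingProcessor (not part of the question), that counts its own copy constructions while it travels through the original std::function-based ProcessArguments. The exact count depends on the standard library implementation, but every recursion level has to copy the stored callable at least once.

```cpp
#include <functional>

// Hypothetical helper for illustration: counts how often it is copy-constructed.
struct CountingProcessor
{
  static int copies;
  CountingProcessor() {}
  CountingProcessor(const CountingProcessor &) { ++copies; }
  void operator()(const int &) const {}
};
int CountingProcessor::copies = 0;

// The original, std::function-by-value version from the question.
template <typename T>
void ProcessArguments(std::function<void(const T &)> process)
{}

template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(std::function<void(const T &)> process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...); // copies the std::function and its target
}

// Returns how many times the functor was copied for a three-argument call.
int CountCopies()
{
  CountingProcessor::copies = 0;
  ProcessArguments<int>(CountingProcessor(), 1, 2, 3);
  return CountingProcessor::copies;
}
```

With three arguments there are three recursive calls, so CountCopies() should report at least three copies, and the number grows with the length of the argument list.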

There are several options here. The best is probably not to pass std::function at all, but to make the type of the callable a template parameter. The benefits: the call is more likely to be inlined, no type erasure is involved, and no copying takes place. Like this:

template <typename TFunc>
void ProcessArguments(const TFunc& process)
{}

template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(const TFunc& process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}
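
As a usage sketch, this is roughly how the question's lambda could be passed to the templated version. FillBuffer is a hypothetical helper added only for illustration. Note that the explicit <int> from the original call is no longer needed, because TFunc is deduced from the lambda itself.

```cpp
#include <cstddef>

template <typename TFunc>
void ProcessArguments(const TFunc &)
{}

template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(const TFunc &process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}

int buffer[10];

// Hypothetical helper: fills the buffer once and returns the number of values written.
std::ptrdiff_t FillBuffer()
{
  int *p = buffer;
  // The closure type is deduced as TFunc, so no std::function is ever constructed.
  ProcessArguments([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  return p - buffer;
}
```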

The second option is to do the same, but pass process by copy. Copying does happen now, but it is still neatly inlined.

Equally important, the body of process can also be inlined, especially for a lambda. Depending on the size of the lambda object and the cost of copying it, passing by copy may or may not be faster than passing by reference. It can be faster because the compiler may have a harder time reasoning about a reference than about a local copy.

template <typename TFunc>
void ProcessArguments(TFunc process)
{}

template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(TFunc process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}

The third option is to pass std::function<> by reference. That way you at least avoid the copying, but the calls will not be inlined.
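
No code was given for this option, so the following is only a sketch of what it might look like, assuming a const reference. SumViaFunctionRef is a hypothetical helper for illustration. The std::function is built once from the lambda at the outermost call, and every recursive call reuses that same object.

```cpp
#include <functional>

template <typename T>
void ProcessArguments(const std::function<void(const T &)> &)
{}

template <typename T, typename HEAD, typename ... TAIL>
void ProcessArguments(const std::function<void(const T &)> &process, const HEAD &head, const TAIL &... tail)
{
  process(head);                      // still an indirect call through std::function
  ProcessArguments(process, tail...); // passes the same object on, no copy
}

// Hypothetical helper: sums the arguments through a reference-capturing lambda.
int SumViaFunctionRef()
{
  int sum = 0;
  ProcessArguments<int>([&sum](const int &v) { sum += v; }, 1, 2, 3, 4);
  return sum;
}
```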

Here are some perf results (using ideone's C++11 compiler). Note that, as expected, the inlined lambda body gives the best performance:

Original function: 0.483035s
Original lambda: 1.94531s

Function via template copy: 0.094748s
Lambda via template copy: 0.0264867s

Function via template reference: 0.0892594s
Lambda via template reference: 0.0264201s

Function via std::function reference: 0.0891776s
Lambda via std::function reference: 0.09s
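
The benchmark code itself is not shown here, so the following is only a sketch of how one of these timings could be taken, assuming std::chrono::steady_clock. TimeRuns and TimeLambdaViaTemplate are hypothetical names. An optimizing compiler may simplify or remove parts of such a loop, so treat any number it produces with care.

```cpp
#include <chrono>

int buffer[10];

template <typename TFunc>
void ProcessArguments(const TFunc &)
{}

template <typename TFunc, typename HEAD, typename ... TAIL>
void ProcessArguments(const TFunc &process, const HEAD &head, const TAIL &... tail)
{
  process(head);
  ProcessArguments(process, tail...);
}

// Hypothetical helper: runs body() `iterations` times and returns elapsed seconds.
template <typename Body>
double TimeRuns(unsigned long iterations, Body body)
{
  auto start = std::chrono::steady_clock::now();
  for (unsigned long i = 0; i < iterations; ++i)
    body();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Times the "lambda via template reference" variant.
double TimeLambdaViaTemplate(unsigned long iterations)
{
  return TimeRuns(iterations, [] {
    int *p = buffer;
    ProcessArguments([&p](const int &v) { *p++ = v; }, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
  });
}
```

The other variants can be timed the same way by swapping the ProcessArguments overloads and the callable that is passed in.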
