C++ std::function 是如何实现的?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/18453145/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-27 21:58:37  来源:igfitidea点击:

How is std::function implemented?

c++c++11lambda

提问by Miklós Homolya

According to the sources I have found, a lambda expressionis essentially implemented by the compiler creating a class with overloaded function call operator and the referenced variables as members. This suggests that the size of lambda expressions varies, and given enough references variables that size can be arbitrarily large.

根据我发现的来源,lambda 表达式本质上是由编译器实现的,它创建了一个具有重载函数调用运算符和引用变量作为成员的类。这表明 lambda 表达式的大小各不相同,并且给定足够多的引用变量,大小可以是任意大的

An std::functionshould have a fixed size, but it must be able to wrap any kind of callables, including any lambdas of the same kind. How is it implemented? If std::functioninternally uses a pointer to its target, then what happens, when the std::functioninstance is copied or moved? Are there any heap allocations involved?

Anstd::function应该有一个固定的 size,但它必须能够包装任何类型的可调用对象,包括任何相同类型的 lambdas。它是如何实施的?如果在std::function内部使用指向其目标的指针,那么在std::function复制或移动实例时会发生什么?是否涉及任何堆分配?

采纳答案by David Rodríguez - dribeas

The implementation of std::functioncan differ from one implementation to another, but the core idea is that it uses type-erasure. While there are multiple ways of doing it, you can imagine a trivial (not optimal) solution could be like this (simplified for the specific case of std::function<int (double)>for the sake of simplicity):

的实现std::function可以从一个实现到另一个不同,但核心思想是它使用类型擦除。虽然有多种方法可以做到这一点,但您可以想象一个微不足道的(不是最佳的)解决方案可能是这样的(std::function<int (double)>为了简单起见,针对特定情况进行了简化):

struct callable_base {
   virtual int operator()(double d) = 0;
   virtual ~callable_base() {}
};
template <typename F>
struct callable : callable_base {
   F functor;
   callable(F functor) : functor(functor) {}
   virtual int operator()(double d) { return functor(d); }
};
class function_int_double {
   std::unique_ptr<callable_base> c;
public:
   template <typename F>
   function(F f) {
      c.reset(new callable<F>(f));
   }
   int operator()(double d) { return c(d); }
// ...
};

In this simple approach the functionobject would store just a unique_ptrto a base type. For each different functor used with the function, a new type derived from the base is created and an object of that type instantiated dynamically. The std::functionobject is always of the same size and will allocate space as needed for the different functors in the heap.

在这种简单的方法中,function对象将只存储unique_ptr一个基本类型。对于与 一起使用的每个不同的函子function,会创建一个从基派生的新类型,并动态实例化该类型的对象。该std::function对象始终具有相同的大小,并将根据需要为堆中的不同函子分配空间。

In real life there are different optimizations that provide performance advantages but would complicate the answer. The type could use small object optimizations, the dynamic dispatch can be replaced by a free-function pointer that takes the functor as argument to avoid one level of indirection... but the idea is basically the same.

在现实生活中,有不同的优化可以提供性能优势,但会使答案复杂化。该类型可以使用小对象优化,动态分派可以由一个以函子为参数的自由函数指针代替,以避免一级间接......但想法基本相同。



Regarding the issue of how copies of the std::functionbehave, a quick test indicates that copies of the internal callable object are done, rather than sharing the state.

关于副本的std::function行为问题,快速测试表明内部可调用对象的副本已完成,而不是共享状态。

// g++4.8
int main() {
   int value = 5;
   typedef std::function<void()> fun;
   fun f1 = [=]() mutable { std::cout << value++ << '\n' };
   fun f2 = f1;
   f1();                    // prints 5
   fun f3 = f1;
   f2();                    // prints 5
   f3();                    // prints 6 (copy after first increment)
}

The test indicates that f2gets a copy of the callable entity, rather than a reference. If the callable entity was shared by the different std::function<>objects, the output of the program would have been 5, 6, 7.

测试表明f2获取可调用实体的副本,而不是引用。如果可调用实体由不同的std::function<>对象共享,则程序的输出将是 5、6、7。

回答by David Rodríguez - dribeas

For certain types of arguments ("if f's target is a callable object passed via reference_wrapperor a function pointer"), std::function's constructor disallows any exceptions, so using dynamic memory is out of the question. For this case, all the data must be stored directly inside the std::functionobject.

对于某些类型的参数(“如果 f 的目标是通过reference_wrapper或函数指针传递的可调用对象”),std::function的构造函数不允许任何异常,因此使用动态内存是不可能的。对于这种情况,所有数据必须直接存储在std::function对象内部。

In the general case, (including the lambda case), using dynamic memory (via either the standard allocator, or an allocator passed to the std::functionconstructor) is allowed as the implementation sees fit. The standard recommends implementations do not use dynamic memory if it can be avoided, but as you rightly say, if the function object (not the std::functionobject, but the object being wrapped inside it) is large enough, there is no way to prevent it, since std::functionhas a fixed size.

在一般情况下(包括 lambda 情况),std::function在实现认为合适时,允许使用动态内存(通过标准分配器或传递给构造函数的分配器)。如果可以避免,标准建议实现不要使用动态内存,但正如您所说,如果函数对象(不是std::function对象,而是包装在其中的对象)足够大,则无法阻止它,因为std::function有固定的大小。

This permission to throw exceptions is granted to both the normal constructor and the copy constructor, which fairly explicitly allows dynamic memory allocations during copying too. For moves, there is no reason why dynamic memory would be necessary. The standard does not seem to explicitly prohibit it, and probably cannot if the move might call the move constructor of the wrapped object's type, but you should be able to assume that if both the implementation and your objects are sensible, moving won't cause any allocations.

这种抛出异常的权限被授予普通构造函数和复制构造函数,它们相当明确地允许在复制期间进行动态内存分配。对于移动,没有理由需要动态记忆。该标准似乎没有明确禁止它,如果移动可能调用包装对象类型的移动构造函数,则可能不能,但您应该能够假设,如果实现和您的对象都是合理的,移动不会导致任何分配。

回答by neuront

The answer from @David Rodríguez - dribeas is good for demonstrating the type-erasure but not good enough since type-erasure also includes how types are copied (in that answer the function object won't be copy-constructible). Those behaviors are also stored in the functionobject, besides the functor data.

来自@David Rodríguez 的答案 - dribeas 很好地演示了类型擦除,但还不够好,因为类型擦除还包括如何复制类型(在该答案中,函数对象将不可复制构造)。function除了函子数据之外,这些行为也存储在对象中。

The trick, used in the STL implementation from Ubuntu 14.04 gcc 4.8, is to write one generic function, specialize it with each possible functor type, and cast them to a universal function pointer type. Therefore the type information is erased.

在 Ubuntu 14.04 gcc 4.8 的 STL 实现中使用的技巧是编写一个通用函数,用每种可能的函子类型专门化它,并将它们转换为通用函数指针类型。因此类型信息被擦除

I've cobbled up a simplified version of that. Hope it'll help

我拼凑了一个简化版本。希望它会有所帮助

#include <iostream>
#include <memory>

template <typename T>
class function;

template <typename R, typename... Args>
class function<R(Args...)>
{
    // function pointer types for the type-erasure behaviors
    // all these char* parameters are actually casted from some functor type
    typedef R (*invoke_fn_t)(char*, Args&&...);
    typedef void (*construct_fn_t)(char*, char*);
    typedef void (*destroy_fn_t)(char*);

    // type-aware generic functions for invoking
    // the specialization of these functions won't be capable with
    //   the above function pointer types, so we need some cast
    template <typename Functor>
    static R invoke_fn(Functor* fn, Args&&... args)
    {
        return (*fn)(std::forward<Args>(args)...);
    }

    template <typename Functor>
    static void construct_fn(Functor* construct_dst, Functor* construct_src)
    {
        // the functor type must be copy-constructible
        new (construct_dst) Functor(*construct_src);
    }

    template <typename Functor>
    static void destroy_fn(Functor* f)
    {
        f->~Functor();
    }

    // these pointers are storing behaviors
    invoke_fn_t invoke_f;
    construct_fn_t construct_f;
    destroy_fn_t destroy_f;

    // erase the type of any functor and store it into a char*
    // so the storage size should be obtained as well
    std::unique_ptr<char[]> data_ptr;
    size_t data_size;
public:
    function()
        : invoke_f(nullptr)
        , construct_f(nullptr)
        , destroy_f(nullptr)
        , data_ptr(nullptr)
        , data_size(0)
    {}

    // construct from any functor type
    template <typename Functor>
    function(Functor f)
        // specialize functions and erase their type info by casting
        : invoke_f(reinterpret_cast<invoke_fn_t>(invoke_fn<Functor>))
        , construct_f(reinterpret_cast<construct_fn_t>(construct_fn<Functor>))
        , destroy_f(reinterpret_cast<destroy_fn_t>(destroy_fn<Functor>))
        , data_ptr(new char[sizeof(Functor)])
        , data_size(sizeof(Functor))
    {
        // copy the functor to internal storage
        this->construct_f(this->data_ptr.get(), reinterpret_cast<char*>(&f));
    }

    // copy constructor
    function(function const& rhs)
        : invoke_f(rhs.invoke_f)
        , construct_f(rhs.construct_f)
        , destroy_f(rhs.destroy_f)
        , data_size(rhs.data_size)
    {
        if (this->invoke_f) {
            // when the source is not a null function, copy its internal functor
            this->data_ptr.reset(new char[this->data_size]);
            this->construct_f(this->data_ptr.get(), rhs.data_ptr.get());
        }
    }

    ~function()
    {
        if (data_ptr != nullptr) {
            this->destroy_f(this->data_ptr.get());
        }
    }

    // other constructors, from nullptr, from function pointers

    R operator()(Args&&... args)
    {
        return this->invoke_f(this->data_ptr.get(), std::forward<Args>(args)...);
    }
};

// examples
int main()
{
    int i = 0;
    auto fn = [i](std::string const& s) mutable
    {
        std::cout << ++i << ". " << s << std::endl;
    };
    fn("first");                                   // 1. first
    fn("second");                                  // 2. second

    // construct from lambda
    ::function<void(std::string const&)> f(fn);
    f("third");                                    // 3. third

    // copy from another function
    ::function<void(std::string const&)> g(f);
    f("forth - f");                                // 4. forth - f
    g("forth - g");                                // 4. forth - g

    // capture and copy non-trivial types like std::string
    std::string x("xxxx");
    ::function<void()> h([x]() { std::cout << x << std::endl; });
    h();

    ::function<void()> k(h);
    k();
    return 0;
}

There are also some optimizations in the STL version

STL版本也有一些优化

  • the construct_fand destroy_fare mixed into one function pointer (with an additional parameter that tells what to do) as to save some bytes
  • raw pointers are used to store the functor object, along with a function pointer in a union, so that when a functionobject is constructed from an function pointer, it will be stored directly in the unionrather than heap space
  • construct_fdestroy_f混合成一个函数指针(以告诉做什么额外的参数),以节省一些字节
  • 原始指针用于存储函子对象,以及在 a 中的函数指针union,因此当function从函数指针构造对象时,它将直接存储在union而不是堆空间中

Maybe the STL implementation is not the best solution as I've heard about some faster implementation. However I believe the underlying mechanism is the same.

也许 STL 实现不是最好的解决方案,因为我听说过一些更快的实现。但是我相信底层机制是相同的。

回答by aaronman

An std::functionoverloads the operator()making it a functor object, lambda's work the same way. It basically creates a struct with member variables that can be accessed inside the operator()function. So the basic concept to keep in mind is that a lambda is an object (called a functor or function object) not a function. The standard says not to use dynamic memory if it can be avoided.

Anstd::function重载operator()使其成为函子对象,lambda 的工作方式相同。它基本上创建了一个带有可以在operator()函数内部访问的成员变量的结构。所以要记住的基本概念是 lambda 是一个对象(称为函子或函数对象)而不是函数。该标准表示,如果可以避免,则不要使用动态内存。