C++: Is std::vector so much slower than plain arrays?

Question

Asked by kizzx2

I've always thought it's the general wisdom that std::vector is "implemented as an array," blah blah blah. Today I went down and tested it, and it seems not to be so:

Here are some test results:

UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds

That's about 3 - 4 times slower! That doesn't really square with the "vector may be slower by a few nanosecs" comments.

And the code I used:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        free(pixels);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}

Am I doing it wrong or something? Or have I just busted this performance myth?

I'm using Release mode in Visual Studio 2005.

In Visual C++, #define _SECURE_SCL 0 reduces UseVector by half (bringing it down to 4 seconds). This is really huge, IMO.
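For anyone wanting to try the same thing, here is a minimal sketch of how the macro is usually applied. It assumes the define comes before the first standard library include (or is passed on the command line as /D_SECURE_SCL=0) and that every translation unit uses the same value. make_filled is a hypothetical helper, not part of the benchmark above. Compilers other than Visual C++ ignore the macro, and as far as I know later Visual C++ releases control the same checks through _ITERATOR_DEBUG_LEVEL.

```cpp
// Must appear before the first standard library #include.
// Visual C++ specific; other compilers simply ignore this macro.
#define _SECURE_SCL 0

#include <cstddef>
#include <vector>

// Hypothetical helper: fills a vector through operator[], the access path
// whose run-time checks _SECURE_SCL controls on Visual C++ 2005.
std::vector<unsigned char> make_filled(std::size_t n, unsigned char value)
{
    std::vector<unsigned char> v(n);
    for (std::size_t i = 0; i < n; ++i)
        v[i] = value;
    return v;
}
```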

Answer 1

Answered by Martin York

Using the following:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseArray completed in 2.196 seconds
UseVector completed in 4.412 seconds
UseVectorPushBack completed in 8.017 seconds
The whole thing completed in 14.626 seconds

So the array is twice as quick as the vector.

But after looking at the code in more detail, this is expected, as you run across the vector twice and the array only once. Note: when you resize() the vector you are not only allocating the memory but also running through the vector and calling the constructor on each member.

Rearranging the code slightly so that the vector only initializes each object once:

 std::vector<Pixel>  pixels(dimension * dimension, Pixel(255,0,0));

Now doing the same timing again:

g++ -O3 Time.cpp -I <MyBoost>
./a.out
UseVector completed in 2.216 seconds

The vector now performs only slightly worse than the array. IMO this difference is insignificant and could be caused by a whole bunch of things not associated with the test.

I would also take into account that you are not correctly initializing/destroying the Pixel objects in the UseArray() method, as neither the constructor nor the destructor is called. This may not be an issue for this simple class, but anything slightly more complex (i.e. with pointers, or members with pointers) will cause problems.

Answer 2

Answered by John Kugelman

Great question. I came in here expecting to find some simple fix that would speed the vector tests right up. That didn't work out quite like I expected!

Optimization helps, but it's not enough. With optimization on I'm still seeing a 2X performance difference between UseArray and UseVector. Interestingly, UseVector was significantly slower than UseVectorPushBack without optimization.

# g++ -Wall -Wextra -pedantic -o vector vector.cpp
# ./vector
UseArray completed in 20.68 seconds
UseVector completed in 120.509 seconds
UseVectorPushBack completed in 37.654 seconds
The whole thing completed in 178.845 seconds
# g++ -Wall -Wextra -pedantic -O3 -o vector vector.cpp
# ./vector
UseArray completed in 3.09 seconds
UseVector completed in 6.09 seconds
UseVectorPushBack completed in 9.847 seconds
The whole thing completed in 19.028 seconds

Idea #1 - Use new[] instead of malloc

I tried changing malloc() to new[] in UseArray so the objects would get constructed. And changing from individual field assignment to assigning a Pixel instance. Oh, and renaming the inner loop variable to j.

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {   
        int dimension = 999;

        // Same speed as malloc().
        Pixel * pixels = new Pixel[dimension * dimension];

        for(int j = 0 ; j < dimension * dimension; ++j)
            pixels[j] = Pixel(255, 0, 0);

        delete[] pixels;
    }
}

Surprisingly (to me), none of those changes made any difference whatsoever. Not even the change to new[], which will default construct all of the Pixels. It seems that gcc can optimize out the default constructor calls when using new[], but not when using vector.

Idea #2 - Remove repeated operator[] calls

I also attempted to get rid of the triple operator[] lookup and cache the reference to pixels[j]. That actually slowed UseVector down! Oops.

for(int j = 0; j < dimension * dimension; ++j)
{
    // Slower than accessing pixels[j] three times.
    Pixel &pixel = pixels[j];
    pixel.r = 255;
    pixel.g = 0;
    pixel.b = 0;
}

# ./vector 
UseArray completed in 3.226 seconds
UseVector completed in 7.54 seconds
UseVectorPushBack completed in 9.859 seconds
The whole thing completed in 20.626 seconds

Idea #3 - Remove constructors

What about removing the constructors entirely? Then perhaps gcc can optimize out the construction of all of the objects when the vectors are created. What happens if we change Pixel to:

struct Pixel
{
    unsigned char r, g, b;
};

Result: about 10% faster. Still slower than an array. Hm.

# ./vector 
UseArray completed in 3.239 seconds
UseVector completed in 5.567 seconds

Idea #4 - Use iterator instead of loop index

How about using a vector<Pixel>::iterator instead of a loop index?

for (std::vector<Pixel>::iterator j = pixels.begin(); j != pixels.end(); ++j)
{
    j->r = 255;
    j->g = 0;
    j->b = 0;
}

Result:

# ./vector 
UseArray completed in 3.264 seconds
UseVector completed in 5.443 seconds

Nope, no different. At least it's not slower. I thought this would have performance similar to #2, where I used a Pixel& reference.

Conclusion

Even if some smart cookie figures out how to make the vector loop as fast as the array one, this does not speak well of the default behavior of std::vector. So much for the compiler being smart enough to optimize out all the C++ness and make STL containers as fast as raw arrays.

The bottom line is that the compiler is unable to optimize away the no-op default constructor calls when using std::vector. If you use plain new[] it optimizes them away just fine. But not with std::vector. Even if you can rewrite your code to eliminate the constructor calls, that flies in the face of the mantra around here: "The compiler is smarter than you. The STL is just as fast as plain C. Don't worry about it."

Answer 3

Answered by Yakk - Adam Nevraumont

This is an old but popular question.

At this point, many programmers will be working in C++11. And in C++11 the OP's code as written runs equally fast for UseArray or UseVector.

UseVector completed in 3.74482 seconds
UseArray completed in 3.70414 seconds

The fundamental problem was that while your Pixel structure was uninitialized, std::vector<T>::resize( size_t, T const&=T() ) takes a default constructed Pixel and copies it. The compiler did not notice it was being asked to copy uninitialized data, so it actually performed the copy.

In C++11, std::vector<T>::resize has two overloads. The first is std::vector<T>::resize(size_t), the other is std::vector<T>::resize(size_t, T const&). This means when you invoke resize without a second argument, it simply default constructs, and the compiler is smart enough to realize that default construction does nothing, so it skips the pass over the buffer.

(The two overloads were added to handle movable, constructible and non-copyable types -- the performance improvement when working on uninitialized data is a bonus.)
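To check which constructors a given standard library actually calls, a small counting type can help. The following is only a sketch: Counted and count_resize_constructions are hypothetical names, not part of the benchmark. Compiled as C++11 or later, resize(n) on an empty vector should default construct n elements and make no copies; a C++03 library would instead be expected to default construct one prototype and copy it n times.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Counts how its instances come into existence.
struct Counted
{
    static int default_constructed;
    static int copy_constructed;

    Counted() { ++default_constructed; }
    Counted(const Counted&) { ++copy_constructed; }
};

int Counted::default_constructed = 0;
int Counted::copy_constructed = 0;

// Returns {default constructions, copy constructions} caused by calling
// resize(n) on an empty vector.
std::pair<int, int> count_resize_constructions(std::size_t n)
{
    Counted::default_constructed = 0;
    Counted::copy_constructed = 0;

    std::vector<Counted> v;
    v.resize(n);  // C++11 and later: the single-argument overload

    return std::make_pair(Counted::default_constructed,
                          Counted::copy_constructed);
}
```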

The push_back solution also does fencepost checking, which slows it down, so it remains slower than the malloc version.

Live example (I also replaced the timer with chrono::high_resolution_clock).
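The live example is not reproduced here, so the following is only a sketch of what a chrono-based replacement for TestTimer might look like. ChronoTimer and elapsed_seconds are hypothetical names, and the sketch uses std::chrono::steady_clock rather than high_resolution_clock because it is guaranteed to be monotonic; that swap is an assumption, not necessarily what the example did.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// RAII timer: prints the elapsed wall-clock time when it goes out of scope,
// in the same format as the Boost-based TestTimer.
class ChronoTimer
{
public:
    explicit ChronoTimer(std::string name)
        : name_(std::move(name)),
          start_(std::chrono::steady_clock::now())
    {
    }

    // Seconds elapsed since construction.
    double elapsed_seconds() const
    {
        const std::chrono::duration<double> d =
            std::chrono::steady_clock::now() - start_;
        return d.count();
    }

    ~ChronoTimer()
    {
        std::cout << name_ << " completed in " << elapsed_seconds()
                  << " seconds" << std::endl;
    }

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

Usage is the same as before: declare `ChronoTimer t("UseVector");` at the top of the function being timed.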

Note that if you have a structure that usually requires initialization, but you want to handle it after growing your buffer, you can do this with a custom std::vector allocator. If you want to then move it into a more normal std::vector, I believe careful use of allocator_traits and overriding of == might pull that off, but am unsure.
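One way such an allocator could look is sketched below. It is an assumption-laden illustration, not a tested production allocator: default_init_allocator, RawPixel and make_red are hypothetical names. The idea is that construct() with no arguments default-initializes the element rather than value-initializing it, so growing a vector of constructor-less structs does not zero the new elements.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Allocator adaptor (hypothetical name): default insertion leaves trivially
// constructible elements uninitialized instead of zeroing them.
template <typename T, typename A = std::allocator<T>>
class default_init_allocator : public A
{
    typedef std::allocator_traits<A> traits;

public:
    template <typename U>
    struct rebind
    {
        typedef default_init_allocator<
            U, typename traits::template rebind_alloc<U>> other;
    };

    using A::A;  // inherit the base allocator's constructors

    // Used by the container for default insertion: note "U", not "U()".
    template <typename U>
    void construct(U* p)
    {
        ::new (static_cast<void*>(p)) U;
    }

    // Every other construction is forwarded to the base allocator.
    template <typename U, typename... Args>
    void construct(U* p, Args&&... args)
    {
        traits::construct(static_cast<A&>(*this), p,
                          std::forward<Args>(args)...);
    }
};

struct RawPixel
{
    unsigned char r, g, b;  // no constructors: default-init leaves these unset
};

// Grows the buffer without touching the new elements, then fills it once.
std::vector<RawPixel, default_init_allocator<RawPixel>> make_red(std::size_t n)
{
    std::vector<RawPixel, default_init_allocator<RawPixel>> pixels;
    pixels.resize(n);  // elements are uninitialized at this point
    for (std::size_t i = 0; i < n; ++i)
    {
        pixels[i].r = 255;
        pixels[i].g = 0;
        pixels[i].b = 0;
    }
    return pixels;
}
```

The elements hold indeterminate values between resize() and the first write, so they must be assigned before they are read.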

Answer 4

Answered by camh

To be fair, you cannot compare a C++ implementation to a C implementation, as I would call your malloc version. malloc does not create objects - it only allocates raw memory. That you then treat that memory as objects without calling the constructor is poor C++ (possibly invalid - I'll leave that to the language lawyers).
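For comparison, here is a sketch of how a malloc-based version could create its objects properly, by constructing each Pixel in the raw memory with placement new. make_pixels and destroy_pixels are hypothetical helpers, not part of the original test harness, and I have not timed this variant.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct Pixel
{
    Pixel() {}
    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b) {}

    unsigned char r, g, b;
};

// Allocates raw memory with malloc and constructs n copies of value in it.
// Returns a null pointer if the allocation fails.
Pixel* make_pixels(std::size_t n, const Pixel& value)
{
    void* raw = std::malloc(sizeof(Pixel) * n);
    if (raw == 0)
        return 0;

    Pixel* pixels = static_cast<Pixel*>(raw);
    for (std::size_t i = 0; i < n; ++i)
        new (pixels + i) Pixel(value);  // placement new runs the constructor

    return pixels;
}

// Destroys the n objects, then releases the memory.
void destroy_pixels(Pixel* pixels, std::size_t n)
{
    if (pixels == 0)
        return;
    for (std::size_t i = 0; i < n; ++i)
        pixels[i].~Pixel();  // a no-op for this Pixel, needed for richer types
    std::free(pixels);
}
```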

That said, simply changing the malloc to new Pixel[dimension*dimension] and the free to delete [] pixels does not make much difference with the simple implementation of Pixel that you have. Here are the results on my box (E6600, 64-bit):

UseArray completed in 0.269 seconds
UseVector completed in 1.665 seconds
UseVectorPushBack completed in 7.309 seconds
The whole thing completed in 9.244 seconds

But with a slight change, the tables turn:

Pixel.h

struct Pixel
{
    Pixel();
    Pixel(unsigned char r, unsigned char g, unsigned char b);

    unsigned char r, g, b;
};

Pixel.cc

#include "Pixel.h"

Pixel::Pixel() {}
Pixel::Pixel(unsigned char r, unsigned char g, unsigned char b) 
  : r(r), g(g), b(b) {}

main.cc

#include "Pixel.h"
[rest of test harness without class Pixel]
[UseArray now uses new/delete not malloc/free]

Compiled this way:

$ g++ -O3 -c -o Pixel.o Pixel.cc
$ g++ -O3 -c -o main.o main.cc
$ g++ -o main main.o Pixel.o

we get very different results:

UseArray completed in 2.78 seconds
UseVector completed in 1.651 seconds
UseVectorPushBack completed in 7.826 seconds
The whole thing completed in 12.258 seconds

With a non-inlined constructor for Pixel, std::vector now beats a raw array.

It would appear that the complexity of allocation through std::vector and std::allocator is too much to be optimised as effectively as a simple new Pixel[n]. However, we can see that the problem is simply with the allocation, not the vector access, by tweaking a couple of the test functions to create the vector/array once, moving it outside the loop:

void UseVector()
{
    TestTimer t("UseVector");

    int dimension = 999;
    std::vector<Pixel> pixels;
    pixels.resize(dimension * dimension);

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

and

void UseArray()
{
    TestTimer t("UseArray");

    int dimension = 999;
    Pixel * pixels = new Pixel[dimension * dimension];

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
    delete [] pixels;
}

We get these results now:

UseArray completed in 0.254 seconds
UseVector completed in 0.249 seconds
UseVectorPushBack completed in 7.298 seconds
The whole thing completed in 7.802 seconds

What we can learn from this is that std::vector is comparable to a raw array for access, but if you need to create and delete the vector/array many times, creating a complex object will be more time consuming than creating a simple array when the element's constructor is not inlined. I don't think that this is very surprising.

Answer 5

Answered by jalf

Try with this:

void UseVectorCtor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}

I get almost exactly the same performance as with array.

The thing about vector is that it's a much more general tool than an array. And that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have. And if you use it "wrong" for your purpose, you incur a lot of overhead, but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to have their default ctor called), and then overwrote each element individually with the correct value. That is much harder for the compiler to optimize away than when you do the same thing with an array. Which is why the vector provides a constructor which lets you do exactly that: initialize N elements with value X.

And when you use that, the vector is just as fast as an array.

So no, you haven't busted the performance myth. But you have shown that it's only true if you use the vector optimally, which is a pretty good point too. :)

On the bright side, it's really the simplest usage that turns out to be fastest. If you contrast my code snippet (a single line) with John Kugelman's answer, containing heaps and heaps of tweaks and optimizations, which still don't quite eliminate the performance difference, it's pretty clear that vector is pretty cleverly designed after all. You don't have to jump through hoops to get speed equal to an array. On the contrary, you have to use the simplest possible solution.

Answer 6

Answered by deceleratedcaviar

It was hardly a fair comparison when I first looked at your code; I definitely thought you weren't comparing apples with apples. So I thought, let's get constructors and destructors being called on all tests; and then compare.

const size_t dimension = 1000;

void UseArray() {
    TestTimer t("UseArray");
    for(size_t j = 0; j < dimension; ++j) {
        Pixel* pixels = new Pixel[dimension * dimension];
        for(size_t i = 0 ; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
        delete[] pixels;
    }
}

void UseVector() {
    TestTimer t("UseVector");
    for(size_t j = 0; j < dimension; ++j) {
        std::vector<Pixel> pixels(dimension * dimension);
        for(size_t i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
    }
}

int main() {
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();

    return 0;
}

My thoughts were that, with this setup, they should be exactly the same. It turns out I was wrong.

UseArray completed in 3.06 seconds
UseVector completed in 4.087 seconds
The whole thing completed in 10.14 seconds

So why did this 30% performance loss even occur? The STL has everything in headers, so it should have been possible for the compiler to understand everything that was required.

My thoughts were that it is in how the vector initialises all of its elements via the default constructor. So I performed a test:

#include <cstdio>
#include <vector>

class Tester {
public:
    static int count;
    static int count2;
    Tester() { count++; }
    Tester(const Tester&) { count2++; }
};
int Tester::count = 0;
int Tester::count2 = 0;

int main() {
    std::vector<Tester> myvec(300);
    printf("Default Constructed: %i\nCopy Constructed: %i\n", Tester::count, Tester::count2);

    return 0;
}

The results were as I suspected:

Default Constructed: 1
Copy Constructed: 300

This is clearly the source of the slowdown: the vector uses the copy constructor to initialise the elements from a default constructed object.

This means that the following pseudo-operation order is happening during construction of the vector:

Pixel pixel;
for (auto i = 0; i < N; ++i) vector[i] = pixel;

Which, due to the implicit copy constructor generated by the compiler, is expanded to the following:

Pixel pixel;
for (auto i = 0; i < N; ++i) {
    vector[i].r = pixel.r;
    vector[i].g = pixel.g;
    vector[i].b = pixel.b;
}

So the default Pixel remains un-initialised, while the rest are initialised with the default Pixel's un-initialised values.

Compare this with the alternative situation using new[]/delete[]:

int main() {
    Tester* myvec = new Tester[300];

    printf("Default Constructed: %i\nCopy Constructed:%i\n", Tester::count, Tester::count2);

    delete[] myvec;

    return 0;
}

Default Constructed: 300
Copy Constructed: 0

They are all left with their un-initialised values, and without the double iteration over the sequence.

Armed with this information, how can we test it? Let's try replacing the implicit copy constructor with an empty one.

Pixel(const Pixel&) {}

And the results?

UseArray completed in 2.617 seconds
UseVector completed in 2.682 seconds
The whole thing completed in 5.301 seconds

So in summary, if you're making hundreds of vectors very often: re-think your algorithm.

In any case, the STL implementation isn't slower for some unknown reason; it just does exactly what you ask, hoping you know better.

Answer 7

Answered by kloffy

Try disabling checked iterators and building in release mode. You shouldn't see much of a performance difference.

Answer 8

Answered by Tony Delroy

GNU's STL (and others), given vector<T>(n), default constructs a prototypal object T() - the compiler will optimise away the empty constructor - but then a copy of whatever garbage happened to be in the memory addresses now reserved for the object is taken by the STL's __uninitialized_fill_n_aux, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping constructing, but constructing then loop/copying. It's counter-intuitive, but I should have remembered, as I commented on a recent stackoverflow question about this very point: the construct/copy can be more efficient for reference counted objects etc.

So:

vector<T> x(n);

or

vector<T> x;
x.resize(n);

is - on many STL implementations - something like:

T temp;
for (int i = 0; i < n; ++i)
    x[i] = temp;

The issue is that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (usual caveats about 'identical'/operator== vs memcmp/operator= etc apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<> or the later usage of the data that would suggest this optimisation is safe.

This can be contrasted with the more obvious, direct implementation:

for (int i = 0; i < n; ++i)
    x[i] = T();

Which we can expect a compiler to optimise out.

To be a bit more explicit about the justification for this aspect of vector's behaviour, consider:

std::vector<big_reference_counted_object> x(10000);

Clearly it's a major difference if we make 10000 independent objects versus 10000 referencing the same data. There's a reasonable argument that the advantage of protecting casual C++ users from accidentally doing something so expensive outweighs the very small real-world cost of hard-to-optimise copy construction.
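To make the difference concrete, here is a small sketch that uses std::shared_ptr<int> as a stand-in for big_reference_counted_object (that substitution, and the names make_sharing and make_independent, are assumptions for illustration). Copying one prototype n times yields n handles to a single payload, while constructing per element yields n separate payloads.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// n handles that all refer to one shared payload (construct once, then copy).
std::vector<std::shared_ptr<int>> make_sharing(std::size_t n)
{
    return std::vector<std::shared_ptr<int>>(n, std::make_shared<int>(0));
}

// n handles that each own an independent payload (construct per element).
std::vector<std::shared_ptr<int>> make_independent(std::size_t n)
{
    std::vector<std::shared_ptr<int>> v;
    v.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(std::make_shared<int>(0));
    return v;
}
```

The first version performs one payload allocation in total; the second performs one per element.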

ORIGINAL ANSWER (for reference / making sense of the comments): No chance. vector is as fast as an array, at least if you reserve space sensibly. ...

Answer 9

Answered by j_random_hacker

Martin York's answer bothers me because it seems like an attempt to brush the initialisation problem under the carpet. But he is right to identify redundant default construction as the source of performance problems.

[EDIT: Martin's answer no longer suggests changing the default constructor.]

For the immediate problem at hand, you could certainly call the 2-parameter version of the vector<Pixel> ctor instead:

std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));

That works if you want to initialise with a constant value, which is a common case. But the more general problem is: How can you efficiently initialise with something more complicated than a constant value?

For this you can use a back_insert_iterator, which is an iterator adaptor. Here's an example with a vector of ints, although the general idea works just as well for Pixels:

#include <algorithm>
#include <iterator>
#include <vector>

// Simple functor that returns successive squares: 1, 4, 9, 16...
struct squares {
    squares() : i(0) {}
    // Not const: each call advances the internal counter.
    int operator()() { ++i; return i * i; }

private:
    int i;
};

...

std::vector<int> v;
v.reserve(someSize);     // To make insertions efficient
std::generate_n(std::back_inserter(v), someSize, squares());

Alternatively you could use copy() or transform() instead of generate_n().

The downside is that the logic to construct the initial values needs to be moved into a separate class, which is less convenient than having it in-place (although lambdas in C++11 make this much nicer). Also I expect this will still not be as fast as a malloc()-based non-STL version, but I expect it will be close, since it only does one construction for each element.
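As a sketch of the lambda variant, assuming a C++11 compiler: the generator logic sits next to the call instead of in a separate functor class. first_squares is a hypothetical wrapper added only so the snippet is self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Returns the first n squares: 1, 4, 9, 16, ...
std::vector<int> first_squares(std::size_t n)
{
    std::vector<int> v;
    v.reserve(n);  // one allocation up front, so push_back never reallocates

    int i = 0;
    // The lambda replaces the functor class; it captures i by reference
    // and advances it on every call.
    std::generate_n(std::back_inserter(v), n, [&i] { ++i; return i * i; });
    return v;
}
```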

Answer 10

Answered by Graham Perks

The vector ones are additionally calling Pixel constructors.

Each is causing almost a million ctor runs that you're timing.

edit: then there's the outer 1...1000 loop, so make that a billion ctor calls!

edit 2: it'd be interesting to see the disassembly for the UseArray case. An optimizer could optimize the whole thing away, since it has no effect other than burning CPU.
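One common way to make that less likely is to give the loop an observable result, for example by reading the buffer back and returning a checksum. The sketch below is an assumption about how the test could be restructured, not something I have measured; fill_and_checksum is a hypothetical name, and a sufficiently clever optimizer might still fold parts of it.

```cpp
#include <cstddef>
#include <cstdlib>

struct Pixel
{
    unsigned char r, g, b;
};

// Fills a malloc'ed buffer the way UseArray does, then reads it back so the
// stores feed into the return value. Returns the sum of all red channels,
// or 0 if the allocation fails.
unsigned long fill_and_checksum(std::size_t dimension)
{
    const std::size_t count = dimension * dimension;
    Pixel* pixels = static_cast<Pixel*>(std::malloc(sizeof(Pixel) * count));
    if (pixels == 0)
        return 0;

    for (std::size_t i = 0; i < count; ++i)
    {
        pixels[i].r = 255;
        pixels[i].g = 0;
        pixels[i].b = 0;
    }

    unsigned long checksum = 0;
    for (std::size_t i = 0; i < count; ++i)
        checksum += pixels[i].r;

    std::free(pixels);
    return checksum;
}
```

Printing or otherwise using the returned value in the caller keeps the result live.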
