C++ vector: initialization or reserve?
Original question: http://stackoverflow.com/questions/8928547/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Vector: initialization or reserve?
Asked by Ale
I know the size of a vector. Which is the best way to initialize it?
Option 1
vector<int> vec(3); //in .h
vec.at(0)=var1; //in .cpp
vec.at(1)=var2; //in .cpp
vec.at(2)=var3; //in .cpp
Option 2
vector<int> vec; //in .h
vec.reserve(3); //in .cpp
vec.push_back(var1); //in .cpp
vec.push_back(var2); //in .cpp
vec.push_back(var3); //in .cpp
I guess option 2 is better than option 1. Is it? Are there other options?
Accepted answer by Sebastian Mach
Both variants have different semantics, i.e. you are comparing apples and oranges.
The first gives you a vector of n value-initialized elements (0 for int). The second only reserves memory: the vector is still empty, and no elements exist until you add them.
Choose whichever better fits your needs, i.e. what is "better" depends on the situation.
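To make the difference in semantics concrete, here is a minimal sketch that reports size() and capacity() after each approach. The names Report, after_sized_constructor and after_reserve are made up for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Snapshot of a vector's element count and allocated capacity.
struct Report {
    std::size_t size;
    std::size_t capacity;
};

// Option 1: the sized constructor creates n elements (each int is 0).
Report after_sized_constructor(std::size_t n) {
    std::vector<int> vec(n);
    return {vec.size(), vec.capacity()};
}

// Option 2: reserve() allocates room for n elements but creates none.
Report after_reserve(std::size_t n) {
    std::vector<int> vec;
    vec.reserve(n);
    return {vec.size(), vec.capacity()};
}
```

With n = 3, the first report has size 3, while the second has size 0 and a capacity of at least 3.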
Answer by UncleBens
The "best" way would be:
vector<int> vec = {var1, var2, var3};
This is available with a C++11-capable compiler.
I'm not sure exactly what you mean by doing things in a header or implementation file. A mutable global is a no-no for me. If it is a class member, then it can be initialized in the constructor initialization list.
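For the class-member case, a sketch along these lines may help. The class name Triple and its accessor are hypothetical, and it assumes C++11 or later for the brace initialization.

```cpp
#include <vector>

// Hypothetical class: the vector member is built directly from the three
// values in the constructor's member initializer list.
class Triple {
public:
    Triple(int var1, int var2, int var3) : vec_{var1, var2, var3} {}

    const std::vector<int>& values() const { return vec_; }

private:
    std::vector<int> vec_;
};
```

This way the member is never left in a half-filled state between the header declaration and the .cpp assignments.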
Otherwise, option 1 would generally be used if you know how many items you are going to use and the default values (0 for int) would be useful.
Using at here means that you can't guarantee the index is valid. A situation like that is alarming in itself. Even though you will be able to reliably detect problems, it's definitely simpler to use push_back and stop worrying about getting the indexes right.
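As a small illustration of what "reliably detect problems" means with at: an invalid index throws std::out_of_range instead of silently reading past the end. The helper name at_throws is made up for this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if vec.at(index) throws std::out_of_range,
// i.e. if index is not smaller than vec.size().
bool at_throws(const std::vector<int>& vec, std::size_t index) {
    try {
        (void)vec.at(index);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Note that a vector that has only been reserved is still empty, so at(0) throws on it as well.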
In the case of option 2, it generally makes zero performance difference whether you reserve memory or not, so it's simpler not to reserve*. The exceptions are perhaps when the vector contains types that are very expensive to copy (and that don't provide fast moves in C++11), or when the vector is going to be enormous.
* From Stroustrup's C++ Style and Technique FAQ:
People sometimes worry about the cost of std::vector growing incrementally. I used to worry about that and used reserve() to optimize the growth. After measuring my code and repeatedly having trouble finding the performance benefits of reserve() in real programs, I stopped using it except where it is needed to avoid iterator invalidation (a rare case in my code). Again: measure before you optimize.
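The iterator-invalidation case mentioned in the quote can be sketched as follows. The function name first_after_growth is hypothetical; the point is only that, after reserve(n), appending up to n elements does not reallocate, so an iterator taken early stays valid.

```cpp
#include <cstddef>
#include <vector>

// Appends n elements in total after reserving room for them, and reads the
// first element through an iterator obtained before the later push_backs.
int first_after_growth(std::size_t n) {
    std::vector<int> vec;
    vec.reserve(n);
    vec.push_back(42);
    auto first = vec.begin();  // remains valid while size() <= capacity()
    for (std::size_t i = 1; i < n; ++i) {
        vec.push_back(static_cast<int>(i));
    }
    return *first;
}
```

Without the reserve call, any of the later push_backs could reallocate and leave first dangling.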
Answer by Apollys supports Monica
Somehow, a non-answer answer that is completely wrong has remained accepted and most upvoted for ~7 years. This is not an apples and oranges question. This is not a question to be answered with vague cliches.
For a simple rule to follow:
...but this probably shouldn't be your biggest concern.
Firstly, the difference is pretty minor. Secondly, as we crank up the compiler optimization, the difference becomes even smaller. For example, on my gcc-5.4.0, the difference is arguably trivial when running level 3 compiler optimization (-O3):
So in general, I would recommend using method #1 whenever you encounter this situation. However, if you can't remember which one is optimal, it's probably not worth the effort to find out. Just pick either one and move on, because this is unlikely to ever cause a noticeable slowdown in your program as a whole.
These tests were run by sampling random vector sizes from a normal distribution, and then timing the initialization of vectors of these sizes using the two methods. We keep a dummy sum variable to ensure the vector initialization is not optimized out, and we randomize vector sizes and values to make an effort to avoid any errors due to branch prediction, caching, and other such tricks.
main.cpp:
/*
 * Test constructing and filling a vector in two ways: construction with size
 * then assignment versus construction of empty vector followed by push_back
 * We collect dummy sums to prevent the compiler from optimizing out computation
 */
#include <iostream>
#include <vector>
#include "rng.hpp"
#include "timer.hpp"

const size_t kMinSize = 1000;
const size_t kMaxSize = 100000;
const double kSizeIncrementFactor = 1.2;
const int kNumVecs = 10000;

int main() {
  for (size_t mean_size = kMinSize; mean_size <= kMaxSize;
       mean_size = static_cast<size_t>(mean_size * kSizeIncrementFactor)) {
    // Generate sizes from normal distribution
    std::vector<size_t> sizes_vec;
    NormalIntRng<size_t> sizes_rng(mean_size, mean_size / 10.0);
    for (int i = 0; i < kNumVecs; ++i) {
      sizes_vec.push_back(sizes_rng.GenerateValue());
    }
    Timer timer;
    UniformIntRng<int> values_rng(0, 5);

    // Method 1: construct with size, then assign
    timer.Reset();
    int method_1_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec[i] = values_rng.GenerateValue();
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_1_sum += vec[i];
      }
    }
    double method_1_seconds = timer.GetSeconds();

    // Method 2: reserve then push_back
    timer.Reset();
    int method_2_sum = 0;
    for (size_t num_els : sizes_vec) {
      std::vector<int> vec;
      vec.reserve(num_els);
      for (size_t i = 0; i < num_els; ++i) {
        vec.push_back(values_rng.GenerateValue());
      }
      // Compute sum - this part identical for two methods
      for (size_t i = 0; i < num_els; ++i) {
        method_2_sum += vec[i];
      }
    }
    double method_2_seconds = timer.GetSeconds();

    // Report results as mean_size, method_1_seconds, method_2_seconds
    std::cout << mean_size << ", " << method_1_seconds << ", " << method_2_seconds;
    // Do something with the dummy sums that cannot be optimized out
    std::cout << ((method_1_sum > method_2_sum) ? "" : " ") << std::endl;
  }
  return 0;
}
The header files I used are located here:
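The original headers are not reproduced here. As a rough sketch only, the following shows one way rng.hpp and timer.hpp could be written so that main.cpp above compiles. Everything in it is an assumption inferred from how main.cpp uses NormalIntRng, UniformIntRng and Timer (constructor arguments, GenerateValue(), Reset(), GetSeconds()). It is not the author's code, and the use of std::mt19937, std::steady_clock and clamping negative samples to zero are assumptions as well.

```cpp
// Hypothetical stand-ins for rng.hpp and timer.hpp (not the original headers).
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <random>

// Draws values from a normal distribution, rounded to the nearest integer
// and clamped at zero so the result can be used as a size.
template <typename T>
class NormalIntRng {
public:
    NormalIntRng(double mean, double stddev)
        : engine_(std::random_device{}()), dist_(mean, stddev) {}

    T GenerateValue() {
        return static_cast<T>(std::max(0.0, std::round(dist_(engine_))));
    }

private:
    std::mt19937 engine_;
    std::normal_distribution<double> dist_;
};

// Draws integers uniformly from the closed range [low, high].
template <typename T>
class UniformIntRng {
public:
    UniformIntRng(T low, T high)
        : engine_(std::random_device{}()), dist_(low, high) {}

    T GenerateValue() { return dist_(engine_); }

private:
    std::mt19937 engine_;
    std::uniform_int_distribution<T> dist_;
};

// Measures wall-clock time elapsed since construction or the last Reset().
class Timer {
public:
    Timer() { Reset(); }

    void Reset() { start_ = std::chrono::steady_clock::now(); }

    double GetSeconds() const {
        return std::chrono::duration<double>(
                   std::chrono::steady_clock::now() - start_).count();
    }

private:
    std::chrono::steady_clock::time_point start_;
};
```

Timings obtained with these stand-ins will not necessarily match the author's numbers, since the random number generation itself is part of what is being timed.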
Answer by Troyseph
While your examples are essentially the same, it may be that when the type used is not an int the choice is taken from you. If your type doesn't have a default constructor, or if you'll have to re-construct each element later anyway, I would use reserve. Just don't fall into the trap I did and use reserve and then operator[] for initialisation!
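Here is a minimal sketch of that situation, assuming a made-up element type Point that has no default constructor. The sized constructor would not compile for it, so the elements are added with emplace_back after reserve. The commented-out line marks the trap: after reserve the vector is still empty, so indexing into it is undefined behaviour.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type without a default constructor,
// so std::vector<Point> vec(n) would not compile.
struct Point {
    Point(int x, int y) : x(x), y(y) {}
    int x;
    int y;
};

std::vector<Point> make_diagonal(std::size_t n) {
    std::vector<Point> points;
    points.reserve(n);
    // points[0] = Point(0, 0);  // the trap: size() is still 0 here
    for (std::size_t i = 0; i < n; ++i) {
        // Constructs each element in place and increments size().
        points.emplace_back(static_cast<int>(i), static_cast<int>(i));
    }
    return points;
}
```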
Constructor
std::vector<MyType> myVec(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
In this first case, using the constructor, size and numberOfElementsToStart will be equal, and capacity will be greater than or equal to them.
Think of myVec as a vector containing a number of items of MyType which can be accessed and modified; push_back(anotherInstanceOfMyType) will append it to the end of the vector.
Reserve
std::vector<MyType> myVec;
myVec.reserve(numberOfElementsToStart);
int size = myVec.size();
int capacity = myVec.capacity();
When using the reserve function, size will be 0 until you add an element to the vector, and capacity will be equal to or greater than numberOfElementsToStart.
Think of myVec as an empty vector which can have new items appended to it using push_back with no memory allocation for at least the first numberOfElementsToStart elements.
Note that push_back() still requires an internal check to ensure that size < capacity and to increment size, so you may want to weigh this against the cost of default construction.
List initialisation
std::vector<MyType> myVec{ var1, var2, var3 };
This is an additional option for initialising your vector, and while it is only feasible for very small vectors, it is a clear way to initialise a small vector with known values. size will be equal to the number of elements you initialised it with, and capacity will be equal to or greater than size. Modern compilers may optimise away the creation of temporary objects and prevent unnecessary copying.
Answer by nob
Option 2 is better, as reserve only needs to reserve memory (3 * sizeof(T)), while the first option calls the constructor of the element type for each element inside the container.
For C-like types it will probably be the same.
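One way to see this is to count constructor calls. The sketch below assumes C++11 or later and uses a made-up type Counted whose default constructor increments a counter.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical element type that counts how often its default constructor runs.
struct Counted {
    static int constructions;
    Counted() { ++constructions; }
};
int Counted::constructions = 0;

// Option 1: the sized constructor default-constructs every element.
int constructions_with_sized_constructor(std::size_t n) {
    Counted::constructions = 0;
    std::vector<Counted> vec(n);
    return Counted::constructions;
}

// Option 2: reserve() allocates raw storage and constructs nothing.
int constructions_with_reserve(std::size_t n) {
    Counted::constructions = 0;
    std::vector<Counted> vec;
    vec.reserve(n);
    return Counted::constructions;
}
```

For n = 3 the first function reports three constructor calls and the second reports none.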
Answer by Shital Shah
How it Works
This is implementation specific; however, in general the vector data structure internally holds a pointer to the memory block where the elements actually reside. Both GCC and VC++ allocate for 0 elements by default, so you can think of the vector's internal memory pointer as being nullptr by default.
When you call vector<int> vec(N); as in your option 1, N objects are created using the default constructor. This is called the fill constructor.
When you do vec.reserve(N); after the default constructor, as in option 2, you get a data block that can hold N elements (3 in your example), but no objects are created, unlike in option 1.
Why to Select Option 1
If you know the number of elements the vector will hold and you might leave most of the elements at their default values, then you might want to use this option.
Why to Select Option 2
This option is generally the better of the two, as it only allocates a data block for future use and does not actually fill it with objects created by the default constructor.
Answer by Bo Persson
Another option is to Trust Your Compiler(tm) and do the push_backs without calling reserve first. It has to allocate some space when you start adding elements. Perhaps it does that just as well as you would?
It is "better" to have simpler code that does the same job.
Answer by haberdar
In the long run, it depends on the usage and the number of elements.
Run the program below to see how your standard library implementation grows the capacity:
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> vec;
    for (int i = 0; i < 50; i++) {
        // Print the state before each insertion
        cout << "size=" << vec.size() << " capacity=" << vec.capacity() << endl;
        vec.push_back(i);
    }
}
size is the number of actual elements, and capacity is the actual size of the array used to implement the vector. On my computer, both are the same up to 10, but when size is 43 the capacity is 63. Depending on the number of elements, either option may be better. For example, increasing the capacity may be expensive.
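To put a number on "increasing the capacity", the sketch below counts how often capacity() changes while n elements are appended, with and without an initial reserve. The function name count_reallocations is made up, and the exact count without reserve depends on the implementation's growth policy.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times capacity() changes while appending n ints.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<int> vec;
    if (reserve_first) {
        vec.reserve(n);
    }
    std::size_t reallocations = 0;
    std::size_t last_capacity = vec.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        vec.push_back(static_cast<int>(i));
        if (vec.capacity() != last_capacity) {
            ++reallocations;
            last_capacity = vec.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the count is zero, because the single allocation happens inside reserve before counting starts.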
Answer by Apollys supports Monica
Since it seems 5 years have passed and a wrong answer is still the accepted one, and the most-upvoted answer is completely useless (missed the forest for the trees), I will add a real response.
Method #1: we pass an initial size parameter into the vector (let's call it n). That means the vector is filled with n elements, which will be initialized to their default value. For example, if the vector holds ints, it will be filled with n zeros.
Method #2: we first create an empty vector. Then we reserve space for n elements. In this case, we never create the n elements and thus we never perform any initialization of the elements in the vector. Since we plan to overwrite the values of every element immediately, the lack of initialization will do us no harm. On the other hand, since we have done less overall, this would be the better* option.
* better, real definition: never worse. It's always possible that a smart compiler will figure out what you're trying to do and optimize it for you.
Conclusion: use method #2.