C++ std::vector vs array in the real world
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): Stack Overflow.
Original: http://stackoverflow.com/questions/6462985/
Asked by GRardB
I'm new to C++. I'm reading "Beginning C++ Through Game Programming" by Michael Dawson. However, I'm not new to programming in general. I just finished a chapter that dealt with vectors, so I've got a question about their use in the real world (I'm a computer science student, so I don't have much real-world experience yet).
The author has a Q/A at the end of each chapter, and one of them was:
Q: When should I use a vector instead of an array?
A: Almost always. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
What do you guys think? I remember learning about vectors in a Java book, but we didn't cover them at all in my Intro to Comp. Sci. class, nor my Data Structures class at college. I've also never seen them used in any programming assignments (Java and C). This makes me feel like they're not used very much, although I know that school code and real-world code can be extremely different.
I don't need to be told about the differences between the two data structures; I'm very aware of them. All I want to know is if the author is giving good advice in his Q/A, or if he's simply trying to save beginner programmers from destroying themselves with the complexities of managing fixed-size data structures. Also, regardless of what you think of the author's advice, what do you see in the real world more often?
Answered by Tony Delroy
A: Almost always [use a vector instead of an array]. Vectors are efficient and flexible. They do require a little more memory than arrays, but this tradeoff is almost always worth the benefits.
That's an over-simplification. It's fairly common to use arrays, and they can be attractive when:
- the elements are specified at compile time, e.g. const char project[] = "Super Server"; and const Colours colours[] = { Green, Yellow };
  - with C++11 it will be equally concise to initialise std::vectors with values
- the number of elements is inherently fixed, e.g. const char* const bool_to_str[] = { "false", "true" }; and Piece chess_board[8][8];
- first-use performance is critical: with arrays of constants the compiler can often write a memory snapshot of the fully pre-initialised objects into the executable image, which is then page-faulted directly into place ready for use, so it's typically much faster than run-time heap allocation (new[]) followed by serialised construction of objects
  - compiler-generated tables of const data can always be safely read by multiple threads, whereas data constructed at run-time must complete construction before other code triggered by constructors for non-function-local static variables attempts to use that data: you end up needing some manner of Singleton (possibly threadsafe, which will be even slower)
  - In C++03, vectors created with an initial size would construct one prototypical element object then copy construct each data member. That meant that even for types where construction was deliberately left as a no-operation, there was still a cost to copy the data elements - replicating their whatever-garbage-was-left-in-memory values. Clearly an array of uninitialised elements is faster.
- One of the powerful features of C++ is that often you can write a class (or struct) that exactly models the memory layout required by a specific protocol, then aim a class-pointer at the memory you need to work with to conveniently interpret or assign values. For better or worse, many such protocols often embed small fixed sized arrays.
  - There's a decades-old hack for putting an array of 1 element (or even 0 if your compiler allows it as an extension) at the end of a struct/class, aiming a pointer to the struct type at some larger data area, and accessing array elements off the end of the struct based on prior knowledge of the memory availability and content (if reading before writing) - see "What's the need of array with zero elements?"
  - classes/structures containing arrays can still be POD types
- arrays facilitate access in shared memory from multiple processes (by default vector's internal pointers to the actual dynamically allocated data won't be in shared memory or meaningful across processes, and it was famously difficult to force C++03 vectors to use shared memory like this even when specifying a custom allocator template parameter)
- embedding arrays can localise memory access requirement, improving cache hits and therefore performance
That said, if it's not an active pain to use a vector (in code concision, readability or performance) then you're better off doing so: they've size(), checked random access via at(), iterators, resizing (which often becomes necessary as an application "matures") etc. It's also often easier to change from vector to some other Standard container should there be a need, and safer/easier to apply Standard algorithms (x.end() is better than x + sizeof x / sizeof x[0] any day).
UPDATE: C++11 introduced std::array<>, which avoids some of the costs of vectors - internally using a fixed-sized array to avoid an extra heap allocation/deallocation - while offering some of the benefits and API features: http://en.cppreference.com/w/cpp/container/array.
Answered by Dan
One of the best reasons to use a vector as opposed to an array is the RAII idiom. Basically, in order for C++ code to be exception-safe, any dynamically allocated memory or other resources should be encapsulated within objects. These objects should have destructors that free these resources.
When an exception propagates out of a function, the ONLY things that are guaranteed to be called are the destructors of objects on the stack. If you dynamically allocate memory outside of an object, and an exception is thrown somewhere before it is deleted, you have a memory leak.
It's also a nice way to avoid having to remember to use delete.
You should also check out the <algorithm> header, which provides a lot of common algorithms for vector and other STL containers.
I have on a few occasions written code with vector that, in retrospect, probably would have been better with a native array. But in all of these cases, either a boost::multi_array or a blitz::Array would have been better than either of them.
Answered by Zachary Kraus
I am going to pop my opinion in here for coding large sized array/vectors used in science and engineering.
The pointer based arrays in this case can be quite a bit faster, especially for standard types. But the pointers add the danger of possible memory leaks, and these memory leaks can lead to a longer debug cycle. Additionally, if you want to make the pointer based array dynamic you have to code this by hand.
On the other hand vectors are slower for standard types. They also are both dynamic and memory safe as long as you are not storing dynamically allocated pointers in the stl vector.
In science and engineering the choice depends on the project: how important is speed vs debug time? For example, LAMMPS, which is a simulation package, uses raw pointers that are handled through its memory management class. Speed is the priority for that software. For the software I am building, I have to balance speed with memory footprint and debug time. I really don't want to spend a lot of time debugging, so I am using the STL vector.
I wanted to add some more information to this answer that I discovered from extensive testing of large scale arrays and a lot of reading on the web. Another problem with the STL vector and large arrays (one million elements or more) occurs in how memory gets allocated for these arrays. The STL vector uses the std::allocator class for handling memory. This class is a pool based memory allocator. Under small scale loading the pool based allocation is extremely efficient in terms of speed and memory use. As the size of the vector gets into the millions, the pool based strategy becomes a memory hog. This happens because the pool's tendency is to always hold more space than is currently being used by the STL vector.
For large scale vectors you are either better off writing your own vector class or using pointers (raw, or some sort of memory management system from Boost or the C++ library). There are advantages and disadvantages to both approaches. The choice really depends on the exact problem you are tackling (too many variables to add in here). If you do happen to write your own vector class, make sure to give the vector an easy way to clear its memory. Currently for the STL vector you need to use swap operations to do something that really should have been built into the class in the first place.
Answered by Nicol Bolas
A std::vector is just a resizable array. It's not much more than that. It's not something you would learn in a Data Structures class, because it isn't an intelligent data structure.
In the real world, I see a lot of arrays. But I also see a lot of legacy codebases that use "C with Classes"-style C++ programming. That doesn't mean that you should program that way.
Answered by Perception
It's a rare case in the real world where you deal with fixed collections of a known size. In almost all cases there is a degree of the unknown in exactly what size of data set you will be accommodating in your program. Indeed it is the hallmark of a good program that it can accommodate a wide range of possible scenarios.
Take these (trivial) scenarios as examples:
- You have implemented a view controller to track AI combatants in an FPS. The game logic spawns a random number of combatants in various zones every couple of seconds. The player is downing AI combatants at a rate known only at run time.
- A lawyer has accessed the Municipal Court website in his state and is querying the number of new DUI cases that came in over the night. He chooses to filter the list by a set of variables including time the accident occurred, zip code, and arresting officer.
- The operating system needs to maintain a list of memory addresses in use by the various programs running on it. The number of programs and their memory usage changes in unpredictable ways.
In any of these cases a good argument can be made that a variable size list (that accommodates dynamic inserts and deletes) will perform better than a simple array, with the main benefit coming from a reduced need to allocate and deallocate memory for the fixed array as you add or remove elements from it.
Answered by zvrba
Rule of thumb: if you don't know the number of elements in advance, or if the number of elements is expected to be large (say, more than 10), use vector. Otherwise, you could also use an array. For example, I write a lot of geometry-processing code and I define a line as an ARRAY of 2 coordinates. A line is defined by two points, and it will ALWAYS be defined by exactly two points. Using a vector instead of an array would be overkill in many ways, also performance-wise.
Another thing: when I say "array" I really DO MEAN array: a variable declared using an array syntax, such as int evenOddCount[2]; If you consider choosing between a vector and a dynamically-allocated block of memory, such as int *evenOddCount = new int[2]; the answer is clear: USE VECTOR!
Answered by Geek
As far as arrays are concerned, simple integer or string arrays are very easy to use. On the other hand, for common operations like searching, sorting, insertion and removal, you can achieve much faster speed using standard algorithms (built-in library functions) on vectors, especially if you are using vectors of objects. Secondly, there is the huge difference that vectors can grow in size dynamically as more objects are inserted. Hope that helps.