C++ STL：数组与向量：原始元素访问性能

Question

提问by oh boy

I'm building an interpreter and as I'm aiming for raw speed this time, every clock cycle matters for me in this (raw) case.

我正在构建一个解释器，因为这次我的目标是原始速度，所以在这种（原始）情况下，每个时钟周期对我都很重要。

Do you have any experience or information what of the both is faster: Vector or Array? All what matters is the speed I can access an element (opcode receiving), I don't care about inserting, allocation, sorting, etc.

您是否有任何经验或信息：向量或数组哪个更快？重要的是我可以访问元素（操作码接收）的速度，我不关心插入、分配、排序等。

I'm going to lean myself out of the window now and say:

我现在要靠在窗外说：

Arrays are at least a bit faster than vectors in terms of accessing an element i.

在访问元素 i 方面，数组至少比向量快一点。

It seems really logical for me. With vectors you have all those security and controlling overhead which doesn't exist for arrays.

这对我来说似乎很合乎逻辑。使用向量，您可以拥有数组不存在的所有安全性和控制开销。

(Why) Am I wrong?

（为什么）我错了吗？

No, I can't ignore the performance difference - even if it is sosmall - I have already optimized and minimized every other part of the VM which executes the opcodes :)

不，我不能忽视的性能差异-即使它是那么小-我已经进行了优化，尽量减少执行该操作码的虚拟机的所有其他部分:)

Answer 1

回答by AnT

Element access time in a typical implementation of a std::vectoris the same as element access time in an ordinary array available through a pointer object(i.e. a run-timepointer value)

a 的典型实现中的std::vector元素访问时间与通过指针对象（即运行时指针值）可用的普通数组中的元素访问时间相同

std::vector<int> v;
int *pa;
...
v[i];
pa[i]; 
// Both have the same access time

However, the access time to an element of an array available as an array objectis better than both of the above accesses (equivalent to access through a compile-timepointer value)

但是，对可用作数组对象的数组元素的访问时间优于上述两种访问（相当于通过编译时指针值访问）

int a[100];
...
a[i];
// Faster than both of the above

For example, a typical read access to an intarray available through a run-time pointer value will look as follows in the compiled code on x86 platform

例如，对int通过运行时指针值可用的数组的典型读取访问在 x86 平台上的编译代码中如下所示

// pa[i]
mov ecx, pa // read pointer value from memory
mov eax, i
mov <result>, dword ptr [ecx + eax * 4]

Access to vector element will look pretty much the same.

对矢量元素的访问看起来几乎相同。

A typical access to a local intarray available as an array object will look as follows

对int可用作数组对象的本地数组的典型访问如下所示

// a[i]
mov eax, i
mov <result>, dword ptr [esp + <offset constant> + eax * 4]

A typical access to a global intarray available as an array object will look as follows

对int可用作数组对象的全局数组的典型访问如下所示

// a[i]
mov eax, i
mov <result>, dword ptr [<absolute address constant> + eax * 4]

The difference in perfromance arises from that extra movinstruction in the first variant, which has to make an extra memory access.

性能差异源于mov第一个变体中的额外指令，它必须进行额外的内存访问。

However, the difference is negligible. And it is easily optimized to the point of being exactly the same in multiple-access context (by loading the target address in a register).

但是，差异可以忽略不计。它很容易优化到在多访问上下文中完全相同（通过将目标地址加载到寄存器中）。

So the statement about "arrays being a bit faster" is correct in narrow case when the array is accessible directly through the array object, not through a pointer object. But the practical value of that difference is virtually nothing.

因此，当数组可通过数组对象直接访问而不是通过指针对象访问时，关于“数组有点快”的说法在窄情况下是正确的。但这种差异的实际价值几乎没有。

Answer 2

回答by Jive Dadson

You may be barking up the wrong tree. Cache misses can be much more important than the number of instructions that get executed.

你可能在吠错树。缓存未命中可能比执行的指令数量重要得多。

Answer 3

回答by Potatoswatter

No. Under the hood, both std::vectorand C++0x std::arrayfind the pointer to element nby adding nto the pointer to the first element.

不。在幕后，std::vectorC++0x 和 C++0x都通过添加到指向第一个元素的指针来std::array找到指向元素的指针。nn

vector::atmay be slower than array::atbecause the former must compare against a variable while the latter compares against a constant. Thoseare the functions that provide bounds checking, not operator[].

vector::at可能array::at比前者慢，因为前者必须与变量进行比较，而后者必须与常量进行比较。这些是提供边界检查的函数，而不是operator[].

If you mean C-style arraysinstead of C++0x std::array, then there is no atmember, but the point remains.

如果您的意思是 C 样式数组而不是 C++0x std::array，那么没有at成员，但重点仍然存在。

EDIT:If you have an opcode table, a global array (such as with externor staticlinkage) may be faster. Elements of a global array would be addressable individually as global variables when a constant is put inside the brackets, and opcodes are often constants.

编辑：如果您有操作码表，全局数组（例如 withextern或staticlinks）可能会更快。当常量放在括号内时，全局数组的元素可以作为全局变量单独寻址，并且操作码通常是常量。

Anyway, this is all premature optimization. If you don't use any of vector's resizing features, it looks enough like an array that you should be able to easily convert between the two.

无论如何，这都是过早的优化。如果您不使用任何vector调整大小的功能，它看起来就像一个数组，您应该能够轻松地在两者之间进行转换。

Answer 4

回答by GManNickG

You're comparing apples to oranges. Arrays have a constant-size and are automatically allocated, while vectors have a dynamic size and are dynamically allocated. Which you use depends on what you need.

你在比较苹果和橙子。数组具有恒定大小并自动分配，而向量具有动态大小并动态分配。您使用哪种取决于您的需要。

Generally, arrays are "faster" to allocate (in quotes because comparison is meaningless) because dynamic allocation is slower. However, accessing an element should be the same. (Granted an array is probably more likely to be in cache, though that doesn't matter after the first access.)

通常，数组分配“更快”（在引号中，因为比较没有意义），因为动态分配更慢。但是，访问元素应该是相同的。（授予一个数组可能更可能在缓存中，尽管在第一次访问后这无关紧要。）

Also, I don't know what "security" you're talking about, vector's have plenty of ways to get undefined behavior just like arrays. Though they have at(), which you don't need to use if you know the index is valid.

另外，我不知道您在谈论什么“安全性”，vector有很多方法可以像数组一样获得未定义的行为。尽管它们有at()，但如果您知道索引有效，则不需要使用它。

Lastly, profile and look at the generated assembly. Nobody's guess is gonna solve anything.

最后，分析并查看生成的程序集。没有人的猜测会解决任何问题。

Answer 5

回答by GManNickG

For decent results, use std::vectoras the backing storage and take a pointer to its first element before your main loop or whatever:

为了获得不错的结果，请std::vector用作后备存储并在主循环或其他任何之前获取指向其第一个元素的指针：

std::vector<T> mem_buf;
// stuff
uint8_t *mem=&mem_buf[0];
for(;;) {
    switch(mem[pc]) {
    // stuff
    }
}

This avoids any issues with over-helpful implementations that perform bounds checking in operator[], and makes single-stepping easier when stepping into expressions such as mem_buf[pc]later in the code.

这避免了执行边界检查的过度帮助实现的任何问题operator[]，并在步入表达式（例如mem_buf[pc]代码的后面）时使单步更容易。

If each instruction does enough work, and the code is varied enough, this should be quicker than using a global array by some negligible amount. (If the difference is noticeable, the opcodes need to be made more complicated.)

如果每条指令都做了足够多的工作，并且代码变化得足够多，那么这应该比使用全局数组快一些可以忽略不计。（如果差异很明显，则需要使操作码更复杂。）

Compared to using a global array, on x86 the instructions for this sort of dispatch should be more concise (no 32-bit displacement fields anywhere), and for more RISC-like targets there should be fewer instructions generated (no TOC lookups or awkward 32-bit constants), as the commonly-used values are all in the stack frame.

与使用全局数组相比，在 x86 上，这种调度的指令应该更简洁（任何地方都没有 32 位位移字段），对于更多类似 RISC 的目标，应该生成更少的指令（没有 TOC 查找或笨拙的 32 -bit 常量），因为常用的值都在堆栈帧中。

I'm not really convinced that optimizing an interpreter's dispatch loop in this way will produce a good return on time invested -- the instructions should really be made to do more, if it's an issue -- but I suppose it shouldn't take long to try out a few different approaches and measure the difference. As always in the event of unexpected behaviour the generated assembly language (and, on x86, the machine code, as instruction length can be a factor) should be consulted to check for obvious inefficiencies.

我真的不相信以这种方式优化解释器的调度循环会在投入的时间上产生良好的回报——如果这是一个问题，指令应该做得更多——但我认为它不应该花很长时间尝试几种不同的方法并衡量差异。与往常一样，在出现意外行为时，应咨询生成的汇编语言（以及在 x86 上，机器代码，因为指令长度可能是一个因素）以检查明显的低效率。

C++ STL：数组与向量：原始元素访问性能

提问by oh boy

回答by AnT

回答by Jive Dadson

回答by Potatoswatter

回答by GManNickG

回答by GManNickG

相关推荐

最近更新

标签

C++ STL：数组与向量：原始元素访问性能

提问by oh boy

回答by AnT

回答by Jive Dadson

回答by Potatoswatter

回答by GManNickG

回答by GManNickG

相关推荐

C++ 如何指定库路径的偏好？

C++ 错误：基类构造函数必须显式初始化父类构造函数

C++ 头文件中的多重定义

C++ CMD.EXE 以上述路径作为当前目录启动。不支持 UNC 路径。默认为 Windows 目录

相关推荐

最近更新

标签