C++ 一维或二维数组,哪个更快?
声明:本页面是 StackOverFlow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将其归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/17259877/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
1D or 2D array, what's faster?
提问by graywolf
I'm in need of representing a 2D field (axes x, y) and I face a problem: should I use a 1D array or a 2D array?
我需要表示一个二维字段(x、y 轴),但我面临一个问题:我应该使用一维数组还是二维数组?
I can imagine that recalculating indices for 1D arrays (y + x*n) could be slower than using a 2D array (x, y), but I could imagine that 1D could be in the CPU cache...
我可以想象,重新计算一维数组 (y + x*n) 的索引可能比使用二维数组 (x, y) 慢,但我可以想象一维可能在 CPU 缓存中。
I did some googling, but only found pages regarding static arrays (which state that 1D and 2D are basically the same). But my arrays must be dynamic.
我做了一些谷歌搜索,但只找到了关于静态数组的页面(并说明一维和二维基本相同)。但是我的数组必须是动态的。
So, what's
那么,哪个
- faster,
- smaller (RAM)
- 更快,
- 更省内存 (RAM)
dynamic 1D arrays or dynamic 2D arrays?
动态一维数组还是动态二维数组?
Thanks :)
谢谢 :)
回答by Pixelchemist
tl;dr : You should probably use a one-dimensional approach.
tl;dr :您可能应该使用一维方法。
Note: One cannot dig into the details affecting performance when comparing dynamic 1d and dynamic 2d storage patterns without filling books, since the performance of code depends on a very large number of parameters. Profile if possible.
注意:在比较动态 1d 和动态 2d 存储模式时,若不写上好几本书,就无法深入探讨所有影响性能的细节,因为代码的性能取决于大量参数。如果可能,请进行性能分析(profile)。
1. What's faster?
1. 什么更快?
For dense matrices the 1D approach is likely to be faster since it offers better memory locality and less allocation and deallocation overhead.
对于密集矩阵,一维方法可能更快,因为它提供更好的内存局部性和更少的分配和释放开销。
2. What's smaller?
2. 什么更小?
Dynamic-1D consumes less memory than the 2D approach. The latter also requires more allocations.
动态一维比二维方法消耗更少的内存。后者也需要更多的分配。
Remarks
备注
I laid out a pretty long answer beneath with several reasons, but I want to make some remarks on your assumptions first.
我在下面给出了一个相当长的回答并列出了若干理由,但我想先对你的假设作一些说明。
I can imagine that recalculating indices for 1D arrays (y + x*n) could be slower than using a 2D array (x, y)
我可以想象,重新计算一维数组 (y + x*n) 的索引可能比使用二维数组 (x, y) 慢
Let's compare these two functions:
让我们比较一下这两个函数:
int get_2d (int **p, int r, int c) { return p[r][c]; }
int get_1d (int *p, int r, int c) { return p[c + C*r]; } // C: compile-time constant number of columns
The (non-inlined) assembly generated by Visual Studio 2015 RC for those functions (with optimizations turned on) is:
Visual Studio 2015 RC 为这些函数(打开优化)生成的(非内联)程序集是:
?get_1d@@YAHPAHII@Z PROC
push ebp
mov ebp, esp
mov eax, DWORD PTR _c$[ebp]
lea eax, DWORD PTR [eax+edx*4]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0
?get_2d@@YAHPAPAHII@Z PROC
push ebp
mov ebp, esp
mov ecx, DWORD PTR [ecx+edx*4]
mov eax, DWORD PTR _c$[ebp]
mov eax, DWORD PTR [ecx+eax*4]
pop ebp
ret 0
The difference is mov (2d) vs. lea (1d). The former has a latency of 3 cycles and a maximum throughput of 2 per cycle, while the latter has a latency of 2 cycles and a maximum throughput of 3 per cycle (according to Instruction tables - Agner Fog). Since the differences are minor, I think there should not be a big performance difference arising from index recalculation. I expect it to be very unlikely that this difference itself would be identified as the bottleneck in any program.
差异在于 mov(2d)与 lea(1d)。前者延迟为 3 个周期、每周期最大吞吐量为 2,而后者延迟为 2 个周期、每周期最大吞吐量为 3(参见 Instruction tables - Agner Fog)。由于差异很小,我认为不会因索引重新计算而产生很大的性能差异。我认为这一差异本身几乎不可能成为任何程序的瓶颈。
This brings us to the next (and more interesting) point:
这将我们带到下一个(也是更有趣的)点:
... but I could imagine that 1D could be in CPU cache ...
...但我可以想象 1D 可能在 CPU 缓存中...
True, but 2d could be in the CPU cache, too. See "The downsides: Memory locality" for an explanation of why 1d is still better.
没错,但 2d 也可以在 CPU 缓存中。关于为什么 1d 仍然更好,请参阅下面的"缺点:内存局部性"一节。
The long answer, or why dynamic 2 dimensional data storage (pointer-to-pointer or vector-of-vector) is "bad" for simple/small matrices.
长答案,或者为什么动态二维数据存储(指针到指针或向量向量)对于简单/小矩阵“不好” 。
Note: This is about dynamic arrays/allocation schemes [malloc/new/vector etc.]. A static 2d array is a contiguous block of memory and therefore not subject to the downsides I'm going to present here.
注意:这是关于动态数组/分配方案 [malloc/new/vector 等]。静态二维数组是一个连续的内存块,因此不受我将在此处介绍的缺点的影响。
The Problem
问题
To be able to understand why a dynamic array of dynamic arrays or a vector of vectors is most likely not the data storage pattern of choice, you are required to understand the memory layout of such structures.
为了理解为什么由动态数组组成的动态数组(或由 vector 组成的 vector)很可能不是首选的数据存储模式,您需要了解此类结构的内存布局。
Example case using pointer to pointer syntax
使用指针的指针(pointer-to-pointer)语法的示例
int main (void)
{
// allocate memory for 4x4 integers; quick & dirty
int ** p = new int*[4];
for (size_t i=0; i<4; ++i) p[i] = new int[4];
// do some stuff here, using p[x][y]
// deallocate memory
for (size_t i=0; i<4; ++i) delete[] p[i];
delete[] p;
}
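For comparison, here is a minimal sketch (not part of the original answer) of the same 4x4 storage done as a single contiguous 1D block, using the row-major mapping idx = r*4 + c:
作为对比,下面是一个极简示意(非原回答内容):用单个连续的一维内存块存储同样的 4x4 数据,采用行主序映射 idx = r*4 + c:
int main (void)
{
    // allocate one contiguous block for a 4x4 "matrix" (row-major); quick & dirty
    int * q = new int[4 * 4];
    // element (r, c) is accessed as q[r * 4 + c]
    q[2 * 4 + 3] = 42;
    // a single matching deallocation
    delete[] q;
}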
The downsides
缺点
Memory locality
内存局部性
For this “matrix” you allocate one block of four pointers and four blocks of four integers. All of the allocations are unrelated and can therefore result in arbitrary memory positions.
对于这个“矩阵”,你分配了一个包含四个指针的块和四个各包含四个整数的块。所有这些分配互不相关,因此它们可能位于内存中的任意位置。
The following image will give you an idea of what the memory may look like.
下图可以让您大致了解这些内存可能的分布情况。
For the real 2d case:
对于真正的 2d 情况:
- The violet square is the memory position occupied by p itself.
- The green squares assemble the memory region p points to (4 x int*).
- The 4 regions of 4 contiguous blue squares are the ones pointed to by each int* of the green region.
- 紫色方块是 p 本身占据的内存位置。
- 绿色方块构成了 p 所指向的内存区域(4 个 int*)。
- 4 组各 4 个连续的蓝色方块是绿色区域中每个 int* 所指向的区域。
For the 2d mapped onto 1d case:
对于将 2d 映射到 1d 的情况:
- The green square is the only required pointer: int*.
- The blue squares assemble the memory region for all matrix elements (16 x int).
- 绿色方块是唯一需要的指针:int*。
- 蓝色方块构成了所有矩阵元素的内存区域(16 个 int)。
This means that (when using the left layout) you will probably observe worse performance than for a contiguous storage pattern (as seen on the right), due to caching for instance.
这意味着(当使用左侧布局时)您可能会观察到比连续存储模式(如右侧所示)更糟糕的性能,例如由于缓存。
Let's say a cache line is "the amount of data transferred into the cache at once" and let's imagine a program accessing the whole matrix one element after another.
假设缓存行是“一次传输到缓存中的数据量”,让我们想象一个程序一个接一个地访问整个矩阵。
If you have a properly aligned 4 times 4 matrix of 32 bit values, a processor with a 64 byte cache line (typical value) is able to "one-shot" the data (4*4*4 = 64 bytes). If you start processing and the data isn't already in the cache you'll face a cache miss and the data will be fetched from main memory. This load can fetch the whole matrix at once since it fits into a cache line, if and only if it is contiguously stored (and properly aligned). There will probably not be any more misses while processing that data.
如果您有一个正确对齐的 32 位值的 4 乘 4 矩阵,则具有 64 字节缓存线(典型值)的处理器能够“一次性”处理数据(4*4*4 = 64 字节)。如果您开始处理并且数据不在缓存中,您将面临缓存未命中,数据将从主内存中获取。此加载可以一次获取整个矩阵,因为它适合缓存行,当且仅当它连续存储(并正确对齐)时。处理该数据时可能不会再有任何遗漏。
In case of a dynamic, "real two-dimensional" system with unrelated locations of each row/column, the processor needs to load every memory location separately. Even though only 64 bytes are required, loading 4 cache lines for 4 unrelated memory positions would - in a worst-case scenario - actually transfer 256 bytes and waste 75% throughput bandwidth. If you process the data using the 2d-scheme you'll again (if not already cached) face a cache miss on the first element. But now, only the first row/column will be in the cache after the first load from main memory, because all other rows are located somewhere else in memory and not adjacent to the first one. As soon as you reach a new row/column there will again be a cache miss and the next load from main memory is performed.
在动态的、“真正的二维”系统中,每行/列的位置都不相关,处理器需要单独加载每个内存位置。尽管只需要 64 个字节,但为 4 个不相关的内存位置加载 4 个缓存行 - 在最坏的情况下 - 实际上会传输 256 个字节并浪费 75% 的吞吐量带宽。如果您使用 2d-scheme 处理数据,您将再次(如果尚未缓存)在第一个元素上遇到缓存未命中。但是现在,在第一次从主内存加载后,只有第一行/列会在缓存中,因为所有其他行都位于内存中的其他位置,而不是与第一行相邻。一旦到达新的行/列,就会再次发生缓存未命中,并执行从主内存的下一次加载。
Long story short: The 2d pattern has a higher chance of cache misses, while the 1d scheme offers better performance potential due to the locality of the data.
长话短说:2d 模式发生缓存未命中的概率更高,而 1d 方案由于数据的局部性而具有更好的性能潜力。
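To make the locality argument concrete, here is a small hedged sketch (function names and sizes are purely illustrative) that walks the same logical matrix once through a contiguous 1D buffer and once through a pointer-to-pointer layout; only the first loop is guaranteed to touch memory sequentially:
为了让局部性论点更具体,下面是一个简单的示意(函数名和尺寸仅作说明):分别通过连续的一维缓冲区和指针的指针布局遍历同一个逻辑矩阵;只有第一个循环保证按顺序访问内存:
#include <cstddef>
#include <vector>

int sum_1d(const std::vector<int>& a, std::size_t rows, std::size_t cols)
{
    int s = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += a[r * cols + c]; // contiguous, cache-friendly walk
    return s;
}

int sum_2d(int** p, std::size_t rows, std::size_t cols)
{
    int s = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            s += p[r][c]; // each row may live anywhere in memory
    return s;
}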
Frequent Allocation / Deallocation
频繁分配/解除分配
- As many as N + 1 (4 + 1 = 5) allocations (using new, malloc, allocator::allocate or whatever) are necessary to create the desired NxM (4×4) matrix.
- The same number of proper, respective deallocation operations must be applied as well.
- 创建所需的 NxM(4×4)矩阵最多需要 N + 1(4 + 1 = 5)次分配(使用 new、malloc、allocator::allocate 或其他方式)。
- 还必须执行同样数量的、相应的释放操作。
Therefore, it is more costly to create/copy such matrices in contrast to a single allocation scheme.
因此,与单一分配方案相比,创建/复制此类矩阵的成本更高。
This is getting even worse with a growing number of rows.
随着行数的增加,情况变得更糟。
Memory consumption overhead
内存消耗开销
I'll assume a size of 32 bits for int and 32 bits for pointers. (Note: system dependent.)
我将假设 int 的大小为 32 位,指针的大小为 32 位。(注意:系统依赖。)
Let's remember: We want to store a 4×4 int matrix which means 64 bytes.
让我们记住:我们想要存储一个 4×4 int 矩阵,这意味着 64 个字节。
For an NxM matrix, stored with the presented pointer-to-pointer scheme, we consume
对于一个 NxM 矩阵,采用上述指针的指针方案存储时,我们消耗
N*M*sizeof(int) [the actual blue data] + N*sizeof(int*) [the green pointers] + sizeof(int**) [the violet variable p] bytes.
N*M*sizeof(int) [实际的蓝色数据] + N*sizeof(int*) [绿色指针] + sizeof(int**) [紫色变量 p] 字节。
That makes 4*4*4 + 4*4 + 4 = 84 bytes in the present example, and it gets even worse when using std::vector<std::vector<int>>. That will require N*M*sizeof(int) + N*sizeof(vector<int>) + sizeof(vector<vector<int>>) bytes, that is 4*4*4 + 4*16 + 16 = 144 bytes in total, instead of 64 bytes for a 4×4 int matrix.
在本例中这相当于 4*4*4 + 4*4 + 4 = 84 字节;使用 std::vector<std::vector<int>> 时情况更糟:它需要 N*M*sizeof(int) + N*sizeof(vector<int>) + sizeof(vector<vector<int>>) 字节,即总共 4*4*4 + 4*16 + 16 = 144 字节,而存储 4×4 个 int 本身只需要 64 字节。
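If you want to check these per-object overheads on your own platform (the 84/144 byte figures above assume a 32-bit build and are only illustrative), a quick sketch:
如果想在自己的平台上验证这些开销(上面的 84/144 字节数字假设 32 位环境,仅作说明),可以用这样一个小程序:
#include <cstdio>
#include <vector>

int main ()
{
    // results are implementation-dependent (pointer width, vector layout, ...)
    std::printf("sizeof(int)              = %zu\n", sizeof(int));
    std::printf("sizeof(int*)             = %zu\n", sizeof(int*));
    std::printf("sizeof(std::vector<int>) = %zu\n", sizeof(std::vector<int>));
}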
In addition -depending on the used allocator- each single allocation may well (and most likely will) have another 16 bytes of memory overhead. (Some “Infobytes” which store the number of allocated bytes for the purpose of proper deallocation.)
此外——取决于使用的分配器——每个单独的分配很可能(并且很可能会)有另外 16 字节的内存开销。(一些“信息字节”存储分配的字节数,以便正确释放。)
This means the worst case is:
这意味着最坏的情况是:
N*(16+M*sizeof(int)) + 16+N*sizeof(int*) + sizeof(int**)
= 4*(16+4*4) + 16+4*4 + 4 = 164 bytes! (Overhead: 156%)
The share of the overhead will reduce as the size of the matrix grows but will still be present.
开销份额将随着矩阵大小的增加而减少,但仍会存在。
Risk of memory leaks
内存泄漏风险
The bunch of allocations requires appropriate exception handling in order to avoid memory leaks if one of the allocations fails! You'll need to keep track of the allocated memory blocks and you must not forget them when deallocating the memory.
大量的分配操作需要适当的异常处理,以避免其中某次分配失败时发生内存泄漏!您需要跟踪已分配的内存块,并且在释放内存时不能遗漏它们。
If new runs out of memory and the next row cannot be allocated (especially likely when the matrix is very large), a std::bad_alloc is thrown by new.
如果 new 耗尽内存而无法分配下一行(在矩阵非常大时尤其可能发生),new 会抛出 std::bad_alloc。
Example:
例子:
In the above mentioned new/delete example, we'll face some more code if we want to avoid leaks in case of bad_alloc exceptions.
在上面提到的 new/delete 示例中,如果想在发生 bad_alloc 异常时避免内存泄漏,我们就得写更多的代码。
// allocate memory for 4x4 integers; quick & dirty
size_t const N = 4;
// we don't need try for this allocation
// if it fails there is no leak
int ** p = new int*[N];
size_t allocs(0U);
try
{ // try block doing further allocations
for (size_t i=0; i<N; ++i)
{
p[i] = new int[4]; // allocate
++allocs; // advance counter if no exception occurred
}
}
catch (std::bad_alloc & be)
{ // if an exception occurs we need to free our memory
for (size_t i=0; i<allocs; ++i) delete[] p[i]; // free all allocated p[i]s
delete[] p; // free p
throw; // rethrow bad_alloc
}
/*
do some stuff here, using p[x][y]
*/
// deallocate memory according to the number of allocations
for (size_t i=0; i<allocs; ++i) delete[] p[i];
delete[] p;
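For contrast, a hedged sketch of the contiguous alternative: a single std::vector performs one allocation and cleans up automatically, so none of the try/catch bookkeeping above is needed:
作为对比,连续存储方案的一个简单示意:单个 std::vector 只进行一次分配并自动清理,因此完全不需要上面的 try/catch 手动管理:
#include <cstddef>
#include <vector>

int main ()
{
    size_t const N = 4;
    // one allocation; if it throws std::bad_alloc there is nothing to clean up manually
    std::vector<int> m(N * N, 0);
    // element (r, c) is accessed as m[r * N + c]
    m[2 * N + 3] = 42;
} // the vector's destructor releases the memory, even if an exception propagates later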
Summary
概括
There are cases where "real 2d" memory layouts fit and make sense (e.g. if the number of columns per row is not constant), but in the most simple and common 2D data storage cases they just bloat the complexity of your code and reduce the performance and memory efficiency of your program.
在某些情况下,"真正的 2d"内存布局是合适且有意义的(例如每行的列数不恒定时),但在最简单、最常见的 2D 数据存储场景中,它们只会增加代码的复杂性,并降低程序的性能和内存效率。
Alternative
替代方案
You should use a contiguous block of memory and map your rows onto that block.
您应该使用一个连续的内存块并将您的行映射到该块上。
The "C++ way" of doing it is probably to write a class that manages your memory while considering important things like
这样做的“C++ 方式”可能是编写一个类来管理您的内存,同时考虑诸如
- What is The Rule of Three?
- What is meant by Resource Acquisition is Initialization (RAII)?
- C++ concept: Container (on cppreference.com)
Example
例子
To provide an idea of how such a class may look like, here's a simple example with some basic features:
为了让您了解此类类的外观,这里有一个具有一些基本功能的简单示例:
- 2d-size-constructible
- 2d-resizable
- operator(size_t, size_t) for 2d row-major element access
- at(size_t, size_t) for checked 2d row-major element access
- Fulfills Concept requirements for Container
- 可按二维尺寸构造
- 可按二维尺寸调整大小
- operator(size_t, size_t) 用于按行主序访问二维元素
- at(size_t, size_t) 用于带边界检查的行主序二维元素访问
- 满足 Container 的概念(Concept)要求
Source:
来源:
#include <vector>
#include <algorithm>
#include <iterator>
#include <utility>
namespace matrices
{
template<class T>
class simple
{
public:
// misc types
using data_type = std::vector<T>;
using value_type = typename std::vector<T>::value_type;
using size_type = typename std::vector<T>::size_type;
// ref
using reference = typename std::vector<T>::reference;
using const_reference = typename std::vector<T>::const_reference;
// iter
using iterator = typename std::vector<T>::iterator;
using const_iterator = typename std::vector<T>::const_iterator;
// reverse iter
using reverse_iterator = typename std::vector<T>::reverse_iterator;
using const_reverse_iterator = typename std::vector<T>::const_reverse_iterator;
// empty construction
simple() = default;
// default-insert rows*cols values
simple(size_type rows, size_type cols)
: m_rows(rows), m_cols(cols), m_data(rows*cols)
{}
// copy initialized matrix rows*cols
simple(size_type rows, size_type cols, const_reference val)
: m_rows(rows), m_cols(cols), m_data(rows*cols, val)
{}
// 1d-iterators
iterator begin() { return m_data.begin(); }
iterator end() { return m_data.end(); }
const_iterator begin() const { return m_data.begin(); }
const_iterator end() const { return m_data.end(); }
const_iterator cbegin() const { return m_data.cbegin(); }
const_iterator cend() const { return m_data.cend(); }
reverse_iterator rbegin() { return m_data.rbegin(); }
reverse_iterator rend() { return m_data.rend(); }
const_reverse_iterator rbegin() const { return m_data.rbegin(); }
const_reverse_iterator rend() const { return m_data.rend(); }
const_reverse_iterator crbegin() const { return m_data.crbegin(); }
const_reverse_iterator crend() const { return m_data.crend(); }
// element access (row major indexation)
reference operator() (size_type const row,
size_type const column)
{
return m_data[m_cols*row + column];
}
const_reference operator() (size_type const row,
size_type const column) const
{
return m_data[m_cols*row + column];
}
reference at(size_type const row, size_type const column)
{
return m_data.at(m_cols*row + column);
}
const_reference at(size_type const row, size_type const column) const
{
return m_data.at(m_cols*row + column);
}
// resizing
void resize(size_type new_rows, size_type new_cols)
{
// new matrix new_rows times new_cols
simple tmp(new_rows, new_cols);
// select smaller row and col size
auto mc = std::min(m_cols, new_cols);
auto mr = std::min(m_rows, new_rows);
for (size_type i(0U); i < mr; ++i)
{
// iterators to begin of rows
auto row = begin() + i*m_cols;
auto tmp_row = tmp.begin() + i*new_cols;
// move mc elements to tmp
std::move(row, row + mc, tmp_row);
}
// move assignment to this
*this = std::move(tmp);
}
// size and capacity
size_type size() const { return m_data.size(); }
size_type max_size() const { return m_data.max_size(); }
bool empty() const { return m_data.empty(); }
// dimensionality
size_type rows() const { return m_rows; }
size_type cols() const { return m_cols; }
// data swapping
void swap(simple &rhs)
{
using std::swap;
m_data.swap(rhs.m_data);
swap(m_rows, rhs.m_rows);
swap(m_cols, rhs.m_cols);
}
private:
// content
size_type m_rows{ 0u };
size_type m_cols{ 0u };
data_type m_data{};
};
template<class T>
void swap(simple<T> & lhs, simple<T> & rhs)
{
lhs.swap(rhs);
}
template<class T>
bool operator== (simple<T> const &a, simple<T> const &b)
{
if (a.rows() != b.rows() || a.cols() != b.cols())
{
return false;
}
return std::equal(a.begin(), a.end(), b.begin(), b.end());
}
template<class T>
bool operator!= (simple<T> const &a, simple<T> const &b)
{
return !(a == b);
}
}
Note several things here:
请注意以下几点:
- T needs to fulfill the requirements of the used std::vector member functions
- operator() doesn't do any "out of range" checks
- No need to manage data on your own
- No destructor, copy constructor or assignment operators required
- T 需要满足所使用的 std::vector 成员函数的要求
- operator() 不做任何越界检查
- 无需自己管理数据
- 不需要析构函数、复制构造函数或赋值运算符
So you don't have to bother about proper memory handling for each application but only once for the class you write.
因此,您不必为每个应用程序进行适当的内存处理,而只需为您编写的类处理一次。
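A brief usage sketch of the class above (the values are arbitrary, just to show the interface):
上面这个类的简单使用示意(数值是任意的,仅用于展示接口):
#include <cassert>

int main ()
{
    matrices::simple<int> m(4, 4, 0); // 4x4 matrix filled with zeros
    m(1, 2) = 7;                      // unchecked row-major access
    m.at(3, 3) = 9;                   // checked access, throws std::out_of_range when out of bounds
    m.resize(8, 8);                   // existing values are kept where possible
    assert(m(1, 2) == 7 && m.rows() == 8 && m.cols() == 8);
}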
Restrictions
限制
There may be cases where a dynamic "real" two dimensional structure is favourable. This is for instance the case if
在某些情况下,动态“真实”二维结构可能是有利的。例如,如果
- the matrix is very large and sparse (if any of the rows do not even need to be allocated but can be handled using a nullptr) or if
- the rows do not have the same number of columns (that is if you don't have a matrix at all but another two-dimensional construct).
- 矩阵非常大且稀疏(某些行甚至不需要分配,可以用 nullptr 表示),或者
- 各行的列数不同(也就是说,您拥有的根本不是矩阵,而是另一种二维结构)。
回答by Konrad Rudolph
Unless you are talking about static arrays, 1D is faster.
除非您在谈论静态数组,否则1D 更快。
Here's the memory layout of a 1D array (std::vector<T>):
这是一维数组(std::vector<T>)的内存布局:
+---+---+---+---+---+---+---+---+---+
| | | | | | | | | |
+---+---+---+---+---+---+---+---+---+
And here's the same for a dynamic 2D array (std::vector<std::vector<T>>):
下面是动态二维数组(std::vector<std::vector<T>>)的内存布局:
+---+---+---+
| * | * | * |
+-|-+-|-+-|-+
| | V
| | +---+---+---+
| | | | | |
| | +---+---+---+
| V
| +---+---+---+
| | | | |
| +---+---+---+
V
+---+---+---+
| | | |
+---+---+---+
Clearly the 2D case loses the cache locality and uses more memory. It also introduces an extra indirection (and thus an extra pointer to follow) but the first array has the overhead of calculating the indices so these even out more or less.
显然,2D 情况会丧失缓存局部性并占用更多内存。它还引入了一层额外的间接寻址(因此需要多跟随一个指针),但第一种数组需要计算索引的开销,所以两者大致相互抵消。
回答by Mohamad Ali Baydoun
1D and 2D Static Arrays
一维和二维静态数组
Size: Both will require the same amount of memory.
Speed: You can assume that there will be no speed difference because the memory for both of these arrays should be contiguous (the whole 2D array should appear as one chunk in memory rather than a bunch of chunks spread across memory). (This could be compiler dependent, however.)
大小:两者都需要相同数量的内存。
速度:您可以假设没有速度差异,因为这两个数组的内存应该是连续的(整个 2D 数组应该在内存中显示为一个块,而不是一堆分布在内存中的块)。(但这可能取决于编译器。)
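A tiny sketch (for illustration only) showing that a static 2D array is one contiguous chunk, i.e. the element right after b[0][3] is b[1][0]:
一个小示意(仅作说明),展示静态二维数组是一整块连续内存,即 b[0][3] 之后紧接着就是 b[1][0]:
#include <cstdio>

int main ()
{
    int b[4][4] = {};
    // for a static 2D array, consecutive rows are adjacent in memory,
    // so these two pointers print the same address
    std::printf("%p\n%p\n",
                static_cast<void*>(&b[0][3] + 1),
                static_cast<void*>(&b[1][0]));
}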
1D and 2D Dynamic Arrays
一维和二维动态阵列
Size: The 2D array will require a tiny bit more memory than the 1D array because of the pointers needed in the 2D array to point to the set of allocated 1D arrays. (This bit is only tiny when we're talking about really big arrays; for small arrays, it can be relatively large.)
Speed: The 1D array may be faster than the 2D array because the memory for the 2D array would not be contiguous, so cache misses would become a problem.
大小:二维数组需要比一维数组略多的内存,因为二维数组中需要额外的指针来指向分配出来的一组一维数组。(只有在数组非常大时,这点额外开销才显得微不足道;对于小数组,相对而言这部分开销可能相当可观。)
速度:一维数组可能比二维数组快,因为二维数组的内存不是连续的,所以缓存未命中会成为一个问题。
Use what works and seems most logical, and if you face speed problems, then refactor.
使用有效且看起来最合乎逻辑的方法,如果遇到速度问题,请重构。
回答by M.M
The existing answers all only compare 1-D arrays against arrays of pointers.
现有的答案都只将一维数组与指针数组进行比较。
In C (but not C++) there is a third option; you can have a contiguous 2-D array that is dynamically allocated and has runtime dimensions:
在 C(但不是 C++)中有第三种选择;您可以拥有一个动态分配并具有运行时维度的连续二维数组:
int (*p)[num_columns] = malloc(num_rows * sizeof *p);
and this is accessed like p[row_index][col_index].
访问方式是 p[row_index][col_index]。
I would expect this to have very similar performance to the 1-D array case, but it gives you nicer syntax for accessing the cells.
我希望这与一维数组情况具有非常相似的性能,但它为您提供了更好的访问单元格的语法。
In C++ you can achieve something similar by defining a class which maintains a 1-D array internally, but can expose it via 2-D array access syntax using overloaded operators. Again I would expect that to have similar or identical performance to the plain 1-D array.
在 C++ 中,您可以通过定义一个在内部维护一维数组的类来实现类似的功能,但可以使用重载运算符通过二维数组访问语法公开它。我再次希望它具有与普通一维数组相似或相同的性能。
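A minimal sketch of that idea (hypothetical class name, row-major layout assumed): operator[] returns a pointer to the start of a row, so m[r][c] still reads naturally while the storage stays contiguous:
这个思路的一个极简示意(类名为假设,按行主序存储):operator[] 返回指向某一行起始位置的指针,这样 m[r][c] 的写法依然自然,而底层存储保持连续:
#include <cstddef>
#include <vector>

class Grid
{
public:
    Grid(std::size_t rows, std::size_t cols) : cols_(cols), data_(rows * cols) {}
    // pointer to the first element of row r, so grid[r][c] indexes the contiguous buffer
    int* operator[](std::size_t r) { return data_.data() + r * cols_; }
    const int* operator[](std::size_t r) const { return data_.data() + r * cols_; }
private:
    std::size_t cols_;
    std::vector<int> data_;
};

// usage:
// Grid m(3, 5);
// m[2][4] = 1;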
回答by Polymorphism
Another difference between 1D and 2D arrays appears in memory allocation. We cannot be sure that the members of a 2D array are sequential.
一维和二维数组的另一个区别出现在内存分配上。我们不能确定二维数组的成员是连续的。
回答by cup
It really depends on how your 2D array is implemented.
这实际上取决于您的二维数组是如何实现的。
consider the code below:
考虑下面的代码:
int a[200], b[10][20], *c[10], *d[10];
for (int ii = 0; ii < 10; ++ii)
{
c[ii] = &b[ii][0];
d[ii] = (int*) malloc(20 * sizeof(int)); // The cast for C++ only.
}
There are 3 implementations here: b, c and d
这里有 3 个实现:b、c 和 d
There won't be a lot of difference between accessing b[x][y] or a[x*20 + y], since one is you doing the computation and the other is the compiler doing it for you. c[x][y] and d[x][y] are slower, because the machine has to find the address that c[x] points to and then access the y-th element from there. It is not one straight computation. On some machines (e.g. the AS400, which has 36 byte (not bit) pointers), pointer access is extremely slow. It all depends on the architecture in use. On x86 type architectures, a and b are the same speed, c and d are slower than b.
访问 b[x][y] 和 a[x*20 + y] 不会有太大区别,因为一种是你自己做索引计算,另一种是编译器替你做。c[x][y] 和 d[x][y] 则更慢,因为机器必须先找到 c[x] 指向的地址,再从那里访问第 y 个元素,这不是一次直接的计算。在某些机器上(例如具有 36 字节(不是位)指针的 AS400),指针访问非常慢。这完全取决于所使用的体系结构。在 x86 类型的体系结构上,a 和 b 速度相同,c 和 d 比 b 慢。
回答by Adam Erickson
I love the thorough answer provided by Pixelchemist. A simpler version of this solution may be as follows. First, declare the dimensions:
我喜欢Pixelchemist提供的详尽答案。此解决方案的更简单版本可能如下所示。首先,声明维度:
constexpr int M = 16; // rows
constexpr int N = 16; // columns
constexpr int P = 16; // planes
Next, create an alias and get and set methods:
接下来,创建一个别名以及 get 和 set 方法:
#include <cstddef>
#include <vector>

template<typename T>
using Vector = std::vector<T>;
template<typename T>
inline T& set_elem(Vector<T>& m_, size_t i_, size_t j_, size_t k_)
{
// check indexes here...
return m_[i_*N*P + j_*P + k_];
}
template<typename T>
inline const T& get_elem(const Vector<T>& m_, size_t i_, size_t j_, size_t k_)
{
// check indexes here...
return m_[i_*N*P + j_*P + k_];
}
Finally, a vector may be created and indexed as follows:
最后,可以按如下方式创建和索引向量:
Vector<int> array3d(M*N*P, 0); // create 3-d array containing M*N*P zero ints
set_elem(array3d, 0, 0, 1) = 5; // array3d[0][0][1] = 5
auto n = get_elem(array3d, 0, 0, 1); // n = 5
Defining the vector size at initialization provides optimal performance. This solution is modified from this answer. The functions may be overloaded to support varying dimensions with a single vector. The downside of this solution is that the M, N, P parameters are implicitly passed to the get and set functions. This can be resolved by implementing the solution within a class, as done by Pixelchemist.
在初始化时定义向量大小可提供最佳性能。这个解决方案是从这个答案修改而来的。这些函数可能会被重载以支持单个向量的不同维度。此解决方案的缺点是 M、N、P 参数被隐式传递给 get 和 set 函数。这可以通过在类中实现解决方案来解决,如Pixelchemist所做的那样。