C语言 效率:数组 vs 指针

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2305770/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-02 04:33:47  来源:igfitidea点击:

Efficiency: arrays vs pointers

carrayspointersperformancememory-access

提问by Abhijith Madhav

Memory access through pointers is said to be more efficient than memory access through an array. I am learning C and the above is stated in K&R. Specifically they say

据说通过指针访问内存比通过数组访问内存更有效。我正在学习 C,以上内容在 K&R 中有说明。他们具体说

Any operation that can be achieved by array subscripting can also be done with pointers. The pointer version will in general be faster

任何可以通过数组下标实现的操作,也可以用指针来完成。指针版本通常会更快

I dis-assembled the following code using visual C++.(Mine is a 686 processor. I have disabled all optimizations.)

我使用 Visual C++ 反汇编了以下代码。(我的是 686 处理器。我已禁用所有优化。)

int a[10], *p = a, temp;

void foo()
{
    temp = a[0];
    temp = *p;
}

To my surprise I see that memory access through a pointer takes 3 instructions to the two taken by memory access through an array. Below is the corresponding code.

令我惊讶的是,我发现通过指针访问内存需要 3 条指令,而通过数组访问内存需要 3 条指令。下面是对应的代码。

; 5    : temp = a[0];

    mov eax, DWORD PTR _a
    mov DWORD PTR _temp, eax

; 6    : temp = *p;

    mov eax, DWORD PTR _p
    mov ecx, DWORD PTR [eax]
    mov DWORD PTR _temp, ecx

Please help me understand. What am I missing here??

请帮我理解。我在这里错过了什么?



As pointed out by many answers and comments I had used a compile time constant as the array index thus making it arguably easier for the access through an array. Below is the assembly code with a variable as the index. I now have equal number of instructions for access through pointer and arrays. My broader questions still holds good. The memory access through a pointer is not lending itself as being more efficient.

正如许多答案和评论所指出的那样,我使用编译时常量作为数组索引,因此可以说更容易通过数组进行访问。下面是以变量为索引的汇编代码。我现在有相同数量的指令用于通过指针和数组进行访问。我的更广泛的问题仍然有效。通过指针访问内存并不能提高效率。

; 7    :        temp = a[i];

    mov eax, DWORD PTR _i
    mov ecx, DWORD PTR _a[eax*4]
    mov DWORD PTR _temp, ecx

; 8    : 
; 9    :    
; 10   :        temp = *p;

    mov eax, DWORD PTR _p
    mov ecx, DWORD PTR [eax]
    mov DWORD PTR _temp, ecx

回答by paxdiablo

Memory access through pointers is said to be more efficient than memory access through an array.

据说通过指针访问内存比通过数组访问内存更有效。

That may have been true in the past when compilers were relatively stupid beasts. You only need to look at some of the code output by gccin high optimisation modes to know that it is no longer true. Some of that code is very hard to understand but, once you do, its brilliance is evident.

在过去,当编译器是相对愚蠢的野兽时,这可能是正确的。你只需要看看gcc高优化模式下输出的一些代码就知道它不再是真的了。其中一些代码很难理解,但是一旦你理解了,它的魅力就显而易见了。

A decent compiler will generate the same code for pointer accesses and array accesses and you should probably not be worrying about that level of performance. The people that write compilers know far more about their target architectures than we mere mortals. Concentrate more on the macro level when optimising your code (algorithm selection and so on) and trust in your tool-makers to do their job.

一个体面的编译器将为指针访问和数组访问生成相同的代码,您可能不必担心该级别的性能。编写编译器的人比我们普通人更了解他们的目标架构。在优化您的代码(算法选择等)时更多地关注宏观层面,并信任您的工具制造商来完成他们的工作。



In fact, I'm surprised the compiler didn't optimise the entire

事实上,我很惊讶编译器没有优化整个

temp = a[0];

line out of existence since tempis over-written in the very next line with a different value and ais in no way marked volatile.

行不存在,因为temp在下一行用不同的值覆盖并且a没有任何标记volatile

I remember an urban myth from long ago about a benchmark for the latest VAX Fortran compiler (showing my age here) that outperformed its competitors by several orders of magnitude.

我记得很久以前有一个关于最新 VAX Fortran 编译器的基准测试的城市神话(这里显示了我的年龄),它的性能比竞争对手高出几个数量级。

Turns out the compiler figured out that the result from the benchmark calculation wasn't used anywhere so it optimised the entire calculation loop into oblivion. Hence the substantial improvement in run speed.

结果编译器发现基准计算的结果没有在任何地方使用,所以它优化了整个计算循环。因此,运行速度的显着提高。



Update:The reason that optimised code is more efficient in your particular case is because of the way you find the location. awill be at a fixed location decided at link/load time and the reference to it will be fixed up at the same time. So a[0]or indeed a[any constant]will be at a fixed location.

更新:优化代码在您的特定情况下更有效的原因是因为您找到位置的方式。a将在链接/加载时间决定的固定位置,并且将同时修复对其的引用。所以a[0]或确实a[any constant]会在一个固定的位置。

And pitself will also be at a fixed location for the same reason. But*p(the contents of p) is variable and therefore will have an extra lookup involved to find the correct memory location.

p出于同样的原因,它本身也会在一个固定的位置。但是*p( 的内容p)是可变的,因此需要额外的查找来找到正确的内存位置。

You'll probably find that having yet another variable xset to 0 (not const) and using a[x]would also introduce extra calculations.

您可能会发现将另一个变量x设置为 0(不是const)并使用a[x]也会引入额外的计算。



In one of your comments, you state:

在您的评论之一中,您声明:

Doing as you suggested resulted in 3 instructions for memory access through arrays too (fetch index, fetch value of array element, store in temp). But I am still unable to see the efficiency. :-(

按照您的建议进行操作也导致了 3 条通过数组访问内存的指令(获取索引、获取数组元素的值、存储在临时文件中)。但我仍然无法看到效率。:-(

My response to that is that you very likely won'tsee an efficiency in using pointers. Modern compilers are more than up to the task of figuring out that array operations and pointer operations can be turned into the same underlying machine code.

我对此的回应是,您很可能不会看到使用指针的效率。现代编译器的任务不仅仅是弄清楚数组操作和指针操作可以转换成相同的底层机器代码。

In fact, without optimisation turned on, pointer code can be lessefficient. Consider the following translations:

事实上,如果没有开启优化,指针代码的效率会降低。考虑以下翻译:

int *pa, i, a[10];

for (i = 0; i < 10; i++)
    a[i] = 100;
/*
    movl    
    xorl    %eax, %eax
L5:
    movl    0, %edx
    movl    %edx, -56(%ebp,%eax,4)
    incl    %eax
    cmpl    , %eax
    jle     L5
, -16(%ebp) ; this is i, init to 0 L2: cmpl , -16(%ebp) ; from 0 to 9 jg L3 movl -16(%ebp), %eax ; load i into register movl 0, -72(%ebp,%eax,4) ; store 100 based on array/i leal -16(%ebp), %eax ; get address of i incl (%eax) ; increment jmp L2 ; and loop L3: */ for (pa = a; pa < a + 10; pa++) *pa = 100; /* leal -72(%ebp), %eax movl %eax, -12(%ebp) ; this is pa, init to &a[0] L5: leal -72(%ebp), %eax addl , %eax cmpl -12(%ebp), %eax ; is pa at &(a[10]) jbe L6 ; yes, stop movl -12(%ebp), %eax ; get pa movl 0, (%eax) ; store 100 leal -12(%ebp), %eax ; get pa addl , (%eax) ; add 4 (sizeof int) jmp L5 ; loop around L6: */

From that example, you can actually see that the pointer example is longer, and unnecessarily so. It loads painto %eaxmultiple times without it changing and indeed alternates %eaxbetween paand &(a[10]). The default optimisation here is basically none at all.

从该示例中,您实际上可以看到指针示例更长,并且不必要地更长。它加载pa%eax多次,没有它改变确实交替%eax之间pa&(a[10])。这里的默认优化基本上是没有的。

When you switch up to optimisation level 2, the code you get is:

当你切换到优化级别 2 时,你得到的代码是:

    leal    -56(%ebp), %eax
    leal    -16(%ebp), %edx
    jmp     L14
L16:
    movl    0, (%eax)
    addl    , %eax
L14:
    cmpl    %eax, %edx
    ja      L16

for the array version, and:

对于阵列版本,以及:

getint(pn)
int *pn;
{
    ...
}

for the pointer version.

对于指针版本。

I'm not going to do an analysis on clock cycles here (since it's too much work and I'm basically lazy) but I will point out one thing. There's not a huge difference in the code for both versions in terms of assembler instructions and, given the speeds that modern CPUs actually run at, you won't notice a difference unless you're doing billionsof these operations. I always tend to prefer writing code for readability and only worrying about performance if it becomes an issue.

我不打算在这里对时钟周期进行分析(因为工作量太大而且我基本上很懒)但我会指出一件事。两个版本的代码在汇编指令方面没有太大区别,而且鉴于现代 CPU 的实际运行速度,除非您执行数十亿次这样的操作,否则您不会注意到差异。我总是倾向于为了可读性而编写代码,并且只在它成为问题时才担心性能。

As an aside, that statement you reference:

顺便说一句,您引用的那句话是:

5.3 Pointers and Arrays: The pointer version will in general be faster but, at least to the uninitiated, somewhat harder to grasp immediately.

5.3 指针和数组:指针版本通常会更快,但至少对于初学者来说,有点难以立即掌握。

dates back to the earliest versions of K&R, including my ancient 1978 one where functions are still written:

可以追溯到 K&R 的最早版本,包括我 1978 年的古老版本,其中仍然编写函数:

struct bar a[10], *p;

void foo()
{
    int i;

    // slow loop
    for (i = 0; i < 10; ++i)
        printf( a[i].value);

    // faster loop
    for (p = a; p < &a[10]; ++p)
        printf( p->value);
}

Compilers have come an awfully long way since back then.

从那时起,编译器已经走了很长一段路。

回答by tomlogic

If you're programming embedded platforms, you quickly learn that the pointer method is a lot faster than using an index.

如果您正在对嵌入式平台进行编程,您很快就会了解到指针方法比使用索引要快得多。

int a[10],b[10],c[10];
int *pa=a, *pb=b, *pc=c;
int i;

// fill a and b.
fill_arrays(a,b);

// set c[i] = a[i]+b[i];
for (i = 0; i<10; i++)
{
   c[i] = a[i] + b[i];
}

// set *pc++ = *pa++ + *pb++;
for (i = 0; i<10; i++)
{
   *pc++ = *pa++ + *pb++;
}

The slow loop has to calculate a + (i * sizeof(struct bar)) each time through, whereas the second just has to add sizeof(struct bar) to p each time through. The multiply operation uses more clock cycles than the add on many processors.

慢速循环每次都必须计算一个 + (i * sizeof(struct bar)),而第二个每次都必须将 sizeof(struct bar) 添加到 p。在许多处理器上,乘法运算比加法运算使用更多的时钟周期。

You really start to see improvements if you reference a[i] multiple times inside the loop. Some compilers don't cache that address, so it may be recalculated multiple times inside the loop.

如果您在循环中多次引用 a[i],您就会真正开始看到改进。某些编译器不缓存该地址,因此可能会在循环内多次重新计算。

Try updating your sample to use a struct and reference multiple elements.

尝试更新您的示例以使用结构并引用多个元素。

回答by Alexander Gessler

In the first case, the compiler directly knows the address of the array (which is also the address of the first element) and accesses it. In the second case, he knows the address of the pointer and reads the pointer's value, which points to that memory location. That's actually one additional indirection, so it's presumably slower here.

在第一种情况下,编译器直接知道数组的地址(也是第一个元素的地址)并访问它。在第二种情况下,他知道指针的地址并读取指向该内存位置的指针值。这实际上是一个额外的间接访问,所以这里可能会更慢。

回答by Wim ten Brink

The speed is gained in loops, most of all. When you use an array, you would use a counter which you increment. To calculate the position, the system multiplies this counter with the size of the array element, then adds the address of the first element to get the address. With pointers, all you need to do to go to the next element is to increase the current pointer with the size of the element to get the next one, assuming all elements are next to each other in-memory.

速度是在循环中获得的,最重要的是。当您使用数组时,您将使用一个递增的计数器。为了计算位置,系统将这个计数器乘以数组元素的大小,然后加上第一个元素的地址得到地址。使用指针,您需要做的就是使用元素的大小增加当前指针以获得下一个元素,假设所有元素在内存中彼此相邻。

Pointer arithmetic thus takes a bit less calculations when doing loops. Also, having pointers to the right element is faster than using an index within an array.

因此,在执行循环时,指针算术需要的计算更少。此外,使用指向正确元素的指针比在数组中使用索引更快。

Modern development is slowly getting rid of many pointer operations, though. Processors are getting faster and faster and arrays are easier to manage than pointers. Also, arrays tend to reduce the amount of bugs in code. Array will allow index checks, making sure you're not accessing data outside the array.

不过,现代开发正在慢慢摆脱许多指针操作。处理器越来越快,数组比指针更容易管理。此外,数组倾向于减少代码中的错误数量。数组将允许索引检查,确保您没有访问数组外的数据。

回答by Yousf

As paxdiablo said, Any new compiler will make them very similar.

正如paxdiablo所说,任何新的编译器都会使它们非常相似。

Even more, I saw situations where array was faster then pointers. This was on a DSP processor which uses vector operations.

更重要的是,我看到了数组比指针快的情况。这是在使用矢量运算的 DSP 处理器上。

In this case, using arrays was similar to using restrictpointers. Because by using two arrays the compiler -implicitly- knows that they don't point to the same location. But if you deal with 2 pointer, the compiler may think that they point to same location and will skip pipe lining.

在这种情况下,使用数组类似于使用限制指针。因为通过使用两个数组,编译器隐式地知道它们没有指向同一个位置。但是如果你处理 2 个指针,编译器可能会认为它们指向相同的位置并且会跳过管道。

for example:

例如:

##代码##

In case 1, the compiler will easily do pipe-lining of adding a and b, and storing value to c.

在第 1 种情况下,编译器将轻松执行将 a 和 b 相加并将值存储到 c 的流水线操作。

In case 2, the compiler will not pipe-line, because he might be overwriting a or b while saving to C.

在情况 2 中,编译器不会进行流水线操作,因为他可能会在保存到 C 时覆盖 a 或 b。

回答by DigitalRoss

Pointers naturally express simple induction variables while subscripts somewhat intrinsically require more sophisticated compiler optimizations

指针自然地表达简单的归纳变量,而下标本质上需要更复杂的编译器优化



In many cases just using a subscripted expression requires that an extra layer be added to the problem. A loop that increments a subscript ican be though of as a state machine, and the expression a[i]technically requires, each time it is used, that ibe multiplied times the size of each element and added to the base address.

在许多情况下,仅使用带下标的表达式就需要向问题添加额外的层。一个增加下标i 的循环可以被看作是一个状态机,并且表达式a[i] 在技术上要求,每次使用它时,i乘以每个元素的大小并添加到基地址。

In order to transform that access pattern to use pointers the compiler must analyze the entire loop and determine that, say, each element is being accessed. Then the compiler can replace the multiple instances of multiplying the subscript by the element size with a simple increment of the previous loop value. This process combines optimizations called common subexpression eliminationand induction variable strength reduction.

为了将该访问模式转换为使用指针,编译器必须分析整个循环并确定每个元素都被访问。然后编译器可以用前一个循环值的简单增量替换下标乘以元素大小的多个实例。这个过程结合了称为公共子表达式消除归纳变量强度减少的优化

When writing with pointers, the entire optimization process is not necessary because the programmer will typically just step through the array to start with.

使用指针写入时,整个优化过程不是必需的,因为程序员通常只是从数组开始。

Sometimes the compiler can do the optimization and sometimes it can't. It's more common in recent years to have a sophisticated compiler at hand, so pointer-based code is not always faster.

有时编译器可以进行优化,有时则不能。近年来,手头有一个复杂的编译器变得更加普遍,因此基于指针的代码并不总是更快

Because arrrays must usually be contiguous, another advantage for pointers is in creating incrementally allocated composite structures.

因为数组通常必须是连续的,指针的另一个优点是创建增量分配的复合结构。

回答by RcnRcf

This is a very old question and has been answered, as such I do not need to answer! However, I didn't notice a simple answer so am providing one.

这是一个很老的问题,已经回答了,所以我不需要回答!但是,我没有注意到一个简单的答案,所以我提供了一个。

ANSWER: An indirect access (pointer/array) "might" add one additional instruction to load the (base) address, but all accesses after that (elements in case of array / members in case of pointer to struct) should be just one instruction because it is mere addition of an offset to the (base) address that is already loaded. Thus, in a way it is going to be as good as direct access. As such, in majority of the cases, access through array/pointer is equivalent and element accesses are also as good as a direct access to a variable.

答案:间接访问(指针/数组)“可能”添加一条额外的指令来加载(基)地址,但之后的所有访问(数组中的元素/结构指针中的成员)应该只是一条指令因为它只是向已经加载的(基)地址添加一个偏移量。因此,在某种程度上,它将与直接访问一样好。因此,在大多数情况下,通过数组/指针访问是等效的,元素访问也与直接访问变量一样好。

Ex. if I have an array (or pointer) with 10 elements or a struct with 10 members (accessed through a pointer to the struct), and I am accessing an element/member, the one possible additional instruction is required only once at the beginning. All the element/member accesses should be just one instruction after that.

前任。如果我有一个包含 10 个元素的数组(或指针)或一个包含 10 个成员的结构(通过指向结构的指针访问),并且我正在访问一个元素/成员,则在开始时只需要一次可能的附加指令。所有元素/成员访问应该只是之后的一条指令。

回答by Mike Dunlavey

You're getting good answers to your question here, but since you are learning, it is worth pointing out that efficiencies at that level are seldom noticeable.

在这里,您的问题得到了很好的答案,但由于您正在学习,因此值得指出的是,该级别的效率很少引起注意。

When you are tuning a program for maximum performance, you should give at least as much attention to finding and fixing larger issues in the structure of the program. After those have been fixed, low-level optimizations can make a further difference.

当您调整程序以获得最大性能时,您至少应该尽可能多地注意查找和修复程序结构中的较大问题。修复这些问题后,低级优化可以产生进一步的影响。

Here's an example of how this can be done.

这是如何做到这一点的示例。

回答by John Knoeller

Pointers used to befaster than arrays. Certainly back when the C language was designed pointers were quite a bit faster. But these days, optimizers can usually do a better job optimizing arrays than it can with pointers because arrays are more restricted.

指针曾经比数组更快。当然,在设计 C 语言时,指针的速度要快得多。但是现在,优化器通常可以比使用指针更好地优化数组,因为数组受到更多限制。

Instruction sets of modern processors have also been designed to help optimize array access.

现代处理器的指令集也旨在帮助优化阵列访问。

So the bottom line is that arrays are often faster these days, especially when used in loops with index variables.

所以最重要的是,现在数组通常更快,尤其是在带有索引变量的循环中使用时。

Of course you would still want to use pointers for things like linked lists, but the old time optimization of walking a pointer through an array rather than using an index variable is now likely to be a dis-optimization.

当然,您仍然希望将指针用于链接列表之类的东西,但是过去通过数组遍历指针而不是使用索引变量的优化现在可能是一种非优化。

回答by Vlad

"The pointer version will in general be faster" means that in most cases it's easier for the compiler to generate more efficient code having a pointer (which just needs to be dereferenced) than having an array and subscript (which means that the compiler needs to shift the address from the start of the array). With the modern processors and optimizing compilers, however, array access in the typical case is not slower than pointer access.

“指针版本通常会更快”意味着在大多数情况下,编译器生成具有指针(只需要取消引用)的代码比具有数组和下标(这意味着编译器需要从数组的开头移动地址)。然而,使用现代处理器和优化编译器,典型情况下的数组访问并不比指针访问慢。

Specifically in your case, you would need to switch on the optimization, in order to get the same result.

特别是在您的情况下,您需要打开优化,以获得相同的结果。