C: Why does this for loop exit on some platforms and not on others?
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/31016660/
Asked by JonCav
I have recently started to learn C and I am taking a class with C as the subject. I'm currently playing around with loops and I'm running into some odd behaviour which I don't know how to explain.
#include <stdio.h>
int main()
{
    int array[10],i;
    for (i = 0; i <=10 ; i++)
    {
        array[i]=0; /*code should never terminate*/
        printf("test \n");
    }
    printf("%zu \n", sizeof(array)/sizeof(int));
    return 0;
}
On my laptop running Ubuntu 14.04, this code does not break. It runs to completion. On my school's computer running CentOS 6.6, it also runs fine. On Windows 8.1, the loop never terminates.
What's even more strange is that when I edit the condition of the for loop to i <= 11, the code only terminates on my laptop running Ubuntu. It never terminates on CentOS and Windows.
Can anyone explain what's happening in the memory and why the different OSes running the same code give different outcomes?
EDIT: I know the for loop goes out of bounds. I'm doing it intentionally. I just can't figure out how the behaviour can be different across different OSes and computers.
Answer by QuestionC
"On my laptop running Ubuntu 14.04, this code does not break; it runs to completion. On my school's computer running CentOS 6.6, it also runs fine. On Windows 8.1, the loop never terminates.
What is more strange is that when I edit the condition of the for loop to i <= 11, the code only terminates on my laptop running Ubuntu. On CentOS and Windows it never terminates."
You've just discovered memory stomping. You can read more about it here: What is a “memory stomp”?
When you declare int array[10],i;, those variables go into memory (specifically, they're allocated on the stack, which is a block of memory associated with the function). array[] and i are probably adjacent to each other in memory. It seems that on Windows 8.1, i is located at array[10]. On CentOS, i is located at array[11]. And on Ubuntu, it's in neither spot (maybe it's at array[-1]?).
Try adding these debugging statements to your code. You should notice that on iteration 10 or 11, array[i] points at i.
#include <stdio.h>
int main()
{
    int array[10],i;
    printf ("array: %p, &i: %p\n", (void *)array, (void *)&i);
    printf ("i is offset %td from array\n", &i - array);
    for (i = 0; i <=11 ; i++)
    {
        printf ("%d: Writing 0 to address %p\n", i, (void *)&array[i]);
        array[i]=0; /*code should never terminate*/
    }
    return 0;
}
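The pointer subtraction in the program above compares two unrelated objects, which the C standard does not define, so its output is only a hint. Below is a minimal sketch of the same idea with defined behavior. It assumes we model one possible layout by putting both variables in a struct; the names frame_model and index_of_i are made up for illustration and say nothing about what a real compiler does with locals.

```c
#include <stddef.h>

/* Hypothetical model of one possible layout: i placed right after array. */
struct frame_model {
    int array[10];
    int i;
};

/* Returns the index k at which array[k] would alias i inside the model. */
size_t index_of_i(void)
{
    return offsetof(struct frame_model, i) / sizeof(int);
}
```

On common ABIs there is no padding between the two members, so index_of_i() would be expected to return 10, which corresponds to the Windows 8.1 case described in this answer. The language itself only guarantees a value of at least 10.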
Answer by o11c
The bug lies between these pieces of code:
int array[10],i;
for (i = 0; i <=10 ; i++)
    array[i]=0;
Since array only has 10 elements, the write array[10] = 0; in the last iteration is a buffer overflow. Buffer overflows are UNDEFINED BEHAVIOR, which means they might format your hard drive or cause demons to fly out of your nose.
It is fairly common for all stack variables to be laid out adjacent to each other. If i is located where array[10] writes to, then the UB will reset i to 0, thus leading to the unterminated loop.
To fix, change the loop condition to i < 10.
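One way to make this fix harder to get wrong is to derive the bound from the array itself instead of repeating the literal 10. A small sketch follows; ARRAY_LEN and zero_array are illustrative names, not anything from the question, and the macro only works on a true array, not on a pointer.

```c
#include <stddef.h>

/* Hypothetical helper: element count of a true array (not a pointer). */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Zeroes every element of a 10-element array and returns how many
   elements were written. The loop uses <, so index 10 is never touched. */
size_t zero_array(void)
{
    int array[10];
    size_t i;
    for (i = 0; i < ARRAY_LEN(array); i++) {
        array[i] = 0;
    }
    return i;
}
```

If the array size changes later, the loop bound follows it automatically.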
Answer by Gilles 'SO- stop being evil'
In what should be the last run of the loop, you write to array[10], but there are only 10 elements in the array, numbered 0 through 9. The C language specification says that this is “undefined behavior”. What this means in practice is that your program will attempt to write to the int-sized piece of memory that lies immediately after array in memory. What happens then depends on what does, in fact, lie there, and this depends not only on the operating system but more so on the compiler, on the compiler options (such as optimization settings), on the processor architecture, on the surrounding code, etc. It could even vary from execution to execution, e.g. due to address space randomization (probably not on this toy example, but it does happen in real life). Some possibilities include:
- The location wasn't used. The loop terminates normally.
- The location was used for something which happened to have the value 0. The loop terminates normally.
- The location contained the function's return address. The loop terminates normally, but then the program crashes because it tries to jump to the address 0.
- The location contains the variable i. The loop never terminates because i restarts at 0.
- The location contains some other variable. The loop terminates normally, but then “interesting” things happen.
- The location is an invalid memory address, e.g. because array is right at the end of a virtual memory page and the next page isn't mapped.
- Demons fly out of your nose. Fortunately most computers lack the requisite hardware.
What you observed on Windows was that the compiler decided to place the variable i immediately after the array in memory, so array[10] = 0 ended up assigning to i. On Ubuntu and CentOS, the compiler didn't place i there. Almost all C implementations do group local variables in memory, on a memory stack, with one major exception: some local variables can be placed entirely in registers. Even if the variable is on the stack, the order of variables is determined by the compiler, and it may depend not only on the order in the source file but also on their types (to avoid wasting memory to alignment constraints that would leave holes), on their names, on some hash value used in a compiler's internal data structure, etc.
If you want to find out what your compiler decided to do, you can tell it to show you the assembler code. Oh, and learn to decipher assembler (it's easier than writing it). With GCC (and some other compilers, especially in the Unix world), pass the option -S to produce assembler code instead of a binary. For example, here's the assembler snippet for the loop from compiling with GCC on amd64 with the optimization option -O0 (no optimization), with comments added manually:
.L3:
    movl    -52(%rbp), %eax           ; load i to register eax
    cltq
    movl    $0, -48(%rbp,%rax,4)      ; set array[i] to 0
    movl    $.LC0, %edi
    call    puts                      ; printf of a constant string was optimized to puts
    addl    $1, -52(%rbp)             ; add 1 to i
.L2:
    cmpl    $10, -52(%rbp)            ; compare i to 10
    jle     .L3
Here the variable i is 52 bytes below the top of the stack, while the array starts 48 bytes below the top of the stack. So this compiler happens to have placed i just before the array; you'd overwrite i if you happened to write to array[-1]. If you change array[i]=0 to array[9-i]=0, you'll get an infinite loop on this particular platform with these particular compiler options.
Now let's compile your program with gcc -O1.
    movl    $11, %ebx
.L3:
    movl    $.LC0, %edi
    call    puts
    subl    $1, %ebx
    jne     .L3
That's shorter! The compiler has not only declined to allocate a stack location for i (it's only ever stored in the register ebx), but it hasn't bothered to allocate any memory for array, or to generate code to set its elements, because it noticed that none of the elements are ever used.
To make this example more telling, let's ensure that the array assignments are performed by providing the compiler with something it isn't able to optimize away. An easy way to do that is to use the array from another file: because of separate compilation, the compiler doesn't know what happens in another file (unless it optimizes at link time, which gcc -O0 or gcc -O1 doesn't). Create a source file use_array.c containing
void use_array(int *array) {}
and change your source code to
#include <stdio.h>
void use_array(int *array);
int main()
{
    int array[10],i;
    for (i = 0; i <=10 ; i++)
    {
        array[i]=0; /*code should never terminate*/
        printf("test \n");
    }
    printf("%zu \n", sizeof(array)/sizeof(int));
    use_array(array);
    return 0;
}
Compile with
gcc -c use_array.c
gcc -O1 -S -o with_use_array1.s with_use_array.c use_array.o
This time the assembler code looks like this:
    movq    %rsp, %rbx
    leaq    44(%rsp), %rbp
.L3:
    movl    $0, (%rbx)
    movl    $.LC0, %edi
    call    puts
    addq    $4, %rbx
    cmpq    %rbp, %rbx
    jne     .L3
Now the array is on the stack, 44 bytes from the top. What about i? It doesn't appear anywhere! But the loop counter is kept in the register rbx. It's not exactly i, but the address of array[i]. The compiler has decided that since the value of i was never used directly, there was no point in performing arithmetic to calculate where to store 0 during each run of the loop. Instead that address is the loop variable, and the arithmetic to determine the boundaries was performed partly at compile time (multiply 11 iterations by 4 bytes per array element to get 44) and partly at run time but once and for all before the loop starts (perform a subtraction to get the initial value).
Even on this very simple example, we've seen how changing compiler options (turning on optimization), changing something minor (array[i] to array[9-i]), or even changing something apparently unrelated (adding the call to use_array) can make a significant difference to what the executable program generated by the compiler does. Compiler optimizations can do a lot of things that may appear unintuitive on programs that invoke undefined behavior. That's why undefined behavior is left completely undefined. When you deviate ever so slightly from the tracks, in real-world programs, it can be very hard to understand the relationship between what the code does and what it should have done, even for experienced programmers.
Answer by Yu Hao
Unlike Java, C doesn't do array boundary checks, i.e., there's no ArrayIndexOutOfBoundsException; the job of making sure the array index is valid is left to the programmer. Indexing out of bounds, even on purpose, leads to undefined behavior: anything could happen.
For an array:
int array[10]
indexes are only valid in the range 0 to 9. However, you are trying to:
for (i = 0; i <=10 ; i++)
access array[10] here. Change the condition to i < 10.
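Since the language leaves the check to the programmer, it can be written by hand. Here is a minimal sketch of a checked store; checked_set is a hypothetical helper invented for this illustration, not a standard library function, and the caller is assumed to pass the correct element count in len.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores value at arr[index] only when index is in
   range. Returns true on success and false (writing nothing) otherwise. */
bool checked_set(int *arr, size_t len, size_t index, int value)
{
    if (arr == NULL || index >= len) {
        return false;
    }
    arr[index] = value;
    return true;
}
```

With a 10-element array, checked_set(array, 10, 10, 0) refuses the write instead of touching whatever happens to sit after the array.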
Answer by Derek T. Jones
You have a bounds violation, and on the non-terminating platforms, I believe you are inadvertently setting i to zero at the end of the loop, so that it starts over again.
array[10] is invalid; the array contains 10 elements, array[0] through array[9], and array[10] would be the 11th. Your loop should be written to stop before 10, as follows:
for (i = 0; i < 10 ; i++)
Where array[10] lands depends on the implementation, and amusingly, on two of your platforms, it lands on i, which those platforms apparently lay out directly after array. i is set to zero and the loop continues forever. For your other platforms, i may be located before array, or array may have some padding after it.
Answer by rakeb.mazharul
Declaring int array[10] means array has indexes 0 to 9 (it can hold 10 integer elements in total). But the following loop,
for (i = 0; i <=10 ; i++)
will run from 0 to 10, which means 11 times. Hence when i = 10 it will overflow the buffer and cause Undefined Behavior.
So try this:
for (i = 0; i < 10 ; i++)
or,
for (i = 0; i <= 9 ; i++)
Answer by DDPWNAGE
It is undefined at array[10], and gives undefined behavior as described before. Think about it like this:
I have 10 items in my grocery cart. They are:
0: A box of cereal
1: Bread
2: Milk
3: Pie
4: Eggs
5: Cake
6: A 2 liter of soda
7: Salad
8: Burgers
9: Ice cream
cart[10] is undefined, and may give an out of bounds exception in some compilers. But a lot apparently don't. The apparent 11th item is an item not actually in the cart. The 11th item is pointing to what I'm going to call a "poltergeist item." It never existed, but it was there.
The reason some compilers place i at array[10], array[11], or even array[-1] comes down to how they handle your initialization/declaration statement. Some compilers interpret it as:
- "Allocate 10 blocks of ints for array[10] and another int block. To make it easier, put them right next to each other."
- Same as before, but move it a space or two away, so that array[10] doesn't point to i.
- Do the same as before, but allocate i at array[-1] (because an index of an array can't, or shouldn't, be negative), or allocate it at a completely different spot because the OS can handle it, and it's safer.
Some compilers want things to go quicker, and some compilers prefer safety. It's all about the context. If I was developing an app for the ancient BREW OS (the OS of a basic phone), for example, it wouldn't care about safety. If I was developing for an iPhone 6, then it could run fast no matter what, so I would need an emphasis on safety. (Seriously, have you read Apple's App Store Guidelines, or read up on the development of Swift and Swift 2.0?)
Answer by Steephen
Since you created an array of size 10, the for loop condition should be as follows:
int array[10],i;
for (i = 0; i <10 ; i++)
{
    array[i]=0;
}
Currently you are trying to access a location that does not belong to the array by using array[10], and that causes the undefined behavior. Undefined behavior means your program will behave in an undetermined fashion, so it can give different outputs in each execution.
Answer by unxnut
Well, C compilers traditionally do not check for bounds. You can get a segmentation fault if you refer to a location that does not "belong" to your process. However, local variables are allocated on the stack, and depending on the way the memory is allocated, the area just beyond the array (array[10]) may belong to the process's memory segment. Thus, no segmentation fault is raised, and that is what you seem to experience. As others have pointed out, this is undefined behavior in C and your code may behave erratically. Since you are learning C, you are better off getting into the habit of checking for bounds in your code.
Answer by supercat
Beyond the possibility that memory might be laid out so that an attempt to write to a[10] actually overwrites i, it would also be possible that an optimizing compiler might determine that the loop test cannot be reached with a value of i greater than ten without code having first accessed the non-existent array element a[10].
Since an attempt to access that element would be undefined behavior, the compiler would have no obligations with regard to what the program might do after that point. More specifically, since the compiler would have no obligation to generate code to check the loop index in any case where it might be greater than ten, it would have no obligation to generate code to check it at all; it could instead assume that the <=10 test will always yield true. Note that this would be true even if the code read a[10] rather than wrote it.

