Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5325326/

Time: 2020-08-05 03:16:52  Source: igfitidea

What is the meaning of each line of the assembly output of a C hello world?

Tags: linux, gcc, assembly, x86

Asked by Mohammed

I ran gcc -S over this:

#include <stdio.h>

int main()
{
    printf("Hello world!");
}

and I got this assembly code:

        .file   "test.c"
        .section        .rodata
.LC0:
        .string "Hello world!"
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    , %esp
        movl    $.LC0, (%esp)
        call    printf
        addl    , %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"
        .section        .note.GNU-stack,"",@progbits

I would like to understand this output. Could someone share some pointers for understanding it, or annotate each line or group of lines with an explanation of what it does?

Accepted answer by Thomas Pornin

Here is how it goes:

        .file   "test.c"

The original source file name (used by debuggers).

        .section        .rodata
.LC0:
        .string "Hello world!"

A zero-terminated string is included in the section ".rodata" ("ro" means "read-only": the application will be able to read the data, but any attempt at writing into it will trigger an exception).
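
As a small illustration of the read-only point, here is a hedged C sketch. The function names greeting and copy_and_patch_greeting are invented for this example. It treats the literal as const and modifies a writable copy instead. Writing through the literal itself is undefined behavior in C; whether it actually faults depends on the platform.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The literal itself typically lives in .rodata, so expose it as const. */
const char *greeting(void)
{
    return "Hello world!";
}

/* Hypothetical helper: copy the literal into caller-provided writable
 * storage and modify the copy. Returns 0 on success, -1 if buf is too small. */
int copy_and_patch_greeting(char *buf, size_t buflen)
{
    const char *src = greeting();
    size_t len = strlen(src);

    assert(buf != NULL);
    if (buflen < len + 1)
        return -1;
    memcpy(buf, src, len + 1);   /* includes the terminating zero byte */
    buf[0] = 'J';                /* fine: buf is ordinary writable memory */
    /* Writing to greeting()[0] instead would be undefined behavior. */
    return 0;
}
```

The copy can be changed freely, while the string returned by greeting() stays untouched.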

        .text

Now we write things into the ".text" section, which is where code goes.

.globl main
        .type   main, @function
main:

We define a function called "main" and make it globally visible (other object files will be able to invoke it).

        leal    4(%esp), %ecx

We store in register %ecx the value 4+%esp (%esp is the stack pointer).

        andl    $-16, %esp

%esp is slightly modified (rounded down) so that it becomes a multiple of 16. For some data types (the floating-point formats corresponding to C's double and long double), performance is better when memory accesses are at addresses that are multiples of 16. This is not really needed here, but when used without an optimization flag (-O2...), the compiler tends to produce quite a lot of generic useless code (i.e. code which could be useful in some cases but not here).
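
As a rough illustration of what the mask does (a sketch, not compiler output): in 32-bit two's complement, -16 is 0xFFFFFFF0, so the AND clears the low four bits and rounds the value down to a multiple of 16. The function name align_down_16 is made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models `andl $-16, %esp` on a 32-bit value.
 * (uint32_t)-16 is 0xFFFFFFF0, so the low four bits are cleared. */
uint32_t align_down_16(uint32_t esp)
{
    uint32_t aligned = esp & (uint32_t)-16;

    assert(aligned % 16 == 0 && aligned <= esp); /* rounded down, never up */
    return aligned;
}
```

Because the stack grows down, rounding down only ever moves %esp further into unused stack space, so nothing already on the stack is overwritten.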

        pushl   -4(%ecx)

This one is a bit weird: at that point, the word at address -4(%ecx) is the word which was on top of the stack prior to the andl. The code retrieves that word (which should be the return address, by the way) and pushes it again. This kind of emulates what would be obtained with a call from a function which had a 16-byte aligned stack. My guess is that this push is a remnant of an argument-copying sequence. Since the function has adjusted the stack pointer, it must copy the function arguments, which were accessible through the old value of the stack pointer. Here, there is no argument, except the function return address. Note that this word will not be used (yet again, this is code without optimization).

        pushl   %ebp
        movl    %esp, %ebp

This is the standard function prologue: we save %ebp (since we are about to modify it), then set %ebp to point to the stack frame. Thereafter, %ebp will be used to access the function arguments, making %esp free again. (Yes, there is no argument, so this is useless for that function.)

        pushl   %ecx

We save %ecx (we will need it at function exit, to restore %esp to the value it had before the andl).

        subl    , %esp

We reserve 32 bytes on the stack (remember that the stack grows "down"). That space will be used to store the arguments to printf() (that's overkill, since there is a single argument, which will use 4 bytes [that's a pointer]).

        movl    $.LC0, (%esp)
        call    printf

We "push" the argument to printf() (i.e. we make sure that %esp points to a word which contains the argument, here $.LC0, which is the address of the constant string in the rodata section). Then we call printf().
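
To see why a store to (%esp) after a subl counts as a "push", here is a toy model in C. It is a sketch under stated assumptions: the toy_stack struct, its 64-byte size and the helper names are invented for illustration, and any value passed in merely stands in for the address $.LC0. Both routes leave the argument in the word that %esp points to.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a downward-growing stack; esp is a byte offset into mem. */
struct toy_stack {
    uint8_t  mem[64];
    uint32_t esp;
};

/* Models `pushl value`: decrement esp by 4, then store the word there. */
void toy_push(struct toy_stack *s, uint32_t value)
{
    assert(s->esp >= 4 && s->esp <= sizeof s->mem);
    s->esp -= 4;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* Models `subl $bytes, %esp` followed by `movl $value, (%esp)`. */
void toy_reserve_and_store(struct toy_stack *s, uint32_t bytes, uint32_t value)
{
    assert(bytes >= 4 && s->esp >= bytes && s->esp <= sizeof s->mem);
    s->esp -= bytes;
    memcpy(&s->mem[s->esp], &value, sizeof value);
}

/* Reads the word at (%esp): the slot holding the first argument
 * just before `call` is executed. */
uint32_t toy_top(const struct toy_stack *s)
{
    uint32_t value;

    assert(s->esp + sizeof value <= sizeof s->mem);
    memcpy(&value, &s->mem[s->esp], sizeof value);
    return value;
}
```

Reserving exactly 4 bytes and storing is indistinguishable from a pushl; reserving more simply leaves unused space above the argument.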

        addl    , %esp

When printf() returns, we remove the space allocated for the arguments. This addl cancels what the subl above did.

        popl    %ecx

We recover %ecx (pushed above); printf() may have modified it (the calling conventions describe which registers a function can modify without restoring them upon exit; %ecx is one such register).

        popl    %ebp

Function epilogue: this restores %ebp (corresponding to the pushl %ebp above).

        leal    -4(%ecx), %esp

We restore %esp to its initial value. The effect of this opcode is to store in %esp the value %ecx-4. %ecx was set in the first function opcode. This cancels any alteration to %esp, including the andl.
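
The whole round trip can be replayed on a toy machine to check that %esp, %ebp and %ecx end up where they started. This is only a sketch: the toy_cpu struct, the 256-byte address space and the helper names are assumptions made for illustration, and frame_bytes stands in for whatever operand the subl/addl pair uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct toy_cpu {
    uint8_t  mem[256];          /* toy address space: addresses 0..255 */
    uint32_t esp, ebp, ecx;
};

static void push32(struct toy_cpu *c, uint32_t v)
{
    c->esp -= 4;
    memcpy(&c->mem[c->esp], &v, 4);
}

static uint32_t pop32(struct toy_cpu *c)
{
    uint32_t v;

    memcpy(&v, &c->mem[c->esp], 4);
    c->esp += 4;
    return v;
}

static uint32_t load32(const struct toy_cpu *c, uint32_t addr)
{
    uint32_t v;

    memcpy(&v, &c->mem[addr], 4);
    return v;
}

/* Replays main's prologue and epilogue and returns the final %esp. */
uint32_t toy_main_frame(struct toy_cpu *c, uint32_t frame_bytes)
{
    /* Enough room below esp for alignment slack, three pushes and the frame. */
    assert(c->esp >= 32 + frame_bytes && c->esp + 4 <= sizeof c->mem);

    c->ecx = c->esp + 4;                /* leal  4(%esp), %ecx     */
    c->esp &= (uint32_t)-16;            /* andl  $-16, %esp        */
    push32(c, load32(c, c->ecx - 4));   /* pushl -4(%ecx)          */
    push32(c, c->ebp);                  /* pushl %ebp              */
    c->ebp = c->esp;                    /* movl  %esp, %ebp        */
    push32(c, c->ecx);                  /* pushl %ecx              */
    c->esp -= frame_bytes;              /* subl  ..., %esp         */
    c->ecx = 0xdeadbeefu;               /* printf may clobber %ecx */
    c->esp += frame_bytes;              /* addl  ..., %esp         */
    c->ecx = pop32(c);                  /* popl  %ecx              */
    c->ebp = pop32(c);                  /* popl  %ebp              */
    c->esp = c->ecx - 4;                /* leal  -4(%ecx), %esp    */
    return c->esp;
}
```

Whatever the starting alignment, the final %esp equals the initial one, because %ecx remembers the pre-alignment value plus 4 and is itself saved on the stack across the call.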

        ret

Function exit.

        .size   main, .-main

This sets the size of the main() function: at any point during assembly, "." is an alias for "the address at which we are adding things right now". If another instruction were added here, it would go at the address specified by ".". Thus, ".-main", here, is the exact size of the code of the function main(). The .size directive instructs the assembler to write that information in the object file.
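
The arithmetic behind ".-main" can be modelled in a few lines of C. This is a toy sketch, not how any real assembler is implemented: the toy_section struct and the function names are invented, and the instruction lengths a caller passes in are whatever the caller chooses.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of an assembler's location counter for one section. */
struct toy_section {
    size_t dot;                 /* the current value of "." */
};

/* Defining a label records the current value of ".". */
size_t toy_label(const struct toy_section *s)
{
    return s->dot;
}

/* Emitting an instruction of `len` bytes advances ".". */
void toy_emit(struct toy_section *s, size_t len)
{
    s->dot += len;
}

/* What `.size sym, .-sym` evaluates: bytes emitted since the label. */
size_t toy_size_since(const struct toy_section *s, size_t label)
{
    assert(s->dot >= label);
    return s->dot - label;
}
```

Calling toy_label at the start of the function, toy_emit once per instruction, and toy_size_since at the end gives the sum of the instruction lengths, which is the size recorded for the symbol.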

        .ident  "GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"

GCC just loves to leave traces of its action. This string ends up as a kind of comment in the object file. The linker will remove it.

        .section        .note.GNU-stack,"",@progbits

A special section where GCC writes that the code can accommodate a non-executable stack. This is the normal case. Executable stacks are needed for some special usages (not standard C). On modern processors, the kernel can make a non-executable stack (a stack which triggers an exception if someone tries to execute as code some data which is on the stack); this is viewed by some people as a "security feature" because putting code on the stack is a common way to exploit buffer overflows. With this section, the executable will be marked as "compatible with a non-executable stack" which the kernel will happily provide as such.

Answer by BlackBear

    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ecx
    subl    , %esp

These instructions don't appear in your C program; they're always executed at the beginning of every function (but it depends on the compiler/platform).

    movl    $.LC0, (%esp)
    call    printf

This block corresponds to your printf() call. The first instruction places the argument (a pointer to "Hello world!") on the stack, then the second calls the function.

    addl    , %esp
    popl    %ecx
    popl    %ebp
    leal    -4(%ecx), %esp
    ret

These instructions are the opposite of the first block: they undo its stack manipulation. They are always executed too.

Answer by Eric Wang

Here is a supplement to @Thomas Pornin's answer.

  • .LC0: local constant, e.g. a string literal.
  • .LFB0: local function beginning.
  • .LFE0: local function ending.

The suffix of these labels is a number, starting from 0.

This is a gcc assembler convention.