C: Query on gcc's -ffunction-sections and -fdata-sections options

Question

Asked by Jay

The GCC page says the following about the function sections and data sections options:

-ffunction-sections
-fdata-sections
Place each function or data item into its own section in the output file if the target supports arbitrary sections. The name of the function or the name of the data item determines the section's name in the output file. Use these options on systems where the linker can perform optimizations to improve locality of reference in the instruction space. Most systems using the ELF object format and SPARC processors running Solaris 2 have linkers with such optimizations. AIX may have these optimizations in the future.
Only use these options when there are significant benefits from doing so. When you specify these options, the assembler and linker will create larger object and executable files and will also be slower. You will not be able to use gprof on all systems if you specify this option and you may have problems with debugging if you specify both this option and -g.

I was under the impression that these options will help in reducing the executable file size. Why does this page say that it will create larger executable files? Am I missing something?

Answer 1

Accepted answer by leppie

When using those compiler options, you can add the linker option -Wl,--gc-sections, which will remove all unused code.
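
As a minimal sketch of that combination, assume a hypothetical file demo.c with one function that is called and one that is not. The file name, the function names, and the second object main.o are illustrative assumptions, not part of the original answer. With each function in its own section, the linker has something it can discard:

```c
/* demo.c: hypothetical file for illustration only.
 *
 * Assumed build with a GNU toolchain (adjust the compiler name for your target):
 *   gcc -Os -ffunction-sections -fdata-sections -c demo.c -o demo.o
 *   gcc -Wl,--gc-sections -Wl,--print-gc-sections demo.o main.o -o demo
 *
 * main.o is an assumed second object whose main() calls used_helper().
 * With -ffunction-sections the two functions below are placed in
 * .text.used_helper and .text.unused_helper, so the linker can drop the
 * second section when nothing references it.
 */

/* Referenced by the rest of the program, so its section is kept. */
int used_helper(int x)
{
    return x * 2;
}

/* Never referenced: a candidate for removal by --gc-sections. */
int unused_helper(int x)
{
    return x + 1000;
}
```

The --print-gc-sections linker option lists the sections that were discarded, which is a convenient way to check whether the flags are having any effect on a given build.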

Answer 2

Answer by Anton Staaf

Interestingly, using -fdata-sections can make the literal pools of your functions, and thus your functions themselves, larger. I've noticed this on ARM in particular, but it's likely to be true elsewhere. The binary I was testing only grew by a quarter of a percent, but it did grow. Looking at the disassembly of the changed functions, it was clear why.

If all of the BSS (or DATA) entries in your object file are allocated to a single section, then the compiler can store the address of that section in the function's literal pool and generate loads with known offsets from that address to access your data. But if you enable -fdata-sections, it puts each piece of BSS (or DATA) data into its own section. Since the compiler doesn't know which of these sections might be garbage collected later, or in what order the linker will place them in the final executable image, it can no longer load data using offsets from a single address. Instead, it has to allocate one literal pool entry per data item used, and once the linker has figured out what is going into the final image and where, it can fix up these literal pool entries with the actual addresses of the data.

So yes, even with -Wl,--gc-sections the resulting image can be larger because the actual function text is larger.

Below I've added a minimal example

The code below is enough to see the behavior I'm talking about. Please don't be thrown off by the volatile declaration and use of global variables, both of which are questionable in real code. Here they ensure the creation of two data sections when -fdata-sections is used.

static volatile int head;
static volatile int tail;

int queue_empty(void)
{
    return head == tail;
}

The version of GCC used for this test is:

gcc version 6.1.1 20160526 (Arch Repository)

First, without -fdata-sections we get the following.

> arm-none-eabi-gcc -march=armv6-m \
                    -mcpu=cortex-m0 \
                    -mthumb \
                    -Os \
                    -c \
                    -o test.o \
                    test.c

> arm-none-eabi-objdump -dr test.o

00000000 <queue_empty>:
 0: 4b03     ldr   r3, [pc, #12]   ; (10 <queue_empty+0x10>)
 2: 6818     ldr   r0, [r3, #0]
 4: 685b     ldr   r3, [r3, #4]
 6: 1ac0     subs  r0, r0, r3
 8: 4243     negs  r3, r0
 a: 4158     adcs  r0, r3
 c: 4770     bx    lr
 e: 46c0     nop                   ; (mov r8, r8)
10: 00000000 .word 0x00000000
             10: R_ARM_ABS32 .bss

> arm-none-eabi-nm -S test.o

00000000 00000004 b head
00000000 00000014 T queue_empty
00000004 00000004 b tail

From arm-none-eabi-nm we see that queue_empty is 20 bytes long (14 hex), and the arm-none-eabi-objdump output shows that there is a single relocation word at the end of the function: the address of the BSS section (the section for uninitialized data). The first instruction in the function loads that value (the address of the BSS) into r3. The next two instructions load relative to r3, offsetting by 0 and 4 bytes respectively. These two loads fetch the values of head and tail. We can see those offsets in the first column of the output from arm-none-eabi-nm. The nop at the end of the function word-aligns the address of the literal pool.
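
To make the addressing pattern concrete, here is a rough model in portable C of what those three loads do. It is only a sketch: fake_bss, load_word, model_set and queue_empty_model are names invented for this illustration, and the byte array merely stands in for the object file's .bss section.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the object file's .bss: head at offset 0, tail at offset 4. */
static unsigned char fake_bss[8];

/* Read a 32-bit word at base + offset, like "ldr rX, [r3, #offset]". */
int32_t load_word(const unsigned char *base, size_t offset)
{
    int32_t value;
    memcpy(&value, base + offset, sizeof value);
    return value;
}

/* Illustrative helper (not in the original post): store a word in the model. */
void model_set(size_t offset, int32_t value)
{
    memcpy(fake_bss + offset, &value, sizeof value);
}

int queue_empty_model(void)
{
    const unsigned char *base = fake_bss;  /* ldr r3, [pc, #12]: one literal pool entry */
    int32_t head = load_word(base, 0);     /* ldr r0, [r3, #0] */
    int32_t tail = load_word(base, 4);     /* ldr r3, [r3, #4] */
    return head == tail;                   /* subs / negs / adcs */
}
```

The point of the model is that only one address (base) has to come from the literal pool; everything else is a constant offset that is already known when the object file is compiled.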

Next we'll see what happens when -fdata-sections is added.

> arm-none-eabi-gcc -march=armv6-m \
                    -mcpu=cortex-m0 \
                    -mthumb \
                    -Os \
                    -fdata-sections \
                    -c \
                    -o test.o \
                    test.c

> arm-none-eabi-objdump -dr test.o

00000000 <queue_empty>:
 0: 4b03     ldr   r3, [pc, #12]    ; (10 <queue_empty+0x10>)
 2: 6818     ldr   r0, [r3, #0]
 4: 4b03     ldr   r3, [pc, #12]    ; (14 <queue_empty+0x14>)
 6: 681b     ldr   r3, [r3, #0]
 8: 1ac0     subs  r0, r0, r3
 a: 4243     negs  r3, r0
 c: 4158     adcs  r0, r3
 e: 4770     bx    lr
    ...
             10: R_ARM_ABS32 .bss.head
             14: R_ARM_ABS32 .bss.tail

> arm-none-eabi-nm -S test.o

00000000 00000004 b head
00000000 00000018 T queue_empty
00000000 00000004 b tail

Immediately we see that the length of queue_empty has increased by four bytes to 24 bytes (18 hex), and that there are now two relocations to be done in queue_empty's literal pool. These relocations correspond to the addresses of the two BSS sections that were created, one for each global variable. There need to be two addresses here because the compiler can't know the relative positions in which the linker will end up placing the two sections. Looking at the instructions at the beginning of queue_empty, we see that there is an extra load: the compiler has to generate a separate pair of loads for each variable, one to get the address of its section and one to get the value of the variable in that section. The extra instruction in this version of queue_empty doesn't make the body of the function longer; it just takes the spot that was previously a nop. That won't be the case in general.
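
Seen from the source side, the same reasoning suggests a variant worth comparing. This is a sketch under assumptions, not something from the original answer: the struct and function names are invented here, and the claim about code size is an expectation you should confirm with objdump on your own toolchain. Members of one struct always belong to a single object, so they share one section and sit at fixed offsets from each other even when -fdata-sections is enabled.

```c
/* Hypothetical variant of the example above: both fields live in one
 * object, so with -fdata-sections they share one section (.bss.queue)
 * and keep fixed offsets from a single base address. */
struct queue_state {
    int head;
    int tail;
};

static volatile struct queue_state queue;

int queue_empty_grouped(void)
{
    return queue.head == queue.tail;
}

/* Illustrative helper (not in the original post) so the state can change. */
void queue_advance_tail(void)
{
    queue.tail = queue.tail + 1;
}
```

The trade-off is that the linker can then only keep or discard the two fields together, which is exactly the granularity -fdata-sections was meant to refine.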

Answer 3

Answer by fwhacking

You can use -ffunction-sections and -fdata-sections on static libraries, which will increase the size of the static library, as each function and global data variable will be put in a separate section.

And then use -Wl,--gc-sections on the program linking with this static library, which will remove unused sections.

Thus, the final binary will be smaller than without those flags.

Be careful though, as -Wl,--gc-sections can break things.

Answer 4

Answer by Rei Vilo

I get better results adding an additional step and building an .a archive:

first, gcc and g++ are used with the -ffunction-sections -fdata-sections flags
then, all .o objects are put into an .a archive with ar rcs file.a *.o
finally, the linker is called with the -Wl,-gc-sections,-u,main options
for all steps, optimisation is set to -Os.

Answer 5

Answer by Goswin von Brederlow

I tried it a while back, and looking at the results it seems the size increase comes from the order of objects with different alignment. Normally the linker sorts objects to keep the padding between them small, but it looks like that only works within a section, not across the individual sections. So you often get extra padding between the data sections for each function, increasing the overall space.
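
The effect of ordering on padding can be illustrated with a small model. This is only a sketch of the arithmetic, not the linker's actual placement algorithm; the struct, the function layout_size, and the two sample arrays are all invented for this illustration.

```c
#include <stddef.h>

/* One object to place: its size and required alignment, both in bytes. */
struct object {
    size_t size;
    size_t align;   /* assumed to be a power of two */
};

/* Lay the objects out in the given order, padding each one up to its
 * alignment, and return the total number of bytes consumed. */
size_t layout_size(const struct object *objs, size_t count)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        size_t mask = objs[i].align - 1;
        offset = (offset + mask) & ~mask;   /* round up to the alignment */
        offset += objs[i].size;
    }
    return offset;
}

/* Illustrative inputs: the same four objects in two different orders. */
const struct object mixed_order[4]  = { {1, 1}, {4, 4}, {1, 1}, {4, 4} };
const struct object sorted_order[4] = { {4, 4}, {4, 4}, {1, 1}, {1, 1} };
```

With these inputs, layout_size(mixed_order, 4) returns 16 while layout_size(sorted_order, 4) returns 10: the same objects cost six extra bytes of padding when they cannot be reordered.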

For a static lib with -Wl,-gc-sections, the removal of unused sections will most likely more than make up for the small increase though.
