Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2958633/

Date: 2020-09-02 05:33:47  Source: igfitidea

gcc, strict-aliasing, and horror stories

c gcc strict-aliasing

Asked by Joseph Quinsey

In gcc-strict-aliasing-and-casting-through-a-union I asked whether anyone had encountered problems with union punning through pointers. So far, the answer seems to be No.

This question is broader: Do you have any horror stories about gcc and strict-aliasing?

Background: Quoting from AndreyT's answer in c99-strict-aliasing-rules-in-c-gcc:

"Strict aliasing rules are rooted in parts of the standard that were present in C and C++ since the beginning of [standardized] times. The clause that prohibits accessing object of one type through a lvalue of another type is present in C89/90 (6.3) as well as in C++98 (3.10/15). ... It is just that not all compilers wanted (or dared) to enforce it or rely on it."

Well, gcc is now daring to do so, with its -fstrict-aliasing switch. And this has caused some problems. See, for example, the excellent article http://davmac.wordpress.com/2009/10/ about a MySQL bug, and the equally excellent discussion in http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html.

Some other less-relevant links:

So to repeat, do you have a horror story of your own? Problems not indicated by -Wstrict-aliasing would, of course, be preferred. And other C compilers are also welcome.

Added June 2nd: The first link in Michael Burr's answer, which does indeed qualify as a horror story, is perhaps a bit dated (from 2003). I did a quick test, but the problem has apparently gone away.

Source:

#include <string.h>
struct iw_event {               /* dummy! */
    int len;
};
char *iwe_stream_add_event(
    char *stream,               /* Stream of events */
    char *ends,                 /* End of stream */
    struct iw_event *iwe,       /* Payload */
    int event_len)              /* Real size of payload */
{
    /* Check if it's possible */
    if ((stream + event_len) < ends) {
            iwe->len = event_len;
            memcpy(stream, (char *) iwe, event_len);
            stream += event_len;
    }
    return stream;
}

The specific complaint is:

Some users have complained that when the [above] code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which means a bogus len is mem-copied into the stream).

Compiled code, using gcc 4.3.4 on Cygwin with -O3 (please correct me if I am wrong--my assembler is a bit rusty!):

_iwe_stream_add_event:
        pushl       %ebp
        movl        %esp, %ebp
        pushl       %ebx
        subl        , %esp
        movl        8(%ebp), %eax       # stream    --> %eax
        movl        20(%ebp), %edx      # event_len --> %edx
        leal        (%eax,%edx), %ebx   # sum       --> %ebx
        cmpl        12(%ebp), %ebx      # compare sum with ends
        jae L2
        movl        16(%ebp), %ecx      # iwe       --> %ecx
        movl        %edx, (%ecx)        # event_len --> iwe->len (!!)
        movl        %edx, 8(%esp)       # event_len --> stack
        movl        %ecx, 4(%esp)       # iwe       --> stack
        movl        %eax, (%esp)        # stream    --> stack
        call        _memcpy
        movl        %ebx, %eax          # sum       --> retval
L2:
        addl        , %esp
        popl        %ebx
        leave
        ret

And for the second link in Michael's answer,

*(unsigned short *)&a = 4;

gcc will usually (always?) give a warning. But I believe a valid solution to this (for gcc) is to use:

#define CAST(type, x) (((union {typeof(x) src; type dst;}*)&(x))->dst)
// ...
CAST(unsigned short, a) = 4;

I've asked SO whether this is OK in gcc-strict-aliasing-and-casting-through-a-union, but so far nobody disagrees.

Accepted answer by Michael Burr

No horror story of my own, but here are some quotes from Linus Torvalds (sorry if these are already in one of the linked references in the question):

http://lkml.org/lkml/2003/2/26/158:

Date: Wed, 26 Feb 2003 09:22:15 -0800
Subject: Re: Invalid compilation without -fno-strict-aliasing
From: Jean Tourrilhes <>

On Wed, Feb 26, 2003 at 04:38:10PM +0100, Horst von Brand wrote:

Jean Tourrilhes <> said:

It looks like a compiler bug to me... Some users have complained that when the following code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which mean a bogus len is mem-copied into the stream). Code (from linux/include/net/iw_handler.h) :

static inline char *
iwe_stream_add_event(char *   stream,     /* Stream of events */
                     char *   ends,       /* End of stream */
                    struct iw_event *iwe, /* Payload */
                     int      event_len)  /* Real size of payload */
{
  /* Check if it's possible */
  if((stream + event_len) < ends) {
      iwe->len = event_len;
      memcpy(stream, (char *) iwe, event_len);
      stream += event_len;
  }
  return stream;
}

IMHO, the compiler should have enough context to know that the reordering is dangerous. Any suggestion to make this simple code more bullet proof is welcomed.

The compiler is free to assume char *stream and struct iw_event *iwe point to separate areas of memory, due to strict aliasing.

Which is true and which is not the problem I'm complaining about.

(Note with hindsight: this code is fine, but Linux's implementation of memcpy was a macro that cast to long * to copy in larger chunks. With a correctly-defined memcpy, gcc -fstrict-aliasing isn't allowed to break this code. But it means you need inline asm to define a kernel memcpy if your compiler doesn't know how to turn a byte-copy loop into efficient asm, which was the case for gcc before gcc7)

And Linus Torvalds's comment on the above:

Jean Tourrilhes wrote:

It looks like a compiler bug to me...

Why do you think the kernel uses "-fno-strict-aliasing"?

The gcc people are more interested in trying to find out what can be allowed by the c99 specs than about making things actually work. The aliasing code in particular is not even worth enabling, it's just not possible to sanely tell gcc when some things can alias.

Some users have complained that when the following code is compiled without the -fno-strict-aliasing, the order of the write and memcpy is inverted (which mean a bogus len is mem-copied into the stream).

The "problem" is that we inline the memcpy(), at which point gcc won't care about the fact that it can alias, so they'll just re-order everything and claim it's out own fault. Even though there is no sane way for us to even tell gcc about it.

I tried to get a sane way a few years ago, and the gcc developers really didn't care about the real world in this area. I'd be surprised if that had changed, judging by the replies I have already seen.

I'm not going to bother to fight it.

Linus

http://www.mail-archive.com/[email protected]/msg01647.html:

Type-based aliasing is stupid. It's so incredibly stupid that it's not even funny. It's broken. And gcc took the broken notion, and made it more so by making it a "by-the-letter-of-the-law" thing that makes no sense.

...

I know for a fact that gcc would re-order write accesses that were clearly to (statically) the same address. Gcc would suddenly think that

unsigned long a;

a = 5;
*(unsigned short *)&a = 4;

could be re-ordered to set it to 4 first (because clearly they don't alias - by reading the standard), and then because now the assignment of 'a=5' was later, the assignment of 4 could be elided entirely! And if somebody complains that the compiler is insane, the compiler people would say "nyaah, nyaah, the standards people said we can do this", with absolutely no introspection to ask whether it made any SENSE.

Answer by paleozogt

SWIG generates code that depends on strict aliasing being off, which can cause all sorts of problems.

SWIGEXPORT jlong JNICALL Java_com_mylibJNI_make_1mystruct_1_1SWIG_12(
       JNIEnv *jenv, jclass jcls, jint jarg1, jint jarg2) {
  jlong jresult = 0 ;
  int arg1 ;
  int arg2 ;
  my_struct_t *result = 0 ;

  (void)jenv;
  (void)jcls;
  arg1 = (int)jarg1; 
  arg2 = (int)jarg2; 
  result = (my_struct_t *)make_my_struct(arg1,arg2);
  *(my_struct_t **)&jresult = result;              /* <<<< horror*/
  return jresult;
}

Answer by Joseph Quinsey

gcc, aliasing, and 2-D variable-length arrays: The following sample code copies a 2x2 matrix:

#include <stdio.h>

static void copy(int n, int a[][n], int b[][n]) {
   int i, j;
   for (i = 0; i < 2; i++)    // 'n' not used in this example
      for (j = 0; j < 2; j++) // 'n' hard-coded to 2 for simplicity
         b[i][j] = a[i][j];
}

int main(int argc, char *argv[]) {
   int a[2][2] = {{1, 2},{3, 4}};
   int b[2][2];
   copy(2, a, b);    
   printf("%d %d %d %d\n", b[0][0], b[0][1], b[1][0], b[1][1]);
   return 0;
}

With gcc 4.1.2 on CentOS, I get:

$ gcc -O1 test.c && a.out
1 2 3 4
$ gcc -O2 test.c && a.out
10235717 -1075970308 -1075970456 11452404 (random)

I don't know whether this is generally known, and I don't know whether this is a bug or a feature. I can't duplicate the problem with gcc 4.3.4 on Cygwin, so it may have been fixed. Some work-arounds:

  • Use __attribute__((noinline)) for copy().
  • Use the gcc switch -fno-strict-aliasing.
  • Change the third parameter of copy() from b[][n] to b[][2].
  • Don't use -O2 or -O3.

Further notes:

  • This is an answer, after a year and a day, to my own question (and I'm a bit surprised there are only two other answers).
  • I lost several hours with this on my actual code, a Kalman filter. Seemingly small changes would have drastic effects, perhaps because of changing gcc's automatic inlining (this is a guess; I'm still uncertain). But it probably doesn't qualify as a horror story.
  • Yes, I know you wouldn't write copy() like this. (And, as an aside, I was slightly surprised to see gcc did not unroll the double-loop.)
  • No gcc warning switches, including -Wstrict-aliasing=, did anything here.
  • 1-D variable-length arrays seem to be OK.

Update: The above does not really answer the OP's question, since he (i.e. I) was asking about cases where strict aliasing 'legitimately' broke your code, whereas the above just seems to be a garden-variety compiler bug.

I reported it to GCC Bugzilla, but they weren't interested in the old 4.1.2, even though (I believe) it is the key to the $1-billion RHEL5. It doesn't occur in 4.2.4 up.

And I have a slightly simpler example of a similar bug, with only one matrix. The code:

#include <stdio.h>

static void zero(int n, int a[][n]) {
   int i, j;
   for (i = 0; i < n; i++)
      for (j = 0; j < n; j++)
         a[i][j] = 0;
}

int main(void) {
   int a[2][2] = {{1, 2},{3, 4}};
   zero(2, a);
   printf("%d\n", a[1][1]);
   return 0;
}

produces the results:

gcc -O1 test.c && a.out
0
gcc -O1 -fstrict-aliasing test.c && a.out
4

It seems it is the combination of -fstrict-aliasing with -finline which causes the bug.

Answer by don bright

Here is mine:

http://forum.openscad.org/CGAL-3-6-1-causing-errors-but-CGAL-3-6-0-OK-tt2050.html

It caused certain shapes in a CAD program to be drawn incorrectly. Thank goodness for the project leaders' work on creating a regression test suite.

The bug only manifested itself on certain platforms, with older versions of GCC and older versions of certain libraries, and then only with -O2 turned on. -fno-strict-aliasing solved it.

Answer by supercat

The Common Initial Sequence rule of C used to be interpreted as making it possible to write a function which could work on the leading portion of a wide variety of structure types, provided they start with elements of matching types. Under C99, the rule was changed so that it only applied if the structure types involved were members of the same union whose complete declaration was visible at the point of use.

The authors of gcc insist that the language in question is only applicable if the accesses are performed through the union type, notwithstanding the facts that:

  1. There would be no reason to specify that the complete declaration must be visible if accesses had to be performed through the union type.

  2. Although the CIS rule was described in terms of unions, its primary usefulness lay in what it implied about the way in which structs were laid out and accessed. If S1 and S2 were structures that shared a CIS, there would be no way that a function that accepted a pointer to an S1 and an S2 from an outside source could comply with C89's CIS rules without allowing the same behavior to be useful with pointers to structures that weren't actually inside a union object; specifying CIS support for structures would thus have been redundant given that it was already specified for unions.

Answer by user470617

The following code returns 10, under gcc 4.4.4. Is anything wrong with the union method or gcc 4.4.4?

int main()
{
  int v = 10;

  union vv {
    int v;
    short q;
  } *s = (union vv *)&v;

  s->v = 1;

  return v;
}