C: Dereferencing type-punned pointer will break strict-aliasing rules
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/3246228/
Asked by Framester
I used the following piece of code to read data from files as part of a larger program.
```c
double data_read(FILE *stream, int code) {
    char data[8];
    switch (code) {
    case 0x08:
        return (unsigned char)fgetc(stream);
    case 0x09:
        return (signed char)fgetc(stream);
    case 0x0b:
        data[1] = fgetc(stream);
        data[0] = fgetc(stream);
        return *(short*)data;
    case 0x0c:
        for (int i = 3; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(int*)data;
    case 0x0d:
        for (int i = 3; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(float*)data;
    case 0x0e:
        for (int i = 7; i >= 0; i--)
            data[i] = fgetc(stream);
        return *(double*)data;
    }
    die("data read failed");
    return 1;
}
```
Now I am told to use -O2, and I get the following gcc warning:
warning: dereferencing type-punned pointer will break strict-aliasing rules
Googling, I found two orthogonal answers:
vs
In the end I don't want to ignore the warnings. What would you recommend?
[Update] I replaced the toy example with the real function.
Accepted answer by Martin B
It looks a lot as if you really want to use fread:
```c
int data;
fread(&data, sizeof(data), 1, stream);
```
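A minimal sketch of the fread route with its return value checked, assuming the file stores the int in the host's own size and byte order (the same non-portable assumption noted further down). The helper name read_host_int is hypothetical; the point is that fread reports how many complete items it read, so anything other than 1 means a short read.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads one int in the host's own byte order.
   Returns 1 on success, 0 on a short read (EOF or I/O error). */
int read_host_int(FILE *stream, int *out)
{
    assert(stream != NULL && out != NULL);
    /* fread returns the number of complete items read; 1 means success. */
    return fread(out, sizeof *out, 1, stream) == 1;
}
```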
That said, if you do want to go the route of reading chars and then reinterpreting them as an int, the safe way to do it in C (but not in C++) is to use a union:
```c
union
{
    char theChars[4];
    int theInt;
} myunion;

for (int i = 0; i < 4; i++)
    myunion.theChars[i] = fgetc(stream);
return myunion.theInt;
```
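For reference, a self-contained version of the union approach might look like the following. read_int_union is a hypothetical name; the sketch reads sizeof(int) bytes instead of a hard-coded 4 and stops on EOF. It still interprets the bytes in the host's byte order, so it is exactly as non-portable as the snippet above.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of the union approach: fill the byte member, then read the int member.
   C (since C99 TC3) describes this as reinterpreting the stored bytes;
   C++ does not allow it. Returns 1 on success, 0 on a short read. */
int read_int_union(FILE *stream, int *out)
{
    union {
        unsigned char bytes[sizeof(int)];
        int value;
    } u;

    assert(stream != NULL && out != NULL);
    for (size_t i = 0; i < sizeof u.bytes; i++) {
        int c = fgetc(stream);          /* keep the int result so EOF is detectable */
        if (c == EOF)
            return 0;
        u.bytes[i] = (unsigned char)c;  /* bytes kept in file order = host order */
    }
    *out = u.value;
    return 1;
}
```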
I'm not sure why the length of data in your original code is 3. I assume you wanted 4 bytes; at least, I don't know of any systems where an int is 3 bytes.
Note that both your code and mine are highly non-portable.
Edit: If you want to read ints of various lengths from a file, portably, try something like this:
```c
unsigned result = 0;
for (int i = 0; i < 4; i++)
    result = (result << 8) | fgetc(stream);
```
(Note: In a real program, you would additionally want to test the return value of fgetc() against EOF.)
This reads a 4-byte unsigned value from the file in big-endian format (most significant byte first), regardless of the endianness of the system. It should work on just about any system where an unsigned is at least 4 bytes.
If you want to be endian-neutral, don't use pointers or unions; use bit-shifts instead.
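Putting the shift loop and the EOF check together, a sketch might look like this. It assumes <stdint.h> is available and uses uint32_t so the result is exactly 32 bits wide; read_be_u32 is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Sketch: reads a 4-byte big-endian (most significant byte first) value,
   independent of the host's endianness. Returns 1 on success, 0 if EOF or
   an error occurs before all four bytes are read. */
int read_be_u32(FILE *stream, uint32_t *out)
{
    uint32_t result = 0;

    assert(stream != NULL && out != NULL);
    for (int i = 0; i < 4; i++) {
        int c = fgetc(stream);                 /* int, so EOF is distinguishable */
        if (c == EOF)
            return 0;
        result = (result << 8) | (uint32_t)c;  /* earlier bytes move to the high end */
    }
    *out = result;
    return 1;
}
```

No pointer casts or unions are involved, so there is nothing for the strict-aliasing rules to object to.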
Answer by Lasse Reinhold
The problem occurs because you access a char-array through a double*:
```c
char data[8];
/* ... */
return *(double*)data;
```
But gcc assumes that your program will never access variables through pointers of a different type. This assumption is called strict aliasing, and it allows the compiler to make some optimizations:
If the compiler knows that your *(double*) can in no way overlap with data[], it is allowed to do all sorts of things, such as reordering your code into:
```c
return *(double*)data;
for (int i = 7; i >= 0; i--)
    data[i] = fgetc(stream);
```
The loop is most likely optimized away and you end up with just:
```c
return *(double*)data;
```
That leaves your data[] uninitialized when it is read. In this particular case the compiler might be able to see that the pointers overlap, but if you had declared it as char* data, the same code could have produced real bugs.
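However, the strict-aliasing rule makes an exception for character types: a char* (or unsigned char*) may be used to access an object of any type. (A void* can also point at any object, but it cannot be dereferenced directly.) So you can rewrite it as: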
```c
double data;
/* ... */
*(((char*)&data) + i) = fgetc(stream);
/* ... */
return data;
```
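A self-contained version of that rewrite, as a sketch: read_swapped_double is a hypothetical name, and it uses unsigned char* (also covered by the character-type exception) instead of char*. Like the question's loop, it stores the first byte read into the highest-addressed byte of the double, so it only yields the intended value when the file's byte order is the reverse of the host's (for example, big-endian data read on an x86 machine).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Sketch: fills a double byte by byte through an unsigned char pointer,
   which strict aliasing permits for any object type. The first byte read
   lands at the highest address, mirroring the question's i = 7..0 loop.
   Returns 1 on success, 0 on a short read. */
int read_swapped_double(FILE *stream, double *out)
{
    double data = 0.0;
    unsigned char *bytes = (unsigned char *)&data;

    assert(stream != NULL && out != NULL);
    for (size_t i = sizeof data; i-- > 0; ) {   /* i = 7, 6, ..., 0 */
        int c = fgetc(stream);
        if (c == EOF)
            return 0;
        bytes[i] = (unsigned char)c;
    }
    *out = data;
    return 1;
}
```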
Strict-aliasing warnings are really important to understand or fix. They cause the kind of bugs that are impossible to reproduce in-house, because they occur only with one particular compiler on one particular operating system on one particular machine, and only at full moon once a year, etc.
Answer by anon
Using a union is not the correct thing to do here. Reading from an unwritten member of the union is undefined, i.e. the compiler is free to perform optimisations that will break your code (like optimising away the write).
Answer by Thatcher Ulrich
This doc summarizes the situation: http://dbp-consulting.com/tutorials/StrictAliasing.html
There are several different solutions there, but the most portable/safe one is to use memcpy(). (The function calls may be optimized out, so it's not as inefficient as it appears.) For example, replace this:
```c
return *(short*)data;
```
With this:
```c
short temp;                           /* memcpy is declared in <string.h> */
memcpy(&temp, data, sizeof(temp));    /* copy the bytes, no pointer cast */
return temp;
```
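As a sketch of how the whole data_read function from the question could avoid pointer casts, the version below combines the shift-based reading from the accepted answer with memcpy. Several things here are assumptions, not part of the question: the file is taken to be big-endian (which the question's reversed loops suggest); float and double are assumed to be IEEE 754 with the same byte order as the host's integers; the fixed-width types from <stdint.h> stand in for short and int; and a 0/1 return code with an out-parameter replaces the poster's die() call. data_read_safe and read_be are hypothetical names, and the _Static_assert checks require C11.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

_Static_assert(sizeof(float) == 4, "expects a 4-byte float");
_Static_assert(sizeof(double) == 8, "expects an 8-byte double");

/* Hypothetical helper: reads n (<= 8) bytes as a big-endian unsigned value. */
static int read_be(FILE *stream, int n, uint64_t *out)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++) {
        int c = fgetc(stream);
        if (c == EOF)
            return 0;
        v = (v << 8) | (uint64_t)c;
    }
    *out = v;
    return 1;
}

/* Sketch: every reinterpretation of bits goes through memcpy. */
int data_read_safe(FILE *stream, int code, double *out)
{
    uint64_t raw;

    assert(stream != NULL && out != NULL);
    switch (code) {
    case 0x08: {                                /* unsigned 8-bit */
        if (!read_be(stream, 1, &raw)) return 0;
        *out = (double)raw;
        return 1;
    }
    case 0x09: {                                /* signed 8-bit */
        uint8_t u; int8_t s;
        if (!read_be(stream, 1, &raw)) return 0;
        u = (uint8_t)raw;
        memcpy(&s, &u, sizeof s);
        *out = s;
        return 1;
    }
    case 0x0b: {                                /* signed 16-bit */
        uint16_t u; int16_t s;
        if (!read_be(stream, 2, &raw)) return 0;
        u = (uint16_t)raw;
        memcpy(&s, &u, sizeof s);
        *out = s;
        return 1;
    }
    case 0x0c: {                                /* signed 32-bit */
        uint32_t u; int32_t s;
        if (!read_be(stream, 4, &raw)) return 0;
        u = (uint32_t)raw;
        memcpy(&s, &u, sizeof s);
        *out = s;
        return 1;
    }
    case 0x0d: {                                /* 32-bit float */
        uint32_t u; float f;
        if (!read_be(stream, 4, &raw)) return 0;
        u = (uint32_t)raw;
        memcpy(&f, &u, sizeof f);
        *out = f;
        return 1;
    }
    case 0x0e: {                                /* 64-bit double */
        double d;
        if (!read_be(stream, 8, &raw)) return 0;
        memcpy(&d, &raw, sizeof d);
        *out = d;
        return 1;
    }
    }
    return 0;                                   /* unknown type code */
}
```

The byte order is handled entirely by the shifts, and memcpy only reinterprets a value that is already in host order, so the same code should behave identically on big- and little-endian hosts that meet the stated assumptions.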
Answer by Jens Gustedt
Basically, you can read gcc's message as: "Guy, you are looking for trouble; don't say I didn't warn ya."
Casting a three-byte character array to an int is one of the worst things I have ever seen. Normally your int has at least 4 bytes, so for the fourth byte (and maybe more, if int is wider) you get random data. And then you convert all of this to a double.
Just do none of that. The aliasing problem that gcc warns about is innocent compared to what you are doing.
Answer by supercat
The authors of the C Standard wanted to let compiler writers generate efficient code in circumstances where it would be theoretically possible but unlikely that a global variable might have its value accessed using a seemingly-unrelated pointer. The idea wasn't to forbid type punning by casting and dereferencing a pointer in a single expression, but rather to say that given something like:
```c
int x;
int foo(double *d)
{
    x++;
    *d = 1234;
    return x;
}
```
a compiler would be entitled to assume that the write to *d won't affect x. The authors of the Standard wanted to list situations where a function like the above, receiving a pointer from an unknown source, would have to assume that it might alias a seemingly-unrelated global, without requiring that the types match perfectly. Unfortunately, while the Rationale strongly suggests that the authors intended to describe a minimum standard of conformance for cases where a compiler would otherwise have no reason to believe that things might alias, the rule fails to require compilers to recognize aliasing in cases where it is obvious.

The authors of gcc have decided that they would rather generate the smallest program they can while conforming to the poorly written language of the Standard than generate code that is actually useful. Instead of recognizing aliasing where it is obvious (while still being able to assume that things that don't look like they alias won't), they would rather require programmers to use memcpy, which forces the compiler to allow for the possibility that pointers of unknown origin might alias just about anything, thus impeding optimization.
Answer by Sebastien Mirolo
Apparently the standard allows sizeof(char*) to differ from sizeof(int*), so gcc complains when you try a direct cast. void* is a little special in that any object pointer can be converted to void* and back. In practice I don't know of many architectures/compilers where pointers are not the same for all types, but gcc is right to emit a warning, even if it is annoying.
I think the safe way would be
```c
int i, *p = &i;
char *q = (char*)&p[0];
```
or
```c
char *q = (char*)(void*)p;
```
You can also try this and see what you get:
```cpp
char *q = reinterpret_cast<char*>(p);
```
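As a small, hedged illustration of what a character pointer obtained through such casts can legitimately be used for: reading an object's bytes through a character pointer is one of the accesses the strict-aliasing rules allow. The sketch below (is_little_endian is a hypothetical name) uses that to check the host's byte order.

```c
#include <assert.h>

/* Sketch: inspects the bytes of an unsigned int through an unsigned char
   pointer, which is a well-defined access under strict aliasing.
   Returns 1 if the lowest-addressed byte holds the least significant byte
   (little-endian), 0 otherwise. */
int is_little_endian(void)
{
    unsigned int probe = 1u;
    const unsigned char *q = (const unsigned char *)&probe;
    assert(sizeof probe >= 2);
    return q[0] == 1;
}
```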

