Struct padding in C++
Disclaimer: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/5397447/
Question by Baruch
If I have a struct in C++, is there no way to safely read/write it to a file that is cross-platform/compiler compatible?
Because if I understand correctly, every compiler 'pads' differently based on the target platform.
Accepted answer by Nawaz
No. That is not possible. It's because of lack of standardization of C++ at the binary level.
Don Box writes (quoting from his book Essential COM, chapter "COM As A Better C++"):
C++ and Portability
Once the decision is made to distribute a C++ class as a DLL, one is faced with one of the fundamental weaknesses of C++, that is, lack of standardization at the binary level. Although the ISO/ANSI C++ Draft Working Paper attempts to codify which programs will compile and what the semantic effects of running them will be, it makes no attempt to standardize the binary runtime model of C++. The first time this problem will become evident is when a client tries to link against the FastString DLL's import library from a C++ development environment other than the one used to build the FastString DLL.
Struct padding is done differently by different compilers. Even if you use the same compiler, the packing alignment for structs can differ depending on which #pragma pack setting you're using.
Not only that: if you write two structs whose members are exactly the same, and the only difference is the order in which they're declared, the sizes of the two structs can be (and often are) different.
For example, consider these two structs:
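```cpp
#include <iostream>

// Same members as B, declared in a different order.
struct A
{
    char c;
    char d;
    int i;
};

struct B
{
    char c;
    int i;
    char d;
};

int main() {
    std::cout << sizeof(A) << std::endl;
    std::cout << sizeof(B) << std::endl;
}
```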
Compile it with gcc-4.3.4, and you get this output:
```
8
12
```
That is, sizes are different even though both structs have the same members!
The bottom line is that the standard doesn't say how padding should be done, so compilers are free to make any decision, and you cannot assume all compilers make the same decision.
Answer by Lindydancer
If you have the opportunity to design the struct yourself, it should be possible. The basic idea is that you should design it so that the compiler has no need to insert pad bytes into it. The second trick is that you must handle differences in endianness.
I'll describe how to construct the struct using scalars, but you should be able to use nested structs as well, as long as you apply the same design to each included struct.
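First, a basic fact in C and C++ is that the alignment of a type cannot exceed the size of the type (in fact, the size is always a multiple of the alignment). Array elements are stored back to back with a stride of sizeof(the_type), so if the alignment were larger, it would not be possible to allocate an array of correctly aligned elements using malloc(N*sizeof(the_type)).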
Lay out the struct, starting with the largest types:
```cpp
struct Record  // name added for illustration
{
    uint64_t alpha;
    uint32_t beta;
    uint32_t gamma;
    uint8_t  delta;
```
Next, pad out the struct manually, so that its total size ends up a multiple of the size of the largest type:
```cpp
    uint8_t  pad8[3];  // Match uint32_t
    uint32_t pad32;    // Even number of uint32_t
};
```
The next step is to decide whether the struct should be stored in little-endian or big-endian format. If the storage format does not match the endianness of the host system, the best way is to swap all the elements in situ before writing or after reading the struct.
Answer by Erik
No, there's no safe way. In addition to padding, you have to deal with different byte ordering and different sizes of built-in types.
You need to define a file format, and convert your struct to and from that format. Serialization libraries (e.g. boost::serialization or Google's Protocol Buffers) can help with this.
Answer by John Dibling
Long story short, no. There is no platform-independent, Standard-conformant way to deal with padding.
The Standard discusses padding in terms of "alignment", beginning in 3.9/5:
Object types have alignment requirements (3.9.1, 3.9.2). The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.
But it goes on from there and winds off into many dark corners of the Standard. Alignment is "implementation-defined", meaning it can differ across compilers, or even across address models (i.e. 32-bit/64-bit) under the same compiler.
Unless you have truly harsh performance requirements, you might consider storing your data on disk in a different format, such as character strings. Many high-performance protocols send everything as strings even when the natural format might be something else. For example, a low-latency exchange feed I recently worked on sends dates as strings formatted like "20110321", and times are sent similarly: "141055.200". Even though this exchange feed sends 5 million messages per second all day long, they still use strings for everything, because that way they avoid endianness and other issues.