Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/6043483/

Date: 2020-09-02 08:41:07  Source: igfitidea

Why is bit endianness an issue in bitfields?

Tags: c, cross-platform, portability, low-level, bit-fields

Question by Leonid99

Any portable code that uses bitfields seems to distinguish between little- and big-endian platforms. See the declaration of struct iphdr in the Linux kernel for an example of such code. I fail to understand why bit endianness is an issue at all.

As far as I understand, bitfields are purely compiler constructs, used to facilitate bit level manipulations.

For instance, consider the following bitfield:

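#include <stdint.h>

struct ParsedInt {
    unsigned int f1:1;
    unsigned int f2:3;
    unsigned int f3:4;
};
uint8_t i;
/* Type-puns a 1-byte object as the struct, which is usually as large as an
   unsigned int; this is the non-portable pattern the answers discuss. */
struct ParsedInt *d = (struct ParsedInt *)&i;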
Here, writing d->f2 is just a compact, readable way of saying (i>>1) & ((1<<3) - 1).

However, bit operations are well-defined and work regardless of the architecture. So, how come bitfields are not portable?

Answer by Lundin

By the C standard, the compiler is free to store the bit field pretty much in any random way it wants. You can never make any assumptions about where the bits are allocated. Here are just a few bit-field-related things that are not specified by the C standard:

Unspecified behavior

  • The alignment of the addressable storage unit allocated to hold a bit-field (6.7.2.1).

Implementation-defined behavior

  • Whether a bit-field can straddle a storage-unit boundary (6.7.2.1).
  • The order of allocation of bit-fields within a unit (6.7.2.1).

Big/little endianness is of course also implementation-defined. This means that your struct could be allocated in any of the following ways (assuming 16-bit ints):

PADDING : 8
f1 : 1
f2 : 3
f3 : 4

or

PADDING : 8
f3 : 4
f2 : 3
f1 : 1

or

f1 : 1
f2 : 3
f3 : 4
PADDING : 8

or

f3 : 4
f2 : 3
f1 : 1
PADDING : 8

Which one applies? Take a guess, or read the in-depth backend documentation of your compiler. Add to this the complexity of 32-bit integers, in big or little endian. Then add the fact that the compiler is allowed to add any number of padding bytes anywhere inside your bit field, because it is treated as a struct (it can't add padding at the very beginning of the struct, but it can everywhere else).

And I haven't even mentioned what happens if you use plain "int" as the bit-field type (whether it is signed or unsigned is implementation-defined), or if you use any type other than _Bool, signed int or unsigned int (whether such types are allowed at all is implementation-defined).

So, to answer the question: there is no such thing as portable bit-field code, because the C standard is extremely vague about how bit fields should be implemented. The only thing bit-fields can be trusted with is holding chunks of boolean values, where the programmer isn't concerned with the location of the bits in memory.

The only portable solution is to use the bitwise operators instead of bit fields. The generated machine code will typically be the same, but its behavior is deterministic. Bitwise operators are 100% portable on any C compiler for any system.

Answer by Michael Burr

As far as I understand, bitfields are purely compiler constructs

And that's part of the problem. If the use of bit-fields was restricted to what the compiler 'owned', then how the compiler packed bits or ordered them would be of pretty much no concern to anyone.

However, bit-fields are probably used far more often to model constructs that are external to the compiler's domain: hardware registers, the 'wire' protocol for communications, or file format layout. These things have strict requirements on how bits have to be laid out, and using bit-fields to model them means that you have to rely on implementation-defined and, even worse, unspecified behavior of how the compiler will lay out the bit-field.

In short, bit-fields are not specified well enough to make them useful for the situations they seem to be most commonly used for.

Answer by makes

ISO/IEC 9899:6.7.2.1 / 10

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

It is safer to use bit shift operations instead of making any assumptions on bit field ordering or alignment when trying to write portable code, regardless of system endianness or bitness.

Also see CERT C rule EXP11-C: "Do not apply operators expecting one type to data of an incompatible type."

Answer by Dietrich Epp

Bit field accesses are implemented in terms of operations on the underlying type, which in the example is unsigned int. So if you have something like:

struct x {
    unsigned int a : 4;
    unsigned int b : 8;
    unsigned int c : 4;
};

When you access field b, the compiler accesses an entire unsigned int and then shifts and masks the appropriate bit range. (Well, it doesn't have to, but we can pretend that it does.)

On big endian, the layout will be something like this (most significant bit first):

AAAABBBB BBBBCCCC

On little endian, the layout will be like this:

BBBBAAAA CCCCBBBB

If you want to access the big endian layout from little endian or vice versa, you'll have to do some extra work. This increase in portability has a performance penalty, and since struct layout is already non-portable, language implementors went with the faster version.

This makes a lot of assumptions. Also note that sizeof(struct x) == 4 on most platforms.

Answer by Charles Keepax

The bit fields will be stored in a different order depending on the endianness of the machine. This may not matter in some cases, but in others it may. Say, for example, that your ParsedInt struct represented flags in a packet sent over a network: a little-endian machine and a big-endian machine would read those flags from the transmitted byte in a different order, which is obviously a problem.

Answer by user2465201

To echo the most salient points: if you are using this on a single compiler/hardware platform as a software-only construct, then endianness will not be an issue. If you are using code or data across multiple platforms OR need to match hardware bit layouts, then it IS an issue. And a lot of professional software is cross-platform, hence it has to care.

Here's the simplest example: I have code that stores numbers in binary format to disk. If I do not write and read this data to disk myself explicitly byte by byte, then it will not be the same value if read from an opposite endian system.

Concrete example:

int16_t s = 4096; // a signed 16-bit number...

Let's say my program ships with some data on the disk that I want to read in. Say I want to load it as 4096 in this case...

fread(&s, sizeof s, 1, fp); // reading it from disk as binary...

Here I read it as a 16-bit value, not as explicit bytes. That means if my system matches the endianness stored on disk, I get 4096, and if it doesn't, I get 16 !!!!!

So the most common way of handling endianness is to bulk-load binary numbers and then do a byte swap (bswap) if the byte order doesn't match. In the past, we'd store data on disk as big endian because Intel was the odd man out and provided high-speed instructions to swap the bytes. Nowadays, Intel is so common that formats often make little endian the default and swap on big-endian systems.

A slower but endian-neutral approach is to do ALL I/O byte by byte, e.g.:

uint8_t ubyte;
int8_t sbyte;
int16_t s; // read s in an endian-neutral way

// Let's choose little endian as our file's byte order:

fread(&ubyte, 1, 1, fp); // low byte first, only 1 byte at a time
fread(&sbyte, 1, 1, fp); // then the (signed) high byte

// Reconstruct s; multiplying avoids left-shifting a negative value (undefined)

s = (int16_t)(sbyte * 256 + ubyte);

Note that this is identical to the code you'd write to do an endian swap, but you no longer need to check the endianness. And you can use macros to make this less painful.

I used the example of stored data used by a program. The other main application mentioned is to write hardware registers, where those registers have an absolute ordering. One VERY COMMON place this comes up is with graphics. Get the endianness wrong and your red and blue color channels get reversed! Again, the issue is one of portability - you could simply adapt to a given hardware platform and graphics card, but if you want your same code to work on different machines, you must test.

Here's a classic test:

typedef union { uint16_t s; uint8_t b[2]; } EndianTest_t;

EndianTest_t test = { .s = 4096 }; // 4096 == 0x1000

if (test.b[0] == 0x10) printf("Big Endian Detected!\n");

Note that bitfield issues exist as well but are orthogonal to endianness issues.

Answer by user2465201

Just to point out - we've been discussing the issue of byte endianness, not bit endianness or endianness in bitfields, which crosses into the other issue:

If you are writing cross-platform code, never just write out a struct as a binary object. Besides the byte-order issues described above, there can be all kinds of packing and formatting differences between compilers. The languages place very few restrictions on how a compiler may lay out structs or bitfields in actual memory, so when saving to disk you must write each data member of a struct one at a time, preferably in a byte-order-neutral way.

This packing impacts "bit endianness" in bitfields because different compilers might store the bitfields in a different direction, and the bit endianness impacts how they'd be extracted.

So bear in mind BOTH levels of the problem - the byte endianness impacts a computer's ability to read a single scalar value, e.g., a float, while the compiler (and build arguments) impact a program's ability to read in an aggregate structure.

What I have done in the past is to save and load a file in a neutral way and store meta-data about the way the data is laid out in memory. This allows me to use the "fast and easy" binary load path where compatible.
