C++: How do I convert a big-endian struct to a little-endian struct?

Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/859535/

Time: 2020-08-27 17:37:11  Source: igfitidea

How do I convert a big-endian struct to a little-endian struct?

Tags: c++, struct, endianness

Asked by scottm

I have a binary file that was created on a unix machine. It's just a bunch of records written one after another. The record is defined something like this:

struct RECORD {
  UINT32 foo;
  UINT32 bar;
  CHAR fooword[11];
  CHAR barword[11];
  UINT16 baz;
};

I am trying to figure out how I would read and interpret this data on a Windows machine. I have something like this:

fstream f;
f.open("file.bin", ios::in | ios::binary);

RECORD r;

f.read((char*)&r, sizeof(RECORD));

cout << "fooword = " << r.fooword << endl;

I get a bunch of data, but it's not the data I expect. I suspect that my problem has to do with the endian difference between the machines, so I've come to ask about that.

I understand that multiple bytes will be stored in little-endian on windows and big-endian in a unix environment, and I get that. For two bytes, 0x1234 on windows will be 0x3412 on a unix system.

Does endianness affect the byte order of the struct as a whole, or of each individual member of the struct? What approaches would I take to convert a struct created on a unix system to one that has the same data on a windows system? Any links that are more in depth than the byte order of a couple bytes would be great, too!

Accepted answer by James Sutherland

As well as the endianness, you need to be aware of padding differences between the two platforms. Particularly if you have odd-length char arrays and 16-bit values, you may well find different numbers of pad bytes between some elements.
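
One way to see where a given compiler puts the pad bytes is to print the member offsets and compare the output from both platforms. The following is only a sketch: Record and describe_layout are hypothetical names, and it assumes UINT32, UINT16 and CHAR correspond to the <cstdint> types shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Stand-in for the RECORD in the question, using <cstdint> types
// (assumption: UINT32/UINT16/CHAR map to these).
struct Record {
    std::uint32_t foo;
    std::uint32_t bar;
    char fooword[11];
    char barword[11];
    std::uint16_t baz;
};

// Hypothetical helper: reports where this compiler placed each member.
std::string describe_layout() {
    std::ostringstream out;
    out << "foo@" << offsetof(Record, foo)
        << " bar@" << offsetof(Record, bar)
        << " fooword@" << offsetof(Record, fooword)
        << " barword@" << offsetof(Record, barword)
        << " baz@" << offsetof(Record, baz)
        << " sizeof=" << sizeof(Record);
    return out.str();
}
```

The member sizes add up to 32 bytes, so a larger sizeof, or offsets that differ between the two builds, points at padding.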

Edit: if the structure was written out with no packing, then it should be fairly straightforward. Something like this (untested) code should do the job:

// Functions to swap the endian of 16 and 32 bit values

inline void SwapEndian(UINT16 &val)
{
    val = (val<<8) | (val>>8);
}

inline void SwapEndian(UINT32 &val)
{
    val = (val<<24) | ((val<<8) & 0x00ff0000) |
          ((val>>8) & 0x0000ff00) | (val>>24);
}

Then, once you've loaded the struct, just swap each element:

SwapEndian(r.foo);
SwapEndian(r.bar);
SwapEndian(r.baz);

Answered by kdgregory

Actually, endianness is a property of the underlying hardware, not the OS.
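
Since the byte order comes from the hardware, a program can check it at run time rather than assuming it from the OS. A minimal sketch, where host_is_little_endian is a hypothetical helper name:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the machine running this code stores
// the least significant byte of an integer first.
bool host_is_little_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);  // look at the lowest-addressed byte
    return first_byte == 0x02;
}
```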

The best solution is to convert to a standard when writing the data -- Google for "network byte order" and you should find the methods to do this.
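
As a rough illustration of the network byte order functions, the sketch below reads big-endian fields with ntohl and ntohs. It assumes a POSIX system for the <arpa/inet.h> header (on Windows the same functions are declared in <winsock2.h>); read_network_u32 and read_network_u16 are hypothetical helper names.

```cpp
#include <arpa/inet.h>  // POSIX; assumption: not building on Windows
#include <cstdint>
#include <cstring>

// Hypothetical helper: interprets 4 bytes stored in network (big-endian) order.
std::uint32_t read_network_u32(const unsigned char* bytes) {
    std::uint32_t raw;
    std::memcpy(&raw, bytes, sizeof raw);  // copying avoids unaligned access
    return ntohl(raw);                     // does nothing on big-endian hosts
}

// Same idea for a 2-byte field.
std::uint16_t read_network_u16(const unsigned char* bytes) {
    std::uint16_t raw;
    std::memcpy(&raw, bytes, sizeof raw);
    return ntohs(raw);
}
```

The writer would use htonl and htons in the same way before putting the values in the file.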

Edit: here's the link: http://www.gnu.org/software/hello/manual/libc/Byte-Order.html

Answered by kdgregory

Don't read directly into a struct from a file! The packing might be different, and you would have to fiddle with #pragma pack or similar compiler-specific constructs. Too unreliable. A lot of programmers get away with this because their code isn't compiled on a wide range of architectures and systems, but that doesn't mean it's an OK thing to do!

A good alternative approach is to read the header, or whatever, into a buffer and parse from there, which avoids the I/O overhead of atomic operations like reading an unsigned 32-bit integer!

char buffer[32];
char* temp = buffer;

f.read(buffer, 32);

RECORD rec;
rec.foo = parse_uint32(temp); temp += 4;
rec.bar = parse_uint32(temp); temp += 4;
memcpy(&rec.fooword, temp, 11); temp += 11;
memcpy(&rec.barword, temp, 11); temp += 11;
rec.baz = parse_uint16(temp); temp += 2;

The declaration of parse_uint32 would look like this:

uint32 parse_uint32(char* buffer)
{
  uint32 x;
  // ...
  return x;
}

This is a very simple abstraction, and in practice it costs nothing extra to update the pointer as well:

uint32 parse_uint32(char*& buffer)
{
  uint32 x;
  // ...
  buffer += 4;
  return x;
}

The latter form allows cleaner code for parsing the buffer; the pointer is automatically updated when you parse from the input.
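
The answer leaves the function body elided. One possible body, assuming the input is big-endian as in the question and using std::uint32_t in place of the answer's uint32 alias, might look like this sketch:

```cpp
#include <cstdint>

// Sketch of a pointer-advancing parser, assuming big-endian input.
std::uint32_t parse_uint32(char*& buffer) {
    // Work on unsigned bytes so values >= 0x80 are not sign-extended.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(buffer);
    std::uint32_t x = (static_cast<std::uint32_t>(p[0]) << 24) |
                      (static_cast<std::uint32_t>(p[1]) << 16) |
                      (static_cast<std::uint32_t>(p[2]) << 8) |
                      static_cast<std::uint32_t>(p[3]);
    buffer += 4;  // advance past the bytes just consumed
    return x;
}

// Same idea for a 2-byte field.
std::uint16_t parse_uint16(char*& buffer) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(buffer);
    std::uint16_t x = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
    buffer += 2;
    return x;
}
```

Because the value is assembled from individual bytes, the same code gives the same result on little-endian and big-endian hosts.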

Likewise, memcpy could have a helper, something like:

void parse_copy(void* dest, char*& buffer, size_t size)
{
  memcpy(dest, buffer, size);
  buffer += size;
}

The beauty of this kind of arrangement is that you can have namespace "little_endian" and "big_endian", then you can do this in your code:

using namespace little_endian;
// do your parsing for little_endian input stream here..

It is easy to switch endianness for the same code, though that is a rarely needed feature; file formats usually have a fixed endianness anyway.
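
To make the namespace idea concrete, here is a sketch with the same function name defined in both namespaces. The namespace names come from the answer; the function bodies and the first_field helper are assumptions added for illustration.

```cpp
#include <cstdint>

namespace big_endian {
// Most significant byte comes first in the input.
inline std::uint32_t parse_uint32(const unsigned char*& p) {
    std::uint32_t x = (static_cast<std::uint32_t>(p[0]) << 24) |
                      (static_cast<std::uint32_t>(p[1]) << 16) |
                      (static_cast<std::uint32_t>(p[2]) << 8) |
                      static_cast<std::uint32_t>(p[3]);
    p += 4;
    return x;
}
}  // namespace big_endian

namespace little_endian {
// Least significant byte comes first in the input.
inline std::uint32_t parse_uint32(const unsigned char*& p) {
    std::uint32_t x = static_cast<std::uint32_t>(p[0]) |
                      (static_cast<std::uint32_t>(p[1]) << 8) |
                      (static_cast<std::uint32_t>(p[2]) << 16) |
                      (static_cast<std::uint32_t>(p[3]) << 24);
    p += 4;
    return x;
}
}  // namespace little_endian

// Hypothetical parser for a big-endian file: switching the file's byte
// order means changing only the using-directive.
std::uint32_t first_field(const unsigned char* data) {
    using namespace big_endian;
    return parse_uint32(data);
}
```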

DO NOT abstract this into class with virtual methods; would just add overhead, but feel free to if so inclined:

little_endian_reader reader(data, size);
uint32 x = reader.read_uint32();
uint32 y = reader.read_uint32();

The reader object would obviously just be a thin wrapper around a pointer. The size parameter would be for error checking, if any. It is not really mandatory for the interface per se.
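
A thin wrapper of that kind could be sketched as below. The class and method names follow the answer's usage; the members, the remaining() accessor and the choice to throw std::out_of_range are assumptions, and nothing here is virtual.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Sketch of a non-virtual reader: a pointer plus an end marker for checks.
class little_endian_reader {
public:
    little_endian_reader(const unsigned char* data, std::size_t size)
        : cur_(data), end_(data + size) {}

    std::uint32_t read_uint32() {
        require(4);
        std::uint32_t v = static_cast<std::uint32_t>(cur_[0]) |
                          (static_cast<std::uint32_t>(cur_[1]) << 8) |
                          (static_cast<std::uint32_t>(cur_[2]) << 16) |
                          (static_cast<std::uint32_t>(cur_[3]) << 24);
        cur_ += 4;
        return v;
    }

    // Bytes not yet consumed.
    std::size_t remaining() const {
        return static_cast<std::size_t>(end_ - cur_);
    }

private:
    // Error checking enabled by the size parameter (assumed policy: throw).
    void require(std::size_t n) const {
        if (remaining() < n) throw std::out_of_range("read past end of buffer");
    }

    const unsigned char* cur_;
    const unsigned char* end_;
};
```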

Notice how the choice of endianness here was made at COMPILATION TIME (since we create a little_endian_reader object), so we would incur the virtual method overhead for no particularly good reason, and I wouldn't go with this approach. ;-)

At this stage there is no real reason to keep the "fileformat struct" around as-is; you can organize the data to your liking and not necessarily read it into any specific struct at all. After all, it's just data. When you read files like images, you don't really need the header around. You should have an image container that is the same for all file types, so the code to read a specific format should just read the file, interpret and reformat the data, and store the payload. =)

I mean, does this look complicated?

uint32 xsize = buffer.read<uint32>();
uint32 ysize = buffer.read<uint32>();
float aspect = buffer.read<float>();    

The code can look that nice and be really low-overhead! If the endianness is the same for the file and the architecture the code is compiled for, the inner loop can look like this:

uint32 value = *reinterpret_cast<uint32*>(ptr); ptr += 4;
return value;

That might be illegal on some architectures, so that optimization might be a Bad Idea; use the slower but more robust approach instead:

uint32 value = ptr[0] | (static_cast<uint32>(ptr[1]) << 8) | ...; ptr += 4;
return value;
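
Written out in full and generalized over the integer width, the robust byte-by-byte read might look like the sketch below. It assumes little-endian input, as in the snippet above; read_le is a hypothetical name, and only unsigned integer types are handled (the float case from the earlier example is not covered).

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Sketch: assemble a little-endian unsigned integer one byte at a time.
// No alignment or host byte order assumptions are made.
template <typename T>
T read_le(const unsigned char*& ptr) {
    static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
    T value = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Byte i carries bits [8*i, 8*i + 7] of the result.
        value = static_cast<T>(value | (static_cast<T>(ptr[i]) << (8 * i)));
    }
    ptr += sizeof(T);
    return value;
}
```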

On an x86 that can compile into bswap or mov, which is reasonably low-overhead if the method is inlined; the compiler would insert a "move" node into the intermediate code and nothing else, which is fairly efficient. If alignment is a problem, the full read-shift-or sequence might get generated, ouch, but that is still not too shabby. A compare-and-branch could allow the optimization: test the address LSBs and see whether the fast or the slow version of the parsing can be used. But this would mean paying for the test on every read. Might not be worth the effort.

Oh, right, we are reading HEADERS and stuff; I don't think that is a bottleneck in too many applications. If some codec is doing some really TIGHT inner loop, then again, reading into a temporary buffer and decoding from there is well advised. Same principle: no one reads a byte at a time from a file when processing a large volume of data. Well, actually, I have seen that kind of code very often, and the usual reply to "why do you do it" is that file systems do block reads and the bytes come from memory anyway. True, but they go through a deep call stack, which is a lot of overhead for getting a few bytes!

Still, write the parser code once and use zillion times -> epic win.

Reading directly into struct from a file: DON'T DO IT FOLKS!

Answered by Mehrdad Afshari

It affects each member independently, not the whole struct. Also, it does not affect things like arrays as a whole: each element keeps its position, and only the bytes within each multi-byte element, such as an int, are stored in reverse order.

PS. That said, there could be a machine with weird endianness. What I just said applies to most used machines (x86, ARM, PowerPC, SPARC).

Answered by Jem

You have to correct the endianness of each member of more than one byte, individually. Strings (fooword and barword) do not need to be converted, as they can be seen as sequences of bytes.

However, you must take care of another problem: the alignment of the members in your struct. Basically, you must check whether sizeof(RECORD) is the same in both the unix and the windows code. Compilers usually provide pragmas to define the alignment you want (for example, #pragma pack).
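
A compile-time check can catch a layout mismatch early. The sketch below assumes the records on disk were written without padding, so each one is 32 bytes (the sum of the member sizes); PackedRecord is a hypothetical name, and the fixed-width types stand in for UINT32, UINT16 and CHAR.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)  // this form is accepted by MSVC, GCC and Clang
struct PackedRecord {
    std::uint32_t foo;
    std::uint32_t bar;
    char fooword[11];
    char barword[11];
    std::uint16_t baz;
};
#pragma pack(pop)

// Assumption: the file holds 32-byte records with no pad bytes.
static_assert(sizeof(PackedRecord) == 32,
              "PackedRecord must match the on-disk record size");
static_assert(offsetof(PackedRecord, baz) == 30,
              "baz must directly follow barword");
```

If the unix program wrote padded structs instead, the expected size and offsets would have to be taken from that build.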

Answered by kevin42

I like to implement a ByteSwap function for each data type that needs swapping, like this:

inline u_int ByteSwap(u_int in)
{
    u_int out;
    char *indata = (char *)&in;
    char *outdata = (char *)&out;
    outdata[0] = indata[3] ;
    outdata[3] = indata[0] ;

    outdata[1] = indata[2] ;
    outdata[2] = indata[1] ;
    return out;
}

inline u_short ByteSwap(u_short in)
{
    u_short out;
    char *indata = (char *)&in;
    char *outdata = (char *)&out;
    outdata[0] = indata[1] ;
    outdata[1] = indata[0] ;
    return out;
}

Then I add a function to the structure that needs swapping, like this:

struct RECORD {
  UINT32 foo;
  UINT32 bar;
  CHAR fooword[11];
  CHAR barword[11];
  UINT16 baz;
  void SwapBytes()
  {
    foo = ByteSwap(foo);
    bar = ByteSwap(bar);
    baz = ByteSwap(baz);
  }
};

Then you can modify your code that reads (or writes) the structure like this:

fstream f;
f.open("file.bin", ios::in | ios::binary);

RECORD r;

f.read((char*)&r, sizeof(RECORD));
r.SwapBytes();

cout << "fooword = " << r.fooword << endl;

To support different platforms you just need to have a platform specific implementation of each ByteSwap overload.
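
As one illustration of what a platform-specific implementation could look like, the sketch below picks a compiler intrinsic where one is known to exist and falls back to shifts elsewhere. It uses <cstdint> types instead of u_int and u_short; that substitution is an assumption, not part of the answer.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // _byteswap_ushort, _byteswap_ulong
#endif

inline std::uint32_t ByteSwap(std::uint32_t in) {
#if defined(_MSC_VER)
    return _byteswap_ulong(in);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(in);
#else
    // Portable fallback for other compilers.
    return (in << 24) | ((in << 8) & 0x00ff0000u) |
           ((in >> 8) & 0x0000ff00u) | (in >> 24);
#endif
}

inline std::uint16_t ByteSwap(std::uint16_t in) {
#if defined(_MSC_VER)
    return _byteswap_ushort(in);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap16(in);
#else
    return static_cast<std::uint16_t>((in << 8) | (in >> 8));
#endif
}
```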

Answered by Martin York

You also have to consider alignment differences between the two compilers. Each compiler is allowed to insert padding between the members of a structure in whatever way best suits the architecture. So you really need to know:

  • How the UNIX program writes to the file.
  • If it is a binary copy of the object, the exact layout of the structure.
  • If it is a binary copy, the endianness of the source architecture.

This is why most programs that I have seen that need to be platform neutral serialize the data as a text stream that can easily be read by the standard iostreams.
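
A text format along those lines could be sketched as follows. TextRecord and its whitespace-separated layout are assumptions made for illustration, and the two words are assumed to be non-empty and free of whitespace.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical text-friendly version of the record.
struct TextRecord {
    std::uint32_t foo;
    std::uint32_t bar;
    std::string fooword;  // assumed non-empty, no whitespace
    std::string barword;  // assumed non-empty, no whitespace
    std::uint16_t baz;
};

// Writes one record per line, fields separated by spaces.
std::ostream& operator<<(std::ostream& out, const TextRecord& r) {
    return out << r.foo << ' ' << r.bar << ' ' << r.fooword << ' '
               << r.barword << ' ' << r.baz << '\n';
}

// Reads the fields back in the same order.
std::istream& operator>>(std::istream& in, TextRecord& r) {
    return in >> r.foo >> r.bar >> r.fooword >> r.barword >> r.baz;
}
```

Numbers written this way carry no byte order or padding, at the cost of a larger file and slower parsing.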

Answered by xian

Something like this should work:

#include <algorithm>
#include <fstream>

using namespace std;

struct RECORD {
    UINT32 foo;
    UINT32 bar;
    CHAR fooword[11];
    CHAR barword[11];
    UINT16 baz;
};

// Reverses the bytes of one field in place.
void ReverseBytes( void *start, int size )
{
    char *beg = static_cast<char *>( start );
    char *end = beg + size;

    std::reverse( beg, end );
}

int main() {
    fstream f;
    f.open( "file.bin", ios::in | ios::binary );

    // for each entry {
    RECORD r;
    f.read( (char *)&r, sizeof( RECORD ) );
    ReverseBytes( &r.foo, sizeof( UINT32 ) );
    ReverseBytes( &r.bar, sizeof( UINT32 ) );
    ReverseBytes( &r.baz, sizeof( UINT16 ) );
    // }

    return 0;
}