Reading a binary file into a struct (C++)

Question

Asked by B.K.

So I'm having trouble reading a binary file into my structure properly. The structure is this:

struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

It is 37 bytes (25 bytes from the char array and 4 bytes per integer). My .dat file is 185 bytes: 5 students, each with 3 integer grades, so each student takes up 37 bytes (37*5=185).
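
The 37 here is the sum of the member sizes, which is what ends up on disk when each member is written separately; the in-memory sizeof(Student) can be larger because the compiler may add padding. A minimal sketch that compares the two, assuming a 4-byte int (typical, but not guaranteed by the standard); kRecordBytes and print_sizes are names invented for this example:

```cpp
#include <cstddef>
#include <iostream>

struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

// Bytes one record occupies on disk when each member is written separately
// (hypothetical constant): 25 + 4 + 4 + 4 = 37 if int is 4 bytes.
constexpr std::size_t kRecordBytes = sizeof(Student::name) + 3 * sizeof(int);

// Padding can only add bytes, never remove them.
static_assert(sizeof(Student) >= kRecordBytes, "sizeof includes any padding");

void print_sizes()
{
    std::cout << "sum of members:    " << kRecordBytes << '\n'
              << "sizeof(Student):   " << sizeof(Student) << '\n'
              << "5 records on disk: " << 5 * kRecordBytes << " bytes\n";
}
```

On x86-64 with GCC, Clang or MSVC the first two numbers differ (37 vs. 40), which is exactly the mismatch the answers below describe.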

It looks something like this in plain text format:

Bart Simpson          75   65   70
Ralph Wiggum          35   60   44
Lisa Simpson          100  98   91
Martin Prince         99   98   99
Milhouse Van Houten   80   87   79

I'm able to read each of the records individually by using this code:

Student stud;

fstream file;
file.open("quizzes.dat", ios::in | ios::out | ios::binary);

if (file.fail())
{
    cout << "ERROR: Cannot open the file..." << endl;
    exit(1);   // non-zero exit status reports the failure
}

file.read(stud.name, sizeof(stud.name));
file.read(reinterpret_cast<char *>(&stud.quiz1), sizeof(stud.quiz1));
file.read(reinterpret_cast<char *>(&stud.quiz2), sizeof(stud.quiz2));
file.read(reinterpret_cast<char *>(&stud.quiz3), sizeof(stud.quiz3));

while(!file.eof())
{
    cout << left 
         << setw(25) << stud.name
         << setw(5)  << stud.quiz1
         << setw(5)  << stud.quiz2
         << setw(5)  << stud.quiz3
         << endl;

    // Reading the next record
    file.read(stud.name, sizeof(stud.name));
    file.read(reinterpret_cast<char *>(&stud.quiz1), sizeof(stud.quiz1));
    file.read(reinterpret_cast<char *>(&stud.quiz2), sizeof(stud.quiz2));
    file.read(reinterpret_cast<char *>(&stud.quiz3), sizeof(stud.quiz3));
}

And I get nice-looking output, but I want to be able to read in one whole structure at a time, not just the individual members of each structure. This code is what I believe is needed to accomplish the task, but... it doesn't work (I'll show the output after it):

*Not including the parts that are the same as above, such as opening the file and declaring the structure.

file.read(reinterpret_cast<char *>(&stud), sizeof(stud));

while(!file.eof())
{
    cout << left 
         << setw(25) << stud.name
         << setw(5)  << stud.quiz1
         << setw(5)  << stud.quiz2
         << setw(5)  << stud.quiz3
         << endl;

    file.read(reinterpret_cast<char *>(&stud), sizeof(stud));
}

OUTPUT:

Bart Simpson             16640179201818317312
ph Wiggum                288358417665884161394631027
impson                   129184563217692391371917853806
ince                     175193530917020655191851872800

The only part it doesn't mess up is the first student's name; after that it's all downhill. I've tried everything and I have no idea what is wrong. I've even searched through the books I have and couldn't find anything. The examples in there look like what I have and they work, but for some odd reason mine doesn't. I called file.get(ch) (ch being a char) at byte offset 25 and it returned 'K', whose ASCII code is 75, which is the 1st test score, so everything is where it should be. It's just not reading into my structures properly.

Any help would be greatly appreciated; I'm just stuck on this one.

EDIT: After receiving such a large amount of unexpected and awesome input from you guys, I've decided to take your advice and stick with reading in one member at a time. I made things cleaner and smaller by using functions. Thank you once again for providing such quick and enlightening input. It's much appreciated.

If you're interested in a workaround that most people do not recommend, scroll toward the bottom to the 3rd answer by user1654209. That workaround works flawlessly, but read all the comments to see why it's not favored.

Answer 1

Answered by JasonD

Your struct has almost certainly been padded to preserve the alignment of its content. This means that it will not be 37 bytes, and that mismatch causes the reading to go out of sync. Looking at the way each string is losing 3 characters, it seems that it has been padded to 40 bytes.
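
The 3-character drift can be reproduced without touching the disk. This sketch, assuming a 4-byte int and a compiler that pads Student to 40 bytes, writes the five records member by member into a string (37 bytes each) and then reads them back one whole struct at a time, as the question's second loop does. make_question_file, write_fields and read_whole_structs are illustrative names, not from the thread:

```cpp
#include <algorithm>
#include <cstring>
#include <sstream>
#include <string>
#include <vector>

struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

// Write one record member by member: 25 + 3 * sizeof(int) bytes, no padding.
void write_fields(std::ostream& out, const char* name, int q1, int q2, int q3)
{
    char buf[25] = {};                           // name field, zero-filled
    std::strncpy(buf, name, sizeof(buf) - 1);
    out.write(buf, sizeof(buf));
    out.write(reinterpret_cast<const char*>(&q1), sizeof(q1));
    out.write(reinterpret_cast<const char*>(&q2), sizeof(q2));
    out.write(reinterpret_cast<const char*>(&q3), sizeof(q3));
}

// In-memory stand-in for the question's quizzes.dat.
std::string make_question_file()
{
    std::ostringstream out;
    write_fields(out, "Bart Simpson",        75, 65, 70);
    write_fields(out, "Ralph Wiggum",        35, 60, 44);
    write_fields(out, "Lisa Simpson",       100, 98, 91);
    write_fields(out, "Martin Prince",       99, 98, 99);
    write_fields(out, "Milhouse Van Houten", 80, 87, 79);
    return out.str();
}

// Read sizeof(Student) bytes per iteration and collect the name of every
// complete read (the name ends at its first '\0').
std::vector<std::string> read_whole_structs(std::istream& in)
{
    std::vector<std::string> names;
    Student s;
    while (in.read(reinterpret_cast<char*>(&s), sizeof(s)))
        names.emplace_back(s.name, std::find(s.name, s.name + sizeof(s.name), '\0'));
    return names;
}
```

On such a platform the names come back as "Bart Simpson", "ph Wiggum", "impson" and "ince": read i starts at offset 40*i while record i starts at 37*i, so each read lands 3 bytes further into the record than the previous one, and the fifth read fails because only 25 bytes remain.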

As the padding is likely to be between the string and the integers, not even the first record reads correctly.
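
Where the padding sits can be checked with offsetof. A minimal sketch; kGapAfterName and print_layout are hypothetical names. On x86-64 with GCC, Clang or MSVC (4-byte int, 4-byte alignment) quiz1 lands at offset 28, i.e. 3 padding bytes right after name, and sizeof(Student) is 40:

```cpp
#include <cstddef>   // offsetof
#include <iostream>

struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

// Padding bytes the compiler inserted between name and quiz1.
constexpr std::size_t kGapAfterName =
    offsetof(Student, quiz1) - sizeof(Student::name);

void print_layout()
{
    std::cout << "offsetof(name)  = " << offsetof(Student, name)  << '\n'
              << "offsetof(quiz1) = " << offsetof(Student, quiz1) << '\n'
              << "gap after name  = " << kGapAfterName             << '\n'
              << "sizeof(Student) = " << sizeof(Student)           << '\n';
}
```

Because the gap comes before quiz1, even the very first whole-struct read takes quiz1 from the wrong bytes, which is why the scores are garbage from record one.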

In this case I would recommend not attempting to read your data as a binary blob, and sticking to reading individual fields. It's far more robust, especially if you ever want to alter your structure.
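
Reading individual fields fits naturally into a helper function, which is roughly what the asker's edit describes. This is a sketch, not the asker's actual code: read_student and print_all are hypothetical names, and it assumes the file stores 4-byte ints in the machine's own byte order:

```cpp
#include <iomanip>
#include <iostream>
#include <istream>

struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

// Read one on-disk record member by member, so struct padding never affects
// how many bytes are consumed. Returns false if the record was incomplete.
bool read_student(std::istream& in, Student& s)
{
    in.read(s.name, sizeof(s.name));
    in.read(reinterpret_cast<char*>(&s.quiz1), sizeof(s.quiz1));
    in.read(reinterpret_cast<char*>(&s.quiz2), sizeof(s.quiz2));
    in.read(reinterpret_cast<char*>(&s.quiz3), sizeof(s.quiz3));
    s.name[sizeof(s.name) - 1] = '\0';   // ensure termination (truncates a 25-char name)
    return static_cast<bool>(in);
}

void print_all(std::istream& in)
{
    Student s;
    while (read_student(in, s))          // test the read itself instead of eof()
        std::cout << std::left << std::setw(25) << s.name
                  << std::setw(5) << s.quiz1
                  << std::setw(5) << s.quiz2
                  << std::setw(5) << s.quiz3 << '\n';
}
```

Testing the result of the read in the loop condition also avoids the duplicated "read before the loop, read again at the end" structure of the original code.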

Answer 2

Answered by Some programmer dude

Without seeing the code that writes the data, I'm guessing that you write the data the way you read it in the first example, each element one by one. Then each record in the file will indeed be 37 bytes.

However, since the compiler pads structures to put members on nice boundaries for optimization reasons, your structure is 40 bytes. So when you read the complete structure in a single call, then you actually read 40 bytes at a time, which means that your reading will go out of phase with the actual records in the file.

You either have to re-implement the writing to write the complete structure in one go, or use the first method of reading where you're reading one member field at a time.

Answer 3

Answered by Oualid Jabnoune

A simple workaround is to pack your structure to 1-byte alignment.

using gcc

struct __attribute__((packed)) Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

using msvc

#pragma pack(push, 1) //set padding to 1 byte, saves previous value
struct  Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};
#pragma pack(pop) //restore previous pack value

EDIT: As user ahans states, pragma pack has been supported by gcc since version 2.7.2.3 (released in 1997), so it seems safe to use pragma pack as the only packing notation if you are targeting msvc and gcc.

Answer 4

Answered by ahans

As you've already found out, the padding is the issue here. Also, as others have suggested, the proper way of solving this is to read each member individually, as you've done in your example. Performance-wise, I don't expect this to cost much more than reading the whole thing in at once. However, if you still want to read it in all at once, you can tell the compiler to do the padding differently:

#pragma pack(push, 1)
struct Student
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};
#pragma pack(pop)

With #pragma pack(push, 1) you tell the compiler to save the current pack value on an internal stack and use a pack value of 1 thereafter. This means you get an alignment of 1 byte, which means no padding at all in this case. With #pragma pack(pop) you tell the compiler to get the last value from the stack and use it thereafter, thereby restoring the behavior the compiler used before the definition of your struct.
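
The push/pop scoping can be seen by declaring the same members twice, once inside the push/pop pair and once after it. PackedStudent and NormalStudent are hypothetical names for this sketch, which assumes a compiler that honours #pragma pack (GCC, Clang and MSVC do):

```cpp
#include <cstddef>   // offsetof

#pragma pack(push, 1)   // save the current packing value, then pack to 1 byte
struct PackedStudent
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};
#pragma pack(pop)       // restore the value saved by push

// Declared after the pop, so it gets the compiler's default alignment again.
struct NormalStudent
{
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

static_assert(offsetof(PackedStudent, quiz1) == 25, "no gap inside push/pop");
static_assert(sizeof(PackedStudent) == 25 + 3 * sizeof(int), "no padding at all");
static_assert(offsetof(NormalStudent, quiz1) % alignof(int) == 0,
              "default alignment is back after pop");
```

With a 4-byte, 4-byte-aligned int this gives 37 bytes for PackedStudent and 40 for NormalStudent.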

While #pragma usually indicates non-portable, compiler-dependent features, this one works at least with GCC and Microsoft VC++.

Answer 5

Answered by Indinfer

There is more than one way to solve the problem in this thread. Here is a solution based on a union of a struct and a char buffer:

#include <cstdlib>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

/*
This is the main idea of the technique: Put the struct
inside a union. And then put a char array in the union
whose size is exactly the size of the struct.

union causes sStudent and buf to be at the exact same
place in memory. They overlap each other!

Note: sizeof(sStudent) includes any padding, so each record
written this way is sizeof(uStudent) bytes (typically 40),
not the 37 bytes of a member-by-member record.
*/
union uStudent
{
    struct sStudent
    {
        char name[25];
        int quiz1;
        int quiz2;
        int quiz3;
    } field;

    char buf[ sizeof(sStudent) ];    // sizeof calcs the number of chars needed
};

void create_data_file(std::fstream& file, uStudent* oStudent, int idx)
{
    if (idx < 0)
    {
        // index passed beginning of oStudent array. Return to start processing.
        return;
    }

    // Recurse first, then write, so records are written in order 0..idx.
    // (This is not a tail call: the write below runs after it returns.)
    create_data_file(file, oStudent, idx - 1);

    // write a record
    file.write(oStudent[idx].buf, sizeof(uStudent));

    // return to write another record or to finish
    return;
}


std::string read_in_data_file(std::fstream& file, std::stringstream& strm_buf)
{
    // allocate a buffer of the correct size
    uStudent temp_student;

    // read in to buffer
    file.read( temp_student.buf, sizeof(uStudent) );

    // at end of file, or could not read a complete record?
    if (!file)
    {
        // finished
        return strm_buf.str();
    }

    // not at end of file. Stuff buf for display
    strm_buf << std::setw(25) << std::left << temp_student.field.name;
    strm_buf << std::setw(5) << std::right << temp_student.field.quiz1;
    strm_buf << std::setw(5) << std::right << temp_student.field.quiz2;
    strm_buf << std::setw(5) << std::right << temp_student.field.quiz3;
    strm_buf << std::endl;

    // tail call: read the next record, or stop at end of file
    return read_in_data_file(file, strm_buf);
}



std::string quiz(void)
{

    /*
    declare and initialize array of uStudent to facilitate
    writing out the data file and then demonstrating
    reading it back in.
    */
    uStudent oStudent[] =
    {
        {{"Bart Simpson",          75,   65,   70}},
        {{"Ralph Wiggum",          35,   60,   44}},
        {{"Lisa Simpson",         100,   98,   91}},
        {{"Martin Prince",         99,   98,   99}},
        {{"Milhouse Van Houten",   80,   87,   79}}
    };


    std::fstream file;

    // ios::trunc causes the file to be created if it does not already exist.
    // ios::trunc also causes the file to be empty if it does already exist.
    file.open("quizzes.dat",
              std::ios::in | std::ios::out | std::ios::binary | std::ios::trunc);

    if ( ! file.is_open() )
    {
        // The original called ShowMessage, which is not standard C++.
        std::cerr << "File did not open" << std::endl;
        std::exit(1);
    }


    // create the data file
    int num_elements = sizeof(oStudent) / sizeof(uStudent);
    create_data_file(file, oStudent, num_elements - 1);

    // Don't forget
    file.flush();

    /*
    We wrote actual integers. So, you cannot check the file so
    easily by just using a common text editor such as Windows Notepad.

    You would need an editor that shows hex values or something similar.
    An integrated development environment (IDE) is likely to have such
    an editor. Of course, not always so.
    */


    /*
    Now, read the file back in for display. Reading into a string buffer
    for display all at once. Can modify code to display the string buffer
    wherever you want.
    */

    // make sure at beginning of file
    file.seekg(0, std::ios::beg);

    std::stringstream strm_buf;
    read_in_data_file(file, strm_buf);   // fills strm_buf

    file.close();

    return strm_buf.str();
}

int main()
{
    std::cout << quiz();
}

Call quiz() and receive a string formatted for display to std::cout, writing to a file, or whatever.

The main idea is that all the members of a union start at the same address in memory. So you can have a char or wchar_t buf that is the same size as the struct you want to write to or read from a file. And notice that no casts are needed: there is not a single cast in the code.

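I also did not have to worry about padding, because the same union is used both to write and to read the file, so every record on disk is sizeof(uStudent) bytes, padding included. The flip side is that this code reads only files it wrote itself; it would not read the 37-byte records in the question's existing file.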

For those who do not like recursion, sorry. Working it out with recursion is easier and less error-prone for me; maybe not for others. The recursions can be converted to loops, and they would need to be for very large files, since each record adds a stack frame unless the compiler eliminates the tail call.

For those who like recursions, this is yet another instance of using recursion.

I don't claim that using union is the best solution or not. Seems that it is a solution. Maybe you like it?
