C++: How to correctly write a vector to a binary file in C++?

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14089266/

Time: 2020-08-27 17:57:47  Source: igfitidea

How to correctly write a vector to a binary file in C++?

Tags: c++, fstream

Asked by dchochan

Thanks to Mats Petersson for the explanation of how to copy a vector to an array; it seems to work. Here is the code snippet:

#include <cstring>
#include <fstream>
#include <iostream>
#include <vector>

using namespace std;

class Student
{
private:
    char m_name[30];
    int m_score;

public:
    Student() : m_name(), m_score(0)   // zero-initialize so default objects hold no garbage
    {
    }
    Student(const Student& copy)
    {
        m_score = copy.m_score;            // wonder why I can use this statement
        strncpy(m_name, copy.m_name, 30);  // when the members are declared private
    }
    Student(const char name[], const int& score)
        : m_score(score)
    {
        strncpy(m_name, name, sizeof(m_name) - 1);
        m_name[sizeof(m_name) - 1] = '\0';  // strncpy does not terminate on truncation
    }
    void print() const
    {
        cout.setf(ios::left);
        cout.width(20);
        cout << m_name << " " << m_score << endl;
    }
};


int main()
{
    vector<Student> student;
    student.push_back(Student("Alex", 19));
    student.push_back(Student("Maria", 20));
    student.push_back(Student("muhamed", 20));
    student.push_back(Student("Jeniffer", 20));
    student.push_back(Student("Alex", 20));
    student.push_back(Student("Maria", 21));
    {
        // Copy the vector into a plain array, then write the array's bytes.
        // (Student temp[student.size()] is a variable-length array, which is a
        // compiler extension in C++, so the array is allocated with new[].)
        Student* temp = new Student[student.size()];
        for (size_t counter = 0; counter < student.size(); ++counter)
        {
            temp[counter] = student[counter];
        }

        ofstream fout("data.dat", ios::out | ios::binary);
        fout.write((char*)temp, student.size() * sizeof(Student));
        fout.close();
        delete[] temp;
    }

    vector<Student> student2;
    ifstream fin("data.dat", ios::in | ios::binary);
    if (!fin)
    {
        return 1;  // the file could not be opened
    }

    {
        // The number of records is the file size divided by the record size.
        fin.seekg(0, ifstream::end);
        streamoff bytes = fin.tellg();
        size_t size = static_cast<size_t>(bytes) / sizeof(Student);
        Student* temp2 = new Student[size];
        fin.seekg(0, ifstream::beg);
        fin.read((char*)temp2, size * sizeof(Student));
        for (size_t counter = 0; counter < size; ++counter)  // was hard-coded to 6
        {
            student2.push_back(temp2[counter]);
        }
        fin.close();
        delete[] temp2;
    }

    vector<Student>::iterator itr = student2.begin();
    while (itr != student2.end())
    {
        itr->print();
        ++itr;
    }
    return 0;
}

But I guess this method wastes a lot of memory and is cumbersome. Maybe I will consider writing it the way Mr. ocelot and the others suggested. Thanks, everyone, for the answers.

Accepted answer by ocelot

You are writing the vector structure to the file, not its data buffer. Try changing the write procedure to:

 ofstream fout("data.dat", ios::out | ios::binary);
 fout.write((char*)&student[0], student.size() * sizeof(Student));
 fout.close();

And instead of calculating the vector size from the file size, it's better to write the vector size (number of objects) first. That way you can also write other data to the same file.

 size_t size = student.size();
 fout.write((char*)&size, sizeof(size));

Answer by Anonymous Coward

To store a vector<T> of PODs in a file, you have to write the contents of the vector, not the vector itself. You can access the raw data with &vector[0], the address of the first element (given it contains at least one element). To get the raw data length, multiply the number of elements in the vector by the size of one element:

strm.write(reinterpret_cast<const char*>(&vec[0]), vec.size()*sizeof(T));

The same applies when you read the vector from the file: the element count is the total file size divided by the size of one element (given that you only store one type of POD in the file):

const size_t count = filesize / sizeof(T);
std::vector<T> vec(count);
strm.read(reinterpret_cast<char*>(&vec[0]), count*sizeof(T));
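
The snippet above leaves open where filesize comes from. One common way, sketched here under the assumption that the file holds nothing but objects of a single type T, is to seek to the end of the stream and ask for the position. read_all is a made-up helper name, not something from the answer.

```cpp
#include <cstddef>
#include <fstream>
#include <vector>

// Read a file that holds nothing but back-to-back objects of type T.
// T must be safe to copy byte-wise (no pointers, no virtual functions).
template <typename T>
std::vector<T> read_all(const char* path) {
    std::vector<T> vec;
    std::ifstream strm(path, std::ios::binary);
    if (!strm) return vec;                         // missing file: empty result

    strm.seekg(0, std::ios::end);
    const std::streamoff filesize = strm.tellg();  // position at the end == size in bytes
    strm.seekg(0, std::ios::beg);
    if (filesize <= 0) return vec;

    const std::size_t count = static_cast<std::size_t>(filesize) / sizeof(T);
    vec.resize(count);
    if (count > 0) {
        strm.read(reinterpret_cast<char*>(&vec[0]),
                  static_cast<std::streamsize>(count * sizeof(T)));
    }
    if (!strm) vec.clear();                        // short read: report nothing
    return vec;
}
```

A call such as read_all<int>("numbers.bin") would then return every int stored in that (hypothetical) file.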

This only works if you can calculate the number of elements based on the file size (if you only store one type of POD or if all vectors contain the same number of elements). If you have vectors with different PODs with different lengths, you have to write the number of elements in the vector to the file before writing the raw data.

Furthermore, when you transfer numeric types in binary form between different systems, be aware of endianness.

Answer by Basile Starynkevitch

You probably cannot write any std::vector in binary (the way you are doing), because that template contains internal pointers, and writing and re-reading them is meaningless.

Some general advice:

  • don't write any STL template containers (like std::vector or std::map) in binary; they surely contain internal pointers that you really don't want to write as is. If you really need to write them, implement your own writing and reading routines (e.g. using STL iterators).

  • avoid using strcpy without care. Your code overflows the buffer, and may crash, if the name does not fit in the 30 bytes of m_name. At least, use strncpy(m_name, name, sizeof(m_name)); (but even that would work badly for a 30-character name, because strncpy would then leave m_name without a terminating null character). Actually, m_name should be a std::string.

  • serialize your container classes explicitly (by handling each meaningful data member). You could consider using JSON notation (or perhaps YAML, or maybe even XML, which I find too complex and so don't recommend) to serialize. It gives you a textual dump format, which you could easily inspect with a standard editor (e.g. emacs or gedit). You'll find a lot of free serialization libraries, e.g. jsoncpp and many others.

  • learn to compile with g++ -Wall -g and to use the gdb debugger and the valgrind memory leak detector; also learn to use make and to write your own Makefiles.

  • take advantage of the fact that Linux is free software, so you can look into its source code (and you may want to study the stdc++ implementation even if the STL headers are complex).
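
As a rough illustration of the "serialize explicitly, as text" advice, here is a minimal sketch that handles each member by hand and keeps the name in a std::string. It uses a hand-rolled one-record-per-line format, not JSON and not any library; StudentRecord, dump_text and load_text are made-up names, and names are assumed not to contain newlines.

```cpp
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record with a std::string name, as suggested above.
struct StudentRecord {
    std::string name;
    int score;
};

// One record per line: the score, a tab, then the name.
bool dump_text(const char* path, const std::vector<StudentRecord>& students) {
    std::ofstream out(path);
    if (!out) return false;
    for (std::size_t i = 0; i < students.size(); ++i) {
        out << students[i].score << '\t' << students[i].name << '\n';
    }
    out.flush();
    return static_cast<bool>(out);
}

// Parse the same format back, member by member.
bool load_text(const char* path, std::vector<StudentRecord>& students) {
    std::ifstream in(path);
    if (!in) return false;
    students.clear();
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        StudentRecord rec;
        if (!(fields >> rec.score)) return false;  // malformed line
        fields.get();                              // skip the tab separator
        std::getline(fields, rec.name);            // rest of the line is the name
        students.push_back(rec);
    }
    return true;
}
```

The resulting file can be opened in any editor, and no pointer or padding byte ever reaches the disk. A real JSON library would additionally take care of escaping and nesting.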

Answer by Mats Petersson

For the functions read() and write(), you need what is called "plain old data" or "POD". That basically means that the class or structure must have no pointers inside it and no virtual functions. The implementation of vector certainly has pointers; I'm not sure about virtual functions.

You will have to write a function that stores one student at a time (or that translates a bunch of students to an array [not a vector] of bytes or some such, but that's more complex).

The reason you can't write non-POD data, in particular pointers, to a binary file is that when you read the data back again, you can almost certainly bet that the memory layout has changed from when you wrote it. It becomes a little bit like trying to park in the same parking space at the shops - someone else will have parked in the third spot from the entrance when you turn up next time, so you'll have to pick another spot. Think of the memory allocated by the compiler as parking spaces, and the student information as cars.

[Technically, in this case, it's even worse - your vector doesn't actually contain the students inside the class, which is what you are writing to the file, so you haven't even saved the information about the students, just the information about where they are located (the number of the parking spaces)]
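
You can see this for yourself by comparing the size of the vector object with the size of the data it manages. The two helpers below are made up for illustration; the exact value of sizeof for a vector depends on the implementation, but it does not change with the number of elements.

```cpp
#include <cstddef>
#include <vector>

// The vector object itself is a small fixed-size handle;
// the elements live elsewhere, in memory the vector allocates.
inline std::size_t handle_size(const std::vector<int>& v) {
    return sizeof(v);               // does not depend on v.size()
}

// The number of bytes that actually has to reach the file.
inline std::size_t payload_size(const std::vector<int>& v) {
    return v.size() * sizeof(int);
}
```

For an empty vector and one with a thousand elements, handle_size gives the same number while payload_size differs, which is why writing sizeof(vector) bytes from the vector's own address saves the parking-space numbers and not the cars.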
