How do you serialize an object in C++?

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/523872/

Retrieved: 2020-08-27 15:47:00. Source: igfitidea

Tags: c++, serialization, marshalling, c++-faq

Asked by Bill the Lizard

I have a small hierarchy of objects that I need to serialize and transmit via a socket connection. I need to both serialize the object, then deserialize it based on what type it is. Is there an easy way to do this in C++ (as there is in Java)?

Are there any C++ serialization online code samples or tutorials?

EDIT: Just to be clear, I'm looking for ways to convert an object into an array of bytes, then back into an object. I can handle the socket transmission.

Accepted answer by newgre

When it comes to serialization, the Boost serialization API comes to mind. As for transmitting the serialized data over the network, I'd use either Berkeley sockets or the asio library.

Edit:
If you want to serialize your objects to a byte array, you can use the boost serializer in the following way (taken from the tutorial site):

#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
class gps_position
{
private:
    friend class boost::serialization::access;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int version)
    {
        ar & degrees;
        ar & minutes;
        ar & seconds;
    }
    int degrees;
    int minutes;
    float seconds;

public:
    gps_position() : degrees(0), minutes(0), seconds(0.0f) {}
    gps_position(int d, int m, float s) :
    degrees(d), minutes(m), seconds(s)
    {}
};

Actual serialization is then pretty easy:

#include <fstream>

int main()
{
    std::ofstream ofs("filename.dat", std::ios::binary);

    // create class instance
    const gps_position g(35, 59, 24.567f);

    // save data to archive
    {
        boost::archive::binary_oarchive oa(ofs);
        // write class instance to archive
        oa << g;
        // the archive is finished when oa is destroyed at the end of this block
    }

    // the stream is closed when ofs is destroyed at the end of main
    return 0;
}

Deserialization works in an analogous manner.

There are also mechanisms that let you handle serialization of pointers (complex data structures like trees are no problem) and derived classes, and you can choose between binary and text serialization. Besides, all STL containers are supported out of the box.

Answer by Mr.Ree

In some cases, when dealing with simple types, you can do:

object o;
socket.write(&o, sizeof(o));

That's ok as a proof-of-concept or first-draft, so other members of your team can keep working on other parts.

But sooner or later, usually sooner, this will get you hurt!

You run into issues with:

  • Virtual pointer tables will be corrupted.
  • Pointers (to data/members/functions) will be corrupted.
  • Differences in padding/alignment on different machines.
  • Big/Little-Endian byte ordering issues.
  • Variations in the implementation of float/double.

(Plus you need to know what you are unpacking into on the receiving side.)
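
One way to catch some of these problems at compile time is to refuse the raw-bytes shortcut for types that are not trivially copyable. The following is only a sketch under assumptions: writeRaw, readRaw, Plain and WithVtable are hypothetical names, and a std::vector<char> stands in for the socket. It does nothing about padding, byte order or floating-point format, which can still differ between machines.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical helper: append the raw bytes of a value to a buffer,
// but only for types that are safe to copy byte by byte.
template <class T>
void writeRaw(const T &value, std::vector<char> &out)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copy is only valid for trivially copyable types");
    const char *p = reinterpret_cast<const char *>(&value);
    out.insert(out.end(), p, p + sizeof(T));
}

// Counterpart: rebuild a value from the first sizeof(T) bytes of a buffer.
template <class T>
T readRaw(const std::vector<char> &in)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte copy is only valid for trivially copyable types");
    assert(in.size() >= sizeof(T));
    T value;
    std::memcpy(&value, in.data(), sizeof(T));
    return value;
}

struct Plain { int id; float weight; };          // accepted by writeRaw
struct WithVtable { virtual ~WithVtable() {} };  // writeRaw(WithVtable) would not compile
```

Even with this guard, both ends must be built for the same architecture and with the same struct layout.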

You can improve upon this by developing your own marshalling/unmarshalling methods for every class. (Ideally virtual, so they can be extended in subclasses.) A few simple macros will let you write out different basic types quite quickly in a big/little-endian-neutral order.
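
A minimal sketch of what such hand-written methods could look like. It uses small inline functions rather than the macros mentioned above, and every name here (Buffer, putU32, getU32, Shape, Circle) is hypothetical. Values are written in big-endian order by shifting, so the result does not depend on the host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Write a 32-bit value most significant byte first, whatever the host order is.
inline void putU32(Buffer &out, std::uint32_t v)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Read a 32-bit big-endian value starting at pos and advance pos.
inline std::uint32_t getU32(const Buffer &in, std::size_t &pos)
{
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i)
        v = (v << 8) | in[pos++];
    return v;
}

struct Shape
{
    std::uint32_t id = 0;
    virtual ~Shape() = default;
    virtual void marshal(Buffer &out) const { putU32(out, id); }
    virtual void unmarshal(const Buffer &in, std::size_t &pos) { id = getU32(in, pos); }
};

struct Circle : Shape
{
    std::uint32_t radius = 0;

    void marshal(Buffer &out) const override
    {
        Shape::marshal(out);   // base class fields first
        putU32(out, radius);
    }

    void unmarshal(const Buffer &in, std::size_t &pos) override
    {
        Shape::unmarshal(in, pos);
        radius = getU32(in, pos);
    }
};
```

The receiver still has to know which concrete class to construct before calling unmarshal, which is the point made in the parenthetical above.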

But that sort of grunt work is much better, and more easily, handled via boost's serialization library.

Answer by Calmarius

There is a generic pattern you can use to serialize objects. The fundamental primitives are these two functions, which write and read bytes through iterators:

#include <stdexcept>  // std::runtime_error, used by getByte

template <class OutputCharIterator>
void putByte(char byte, OutputCharIterator &&it)
{
    *it = byte;
    ++it;
}


template <class InputCharIterator>
char getByte(InputCharIterator &&it, InputCharIterator &&end)
{
    if (it == end)
    {
        throw std::runtime_error{"Unexpected end of stream."};
    }

    char byte = *it;
    ++it;
    return byte;
}

Then serialization and deserialization functions follow the pattern:

template <class OutputCharIterator>
void serialize(const YourType &obj, OutputCharIterator &&it)
{
    // Call putByte or other serialize overloads.
}

template <class InputCharIterator>
void deserialize(YourType &obj, InputCharIterator &&it, InputCharIterator &&end)
{
    // Call getByte or other deserialize overloads.
}
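
To make the pattern concrete, here is a sketch of one possible pair of overloads for a 32-bit unsigned integer. putByte and getByte are repeated from above so the block stands alone. The choice of least-significant-byte-first order is an assumption made for this illustration, not something the pattern requires.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <stdexcept>
#include <string>

template <class OutputCharIterator>
void putByte(char byte, OutputCharIterator &&it)
{
    *it = byte;
    ++it;
}

template <class InputCharIterator>
char getByte(InputCharIterator &&it, InputCharIterator &&end)
{
    if (it == end)
    {
        throw std::runtime_error{"Unexpected end of stream."};
    }

    char byte = *it;
    ++it;
    return byte;
}

// Overload for a 32-bit unsigned integer, least significant byte first.
template <class OutputCharIterator>
void serialize(std::uint32_t value, OutputCharIterator &&it)
{
    for (int i = 0; i < 4; ++i)
        putByte(static_cast<char>((value >> (8 * i)) & 0xFF), it);
}

template <class InputCharIterator>
void deserialize(std::uint32_t &value, InputCharIterator &&it, InputCharIterator &&end)
{
    value = 0;
    for (int i = 0; i < 4; ++i)
    {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const auto byte = static_cast<unsigned char>(getByte(it, end));
        value |= static_cast<std::uint32_t>(byte) << (8 * i);
    }
}
```

With these in place, serialize(value, std::back_inserter(someString)) appends four bytes to a std::string, and deserialize(value, someString.begin(), someString.end()) reads them back or throws if the input is too short.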

For classes you can use the friend function pattern to allow the overload to be found using ADL:

class Foo
{
    int internal1, internal2;

    // So it can be found using ADL and it accesses private parts.
    template <class OutputCharIterator>
    friend void serialize(const Foo &obj, OutputCharIterator &&it)
    {
        // Call putByte or other serialize overloads.
    }

    // Deserialize similar.
};

Then in your program you can serialize an object into a file like this:

std::ofstream file("savestate.bin", std::ios::binary);
serialize(yourObject, std::ostreambuf_iterator<char>(file));

Then read it back:

std::ifstream file("savestate.bin", std::ios::binary);
deserialize(yourObject, std::istreambuf_iterator<char>(file), std::istreambuf_iterator<char>());


My old answer here:

Serialization means turning your object into binary data, while deserialization means recreating an object from that data.

When serializing, you push bytes into a uint8_t vector. When deserializing, you read bytes from a uint8_t vector.

There are certainly patterns you can employ when serializing stuff.

Each serializable class should have a serialize(std::vector<uint8_t> &binaryData) function, or one with a similar signature, that writes its binary representation into the provided vector. This function may then pass the vector down to its members' serializing functions so they can write their data into it too.
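
As an illustration of passing the vector down, here is a small sketch with two made-up types, Color and Pixel. Only the serialize signature comes from the text above. The field names and layout are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical leaf type: writes its two bytes into the vector.
struct Color
{
    std::uint8_t hue = 0;
    std::uint8_t brightness = 0;

    void serialize(std::vector<std::uint8_t> &binaryData) const
    {
        binaryData.push_back(hue);
        binaryData.push_back(brightness);
    }
};

// Hypothetical composite type: writes its own field, then delegates to its member.
struct Pixel
{
    std::uint8_t x = 0;
    Color color;

    void serialize(std::vector<std::uint8_t> &binaryData) const
    {
        binaryData.push_back(x);
        color.serialize(binaryData);  // pass the same vector down
    }
};
```

Because every level appends to the same vector, the byte order of the output is simply the order in which the serialize calls run.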

Since the data representation can differ between architectures, you need to agree on a scheme for representing the data.

Let's start from the basics:

Serializing integer data

Just write the bytes in little endian order. Or use varint representation if size matters.
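
For the varint option, one common scheme is base-128 encoding: seven payload bits per byte, with the high bit set on every byte except the last. The sketch below assumes that scheme and unsigned 32-bit values. putVarint and getVarint are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append value as a base-128 varint, low-order group first.
inline void putVarint(std::vector<std::uint8_t> &data, std::uint32_t value)
{
    while (value >= 0x80)
    {
        data.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    data.push_back(static_cast<std::uint8_t>(value));
}

// Read a varint starting at pos and advance pos past it.
inline std::uint32_t getVarint(const std::vector<std::uint8_t> &data, std::size_t &pos)
{
    std::uint32_t value = 0;
    for (int shift = 0; shift < 35; shift += 7)  // at most 5 bytes for 32 bits
    {
        assert(pos < data.size());
        const std::uint8_t byte = data[pos++];
        value |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;
    }
    return value;
}
```

Small values take one byte and the largest 32-bit values take five, so this only pays off when most numbers are small.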

Serialization in little endian order:

data.push_back(integer32 & 0xFF);
data.push_back((integer32 >> 8) & 0xFF);
data.push_back((integer32 >> 16) & 0xFF);
data.push_back((integer32 >> 24) & 0xFF);

Deserialization from little endian order:

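// Cast each byte first so the shifts happen on an unsigned 32-bit value.
integer32 = uint32_t(data[0]) | (uint32_t(data[1]) << 8) | (uint32_t(data[2]) << 16) | (uint32_t(data[3]) << 24);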

Serializing floating point data

As far as I know, IEEE 754 has a monopoly here. I don't know of any mainstream architecture that uses something else for floats. The only thing that can differ is the byte order: some architectures use little endian, others big endian. This means you need to be careful about the order in which you load the bytes on the receiving end. Another difference can be the handling of denormal, infinity and NaN values, but as long as you avoid these values you should be OK.

Serialization:

uint8_t mem[8];
memcpy(mem, &doubleValue, 8);
data.push_back(mem[0]);
data.push_back(mem[1]);
...

Deserialization does the same in reverse. Mind the byte order of your architecture!
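
One way to sidestep the host byte order is to copy the double's bit pattern into a 64-bit integer and then emit that integer with shifts. This sketch assumes IEEE 754 doubles that are 64 bits wide and that the host stores doubles and integers in the same byte order. putDouble and getDouble are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes a 64-bit double");

// Copy the bit pattern into an integer, then write it least significant byte first.
inline void putDouble(std::vector<std::uint8_t> &data, double value)
{
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; ++i)
        data.push_back(static_cast<std::uint8_t>((bits >> (8 * i)) & 0xFF));
}

// Rebuild the integer from 8 bytes at pos, then copy it back into a double.
inline double getDouble(const std::vector<std::uint8_t> &data, std::size_t pos)
{
    assert(pos + 8 <= data.size());
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits |= static_cast<std::uint64_t>(data[pos + i]) << (8 * i);
    double value = 0.0;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

The shifts fix the order of bytes on the wire, so the same code produces the same bytes on little-endian and big-endian hosts under the assumptions above.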

Serializing strings

First you need to agree on an encoding. UTF-8 is common. Then store the string in a length-prefixed manner: first store the length of the string using one of the methods mentioned above, then write the string byte by byte.
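
A sketch of the length-prefixed layout, assuming a 32-bit little-endian length and a std::string that already holds UTF-8 bytes. putString and getString are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Write a 4-byte little-endian length, then the raw bytes of the string.
inline void putString(std::vector<std::uint8_t> &data, const std::string &text)
{
    const std::uint32_t length = static_cast<std::uint32_t>(text.size());
    for (int i = 0; i < 4; ++i)
        data.push_back(static_cast<std::uint8_t>((length >> (8 * i)) & 0xFF));
    data.insert(data.end(), text.begin(), text.end());
}

// Read the length at pos, then that many bytes, and advance pos past both.
inline std::string getString(const std::vector<std::uint8_t> &data, std::size_t &pos)
{
    assert(pos + 4 <= data.size());
    std::uint32_t length = 0;
    for (int i = 0; i < 4; ++i)
        length |= static_cast<std::uint32_t>(data[pos++]) << (8 * i);
    assert(length <= data.size() - pos);
    std::string text(data.begin() + pos, data.begin() + pos + length);
    pos += length;
    return text;
}
```

The length counts bytes, not characters, which is what the reader needs in order to know where the string ends.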

Serializing arrays

They are handled the same way as strings. You first serialize an integer representing the size of the array, then serialize each object in it.

Serializing whole objects

As I said before, they should have a serialize method that adds content to a vector. To deserialize an object, it should have a constructor that takes a byte stream. It can be an istream, but in the simplest case it can be just a reference to a uint8_t pointer. The constructor reads the bytes it wants from the stream and sets up the fields in the object. If the system is well designed and serializes the fields in object field order, you can just pass the stream to the fields' constructors in an initializer list and have them deserialized in the right order.
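
The initializer-list idea can be sketched as follows, using std::istream as the byte stream. U16 and Point are hypothetical types, and the two-byte little-endian layout is an assumption for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>

// Hypothetical field type that reads itself from the stream (low byte first).
struct U16
{
    std::uint16_t value;

    explicit U16(std::istream &in)
    {
        const int lo = in.get();
        const int hi = in.get();
        if (!in)
            throw std::runtime_error("unexpected end of stream");
        value = static_cast<std::uint16_t>(lo | (hi << 8));
    }
};

// Members are initialized in declaration order, so the initializer list
// consumes the stream in the same order in which the fields were written.
struct Point
{
    U16 x;
    U16 y;

    explicit Point(std::istream &in) : x(in), y(in) {}
};
```

Note that the order is fixed by the order of the member declarations, not by the order in which they appear in the initializer list.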

Serializing object graphs

First you need to make sure these objects are really something you want to serialize. You don't need to serialize them if instances of these objects are already present on the destination.

Now suppose you have found that you do need to serialize the object a pointer points to. The problem with pointers is that they are valid only in the program that uses them. You cannot serialize a pointer, so you should stop using them in objects. Instead, create object pools. An object pool is basically a dynamic array that contains "boxes". These boxes have a reference count: a non-zero reference count indicates a live object, zero indicates an empty slot. Then you create a smart pointer akin to shared_ptr that stores not the pointer to the object but its index in the array. You also need to agree on an index that denotes the null pointer, e.g. -1.

Basically, what we did here is replace the pointers with array indexes. Now when serializing you can serialize this array index as usual. You don't need to worry about where the object will be in memory on the destination system. Just make sure the destination has the same object pool too.
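
A rough sketch of such a pool, under assumptions: Node, Box and NodePool are hypothetical names, the index is a 32-bit integer with -1 as null, and the shared_ptr-like wrapper that would adjust the reference count automatically is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Node
{
    int payload = 0;
    std::int32_t next = -1;  // index into the pool instead of a pointer; -1 means null
};

// Each box carries a reference count; zero marks an empty slot.
struct Box
{
    int refCount = 0;
    Node object;
};

class NodePool
{
public:
    // Reuse an empty slot if there is one, otherwise grow the array.
    std::int32_t allocate(int payload)
    {
        for (std::size_t i = 0; i < boxes_.size(); ++i)
            if (boxes_[i].refCount == 0)
                return fill(i, payload);
        boxes_.emplace_back();
        return fill(boxes_.size() - 1, payload);
    }

    // Drop one reference; the slot becomes reusable when the count reaches zero.
    void release(std::int32_t index)
    {
        assert(index >= 0 && boxes_[static_cast<std::size_t>(index)].refCount > 0);
        --boxes_[static_cast<std::size_t>(index)].refCount;
    }

    Node &at(std::int32_t index) { return boxes_[static_cast<std::size_t>(index)].object; }
    std::size_t size() const { return boxes_.size(); }

private:
    std::int32_t fill(std::size_t i, int payload)
    {
        boxes_[i].refCount = 1;
        boxes_[i].object = Node{};
        boxes_[i].object.payload = payload;
        return static_cast<std::int32_t>(i);
    }

    std::vector<Box> boxes_;
};
```

Serializing a Node then only involves plain integers: the payload and the next index.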

So we need to serialize the object pools. But which ones? Well, when you serialize an object graph you are not serializing just an object, you are serializing an entire system. This means the serialization of the system shouldn't start from parts of the system. Those objects shouldn't worry about the rest of the system; they only need to serialize the array indexes and that's it. You should have a system serializer routine that orchestrates the serialization of the system, walks through the relevant object pools and serializes all of them.

On the receiving end all the arrays and the objects within them are deserialized, recreating the desired object graph.

Serializing function pointers

Don't store pointers in the object. Have a static array which contains the pointers to these functions and store the index in the object.

Since both programs have this table compiled into themselves, using just the index should work.
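
A sketch of the table-plus-index idea. The handler functions, the Handler alias, kHandlers and Task are all hypothetical. The only requirement taken from the text is that both programs compile in the same table in the same order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

inline int doubleIt(int x) { return 2 * x; }
inline int negateIt(int x) { return -x; }

using Handler = int (*)(int);

// Both programs must compile in the same table, in the same order.
static const Handler kHandlers[] = { doubleIt, negateIt };
static const std::size_t kHandlerCount = sizeof(kHandlers) / sizeof(kHandlers[0]);

struct Task
{
    std::uint8_t handlerIndex = 0;  // this byte is what gets serialized
    int argument = 0;

    int run() const
    {
        assert(handlerIndex < kHandlerCount);
        return kHandlers[handlerIndex](argument);
    }
};
```

Appending new handlers at the end of the table keeps existing indexes valid, while reordering it breaks previously serialized data.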

Serializing polymorphic types

Since I said you should avoid pointers in serializable types and you should use array indexes instead, polymorphism just cannot work, because it requires pointers.

You need to work around this with type tags and unions.
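
A sketch of a tagged union, assuming a one-byte tag written before the payload. ShapeTag, Circle, Rect, Shape and the free functions are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class ShapeTag : std::uint8_t { Circle = 1, Rect = 2 };

struct Circle { std::uint8_t radius; };
struct Rect { std::uint8_t width; std::uint8_t height; };

// The tag says which union member is currently meaningful.
struct Shape
{
    ShapeTag tag;
    union { Circle circle; Rect rect; };
};

inline void serialize(const Shape &s, std::vector<std::uint8_t> &data)
{
    data.push_back(static_cast<std::uint8_t>(s.tag));  // tag first
    switch (s.tag)
    {
    case ShapeTag::Circle:
        data.push_back(s.circle.radius);
        break;
    case ShapeTag::Rect:
        data.push_back(s.rect.width);
        data.push_back(s.rect.height);
        break;
    }
}

// Reads the tag, then the payload that belongs to it; advances pos.
// data.at() throws std::out_of_range if the input is too short.
inline Shape deserialize(const std::vector<std::uint8_t> &data, std::size_t &pos)
{
    Shape s;
    s.tag = static_cast<ShapeTag>(data.at(pos++));
    switch (s.tag)
    {
    case ShapeTag::Circle:
        s.circle.radius = data.at(pos++);
        return s;
    case ShapeTag::Rect:
        s.rect.width = data.at(pos++);
        s.rect.height = data.at(pos++);
        return s;
    }
    throw std::runtime_error("unknown type tag");
}
```

The receiver decides what to read next purely from the tag, so no pointers or virtual tables cross the wire.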

Versioning

On top of all of the above, you might want different versions of the software to interoperate.

In this case each object should write a version number at the beginning of its serialization to indicate the version.

When loading the object on the other side, newer objects may be able to handle the older representations, but older ones cannot handle the newer ones, so they should throw an exception in that case.

Each time something changes, you should bump the version number.
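
A sketch of the version check, assuming a one-byte version number and a hypothetical Settings type whose brightness field was added in version 2. Older data falls back to a default, and newer data is rejected with an exception.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr std::uint8_t kCurrentVersion = 2;  // bump whenever the layout changes

struct Settings
{
    std::uint8_t volume = 0;
    std::uint8_t brightness = 50;  // added in version 2

    void serialize(std::vector<std::uint8_t> &data) const
    {
        data.push_back(kCurrentVersion);  // version number comes first
        data.push_back(volume);
        data.push_back(brightness);
    }

    // data.at() throws std::out_of_range if the input is too short.
    static Settings deserialize(const std::vector<std::uint8_t> &data)
    {
        Settings s;
        const std::uint8_t version = data.at(0);
        if (version > kCurrentVersion)
            throw std::runtime_error("data was written by a newer version");
        s.volume = data.at(1);
        if (version >= 2)
            s.brightness = data.at(2);  // version 1 data keeps the default
        return s;
    }
};
```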



So to wrap this up, serialization can be complex. But fortunately you don't need to serialize everything in your program, most often only the protocol messages are serialized, which are often plain old structs. So you don't need the complex tricks I mentioned above too often.

Answer by Neil McGill

By way of learning I wrote a simple C++11 serializer. I had tried various of the other more heavyweight offerings, but wanted something that I could actually understand when it went wrong or failed to compile with the latest g++ (which happened for me with Cereal; a really nice library but complex and I could not grok the errors the compiler threw up on upgrade.) Anyway, it's header only and handles POD types, containers, maps etc... No versioning and it will only load files from the same arch it was saved in.

https://github.com/goblinhack/simple-c-plus-plus-serializer

Example usage:

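#include <fstream>
#include <limits>
#include <string>

#include "c_plus_plus_serializer.h"

// Streams are not copyable, so take the stream by reference.
static void serialize (std::ofstream &out)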
{
    char a = 42;
    unsigned short b = 65535;
    int c = 123456;
    float d = std::numeric_limits<float>::max();
    double e = std::numeric_limits<double>::max();
    std::string f("hello");

    out << bits(a) << bits(b) << bits(c) << bits(d);
    out << bits(e) << bits(f);
}

static void deserialize (std::ifstream &in)
{
    char a;
    unsigned short b;
    int c;
    float d;
    double e;
    std::string f;

    in >> bits(a) >> bits(b) >> bits(c) >> bits(d);
    in >> bits(e) >> bits(f);
}