Original question: http://stackoverflow.com/questions/1809670/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to implement serialization in C++
Asked by Paul
Whenever I find myself needing to serialize objects in a C++ program, I fall back to this kind of pattern:
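class Serializable {
public:
    virtual ~Serializable() {}

    // Reads the class ID, then dispatches to the matching constructor.
    static Serializable *deserialize(istream &is) {
        int id;
        is >> id;
        switch(id) {
        case EXAMPLE_ID:
            return new ExampleClass(is);
        //...
        }
        return NULL; // unknown class ID
    }

    // Writes the class ID followed by the object's own data.
    void serialize(ostream &os) {
        os << getClassID();
        serializeMe(os);
    }
protected:
    // Pure virtual functions must be declared virtual.
    virtual int getClassID()=0;
    virtual void serializeMe(ostream &os)=0;
};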
The above works pretty well in practice. However, I've heard that this kind of switching over class IDs is evil and an antipattern; what's the standard, OO-way of handling serialization in C++?
Answered by Yacoby
Something like Boost Serialization, while by no means a standard, is a (for the most part) very well written library that does the grunt work for you.
The last time I had to manually parse a predefined record structure with a clear inheritance tree, I ended up using the factory pattern with registrable classes (i.e. using a map from a key to a (template) creator function rather than a lot of switch statements) to try and avoid the issue you were having.
EDIT
A basic C++ implementation of the object factory mentioned in the above paragraph.
#include <cstddef>
#include <map>
#include <utility>

/**
 * A class for creating objects, with the type of object created based on a key
 *
 * @param K the key
 * @param T the super class that all created classes derive from
 */
template<typename K, typename T>
class Factory {
private:
    typedef T *(*CreateObjectFunc)();

    /**
     * Maps keys (K) to functions (CreateObjectFunc)
     * When creating a new type, we simply call the function with the required key
     */
    std::map<K, CreateObjectFunc> mObjectCreator;

    /**
     * Pointers to this function are inserted into the map and called when creating objects
     *
     * @param S the type of class to create
     * @return an object with the type of S
     */
    template<typename S>
    static T* createObject(){
        return new S();
    }
public:
    /**
     * Registers a class so that it can be created via createObject()
     *
     * @param S the class to register, this must be a subclass of T
     * @param id the id to associate with the class. This ID must be unique
     */
    template<typename S>
    void registerClass(K id){
        if (mObjectCreator.find(id) != mObjectCreator.end()){
            //your error handling here
        }
        //explicit pair type: make_pair<K,CreateObjectFunc>(id, ...) does not compile in C++11 and later
        mObjectCreator.insert( std::pair<K, CreateObjectFunc>(id, &createObject<S>) );
    }

    /**
     * Returns true if a given key exists
     *
     * @param id the id to check exists
     * @return true if the id exists
     */
    bool hasClass(K id){
        return mObjectCreator.find(id) != mObjectCreator.end();
    }

    /**
     * Creates an object based on an id. It will return null if the key doesn't exist
     *
     * @param id the id of the object to create
     * @return the new object or null if the object id doesn't exist
     */
    T* createObject(K id){
        //Don't use hasClass here as doing so would involve two lookups
        typename std::map<K, CreateObjectFunc>::iterator iter = mObjectCreator.find(id);
        if (iter == mObjectCreator.end()){
            return NULL;
        }
        //calls the required createObject() function
        return ((*iter).second)();
    }
};
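As a usage sketch, the registry idea above can replace the switch in the question's deserialize(). The snippet below is a compact stand-in rather than the Factory class verbatim: Shape, Circle, Square, ShapeRegistry, the integer ids and the whitespace-separated text format are all hypothetical names chosen for illustration, and there is no error handling beyond returning NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical base class; every concrete type reads its own fields.
struct Shape {
    virtual ~Shape() {}
    virtual void read(std::istream& is) = 0;
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    int radius;
    Circle() : radius(0) {}
    void read(std::istream& is) { is >> radius; }
    std::string name() const { return "Circle"; }
};

struct Square : Shape {
    int side;
    Square() : side(0) {}
    void read(std::istream& is) { is >> side; }
    std::string name() const { return "Square"; }
};

// Registry mapping a class id to a creator function (no switch needed).
class ShapeRegistry {
    typedef Shape* (*Creator)();
    std::map<int, Creator> creators_;

    template <typename S>
    static Shape* create() { return new S(); }

public:
    template <typename S>
    void registerClass(int id) { creators_[id] = &create<S>; }

    // Reads "<id> <fields...>" and returns a new object, or NULL if the
    // id is unknown or the stream is exhausted. The caller owns the result.
    Shape* deserialize(std::istream& is) const {
        int id;
        if (!(is >> id)) return NULL;
        std::map<int, Creator>::const_iterator it = creators_.find(id);
        if (it == creators_.end()) return NULL;
        Shape* obj = it->second();
        obj->read(is);
        return obj;
    }
};
```

Adding a new class then means one registerClass<NewType>(id) call at startup; deserialize() itself never changes.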
Answered by Matthieu M.
Serialization is a touchy topic in C++...
Quick question: which of these do you need?
- Serialization: short-lived structure, one encoder/decoder
- Messaging: longer life, encoders / decoders in multiple languages
Both are useful, each in its own place.
Boost.Serialization is usually the most recommended library for serialization, though the odd choice of operator&, which serializes or deserializes depending on the const-ness, is really an abuse of operator overloading for me.
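To make the single-function idea concrete, here is a toy imitation of that style. It is not Boost's API: TextOut, TextIn and Point are hypothetical, and in this sketch the direction is chosen by the archive type passed in. It only illustrates why one operator& can describe the layout for both saving and loading.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Toy output archive: operator& writes the value.
class TextOut {
    std::ostream& os_;
public:
    explicit TextOut(std::ostream& os) : os_(os) {}
    template <typename T>
    TextOut& operator&(const T& value) { os_ << value << ' '; return *this; }
};

// Toy input archive: the same operator& reads the value instead.
class TextIn {
    std::istream& is_;
public:
    explicit TextIn(std::istream& is) : is_(is) {}
    template <typename T>
    TextIn& operator&(T& value) { is_ >> value; return *this; }
};

struct Point {
    int x, y;
    Point() : x(0), y(0) {}
    Point(int x_, int y_) : x(x_), y(y_) {}

    // One function describes the field order for both directions.
    template <typename Archive>
    void serialize(Archive& ar) { ar & x & y; }
};
```

The benefit is that the writer and the reader cannot drift apart, since the field list exists only once; the cost is the overloaded operator the answer objects to.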
For messaging, I would rather suggest Google Protocol Buffers. They offer a clean syntax for describing the message and generate encoders and decoders for a huge variety of languages. There is also another advantage when performance matters: it allows lazy deserialization (i.e. only part of the blob at once) by design.
Moving on
Now, as for the details of implementation, it really depends on what you want.
- You need versioning. Even for regular serialization, you'll probably need backward compatibility with the previous version anyway.
- You may, or may not, need a system of tag + factory. It's only necessary for polymorphic classes. And you will need one factory per inheritance tree (kind) then... the code can be templatized of course!
- Pointers / references are going to bite you in the ass... they reference a position in memory that changes after deserialization. I usually choose a tangent approach: each object of each kind is given an id, unique for its kind, and so I serialize the id rather than a pointer. Some frameworks handle it as long as you don't have circular dependencies and serialize the objects pointed to / referenced first.
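The id-instead-of-pointer approach from the last bullet can be sketched as follows. Node, saveNodes and loadNodes are hypothetical names, the text format is an assumption, and there is no error checking. Loading runs in two passes (create everything, then resolve ids), which is one way to avoid depending on the order in which objects were written.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical object that refers to another object of the same kind.
struct Node {
    int id;        // unique among Nodes; 0 is reserved for "no object"
    Node* parent;  // raw address: meaningless after a reload
    Node(int id_, Node* parent_) : id(id_), parent(parent_) {}
};

// Writes "<count> (<id> <parent id>)*": ids are stored, never addresses.
void saveNodes(std::ostream& os, const std::vector<Node*>& nodes) {
    os << nodes.size() << ' ';
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        const Node* n = nodes[i];
        os << n->id << ' ' << (n->parent ? n->parent->id : 0) << ' ';
    }
}

// Two passes: create every node first, then turn ids back into pointers.
// The caller owns the returned nodes.
std::vector<Node*> loadNodes(std::istream& is) {
    std::size_t count = 0;
    is >> count;
    std::vector<Node*> nodes;
    std::vector<int> parentIds;
    std::map<int, Node*> byId;
    for (std::size_t i = 0; i < count; ++i) {
        int id = 0, parentId = 0;
        is >> id >> parentId;
        Node* n = new Node(id, NULL);
        nodes.push_back(n);
        parentIds.push_back(parentId);
        byId[id] = n;
    }
    for (std::size_t i = 0; i < count; ++i) {
        std::map<int, Node*>::const_iterator it = byId.find(parentIds[i]);
        nodes[i]->parent = (it != byId.end()) ? it->second : NULL;
    }
    return nodes;
}
```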
Personally, I try as much as I can to separate the serialization / deserialization code from the actual code that runs the class. In particular, I try to isolate it in the source files so that changes to this part of the code do not break binary compatibility.
On versioning
I usually try to keep serialization and deserialization of one version close together. It's easier to check that they are truly symmetric. I also try to abstract the version handling (plus a few other things) directly in my serialization framework, because DRY should be adhered to :)
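A minimal sketch of that layout, assuming a hypothetical Person record whose version 1 stored only a name and whose version 2 added an age. The names, the version numbers and the text format (single-word names, whitespace-separated) are made up for illustration.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical record: version 1 stored only `name`; version 2 added `age`.
struct Person {
    std::string name;
    int age;
    Person() : age(0) {}
};

const int kCurrentVersion = 2;

// Writer and reader sit next to each other so symmetry is easy to check.
void writePerson(std::ostream& os, const Person& p) {
    os << kCurrentVersion << ' ' << p.name << ' ' << p.age << ' ';
}

Person readPerson(std::istream& is) {
    int version = 0;
    is >> version;
    Person p;
    switch (version) {
    case 1:  // old layout: age keeps its default of 0
        is >> p.name;
        break;
    case 2:  // current layout
        is >> p.name >> p.age;
        break;
    default:
        throw std::runtime_error("No such version for object Person: only 1 and 2");
    }
    return p;
}
```

The version number is always written first, so old data stays readable after the layout changes.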
On error-handling
To ease error-detection, I usually use a pair of 'markers' (special bytes) to separate one object from another. It allows me to throw immediately during deserialization, because I can detect that the stream has become desynchronized (i.e. something ate too many bytes or did not eat enough).
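The marker idea might look like the sketch below. The marker values, the Pair type and the text payload are assumptions for illustration; a real format would pick markers that suit its encoding.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical marker bytes; any values unlikely to appear by accident work.
const char kBegin = '\x02';
const char kEnd = '\x03';

struct Pair { int a; int b; };

// Wraps the payload between a begin marker and an end marker.
void writePair(std::ostream& os, const Pair& p) {
    os.put(kBegin);
    os << p.a << ' ' << p.b;
    os.put(kEnd);
}

// Throws as soon as a marker is not where it is expected.
Pair readPair(std::istream& is) {
    if (is.get() != kBegin)
        throw std::runtime_error("Stream is corrupted: missing begin marker");
    Pair p;
    is >> p.a >> p.b;
    if (!is || is.get() != kEnd)
        throw std::runtime_error("Pair was not completely deserialized");
    return p;
}
```

If a writer emits one field more than the reader consumes, the reader finds payload where the end marker should be and fails at that object instead of silently misreading everything after it.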
If you want permissive deserialization, i.e. deserializing the rest of the stream even if something failed before, you'll have to move toward byte-counts: each object is preceded by its byte-count and can only eat that many bytes (and is expected to eat them all). This approach is nice because it allows for partial deserialization: you can save the part of the stream required for an object and only deserialize it if necessary.
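A byte-count framing could be sketched like this. The "<count>:<payload>" text header and the function names are assumptions, not a standard format, and no upper bound is enforced on the declared size.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes "<byte count>:<payload>" so a reader knows where the record ends.
void writeRecord(std::ostream& os, const std::string& payload) {
    os << payload.size() << ':' << payload;
}

// Extracts one record's raw bytes without interpreting them. Returns false
// if the stream ends, the header is malformed, or the payload is truncated.
bool readRecord(std::istream& is, std::string& payload) {
    std::size_t size = 0;
    if (!(is >> size) || is.get() != ':') return false;
    payload.resize(size);
    if (size == 0) return true;
    is.read(&payload[0], static_cast<std::streamsize>(size));
    return static_cast<std::size_t>(is.gcount()) == size;
}
```

Because the reader never has to understand a payload to skip it, a record that fails to parse can be dropped and the next one still starts at the right offset; the raw string can also be kept and parsed only when someone asks for the object.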
Tagging (your class IDs) is useful here, not (only) for dispatching, but simply to check that you are actually deserializing the right type of object. It also allows for pretty error messages.
Here are some error messages / exceptions you may wish to have:
No version X for object TYPE: only Y and Z
Stream is corrupted: here are the next few bytes BBBBBBBBBBBBBBBBBBB
TYPE (version X) was not completely deserialized
Trying to deserialize a TYPE1 in TYPE2
Note that, as far as I remember, both Boost.Serialization and protobuf really help with error/version handling.
protobuf has some perks too, because of its capacity for nesting messages:
- the byte-count is naturally supported, as well as the versioning
- you can do lazy deserialization (ie, store the message and only deserialize if someone asks for it)
The downside is that it's harder to handle polymorphism because of the fixed format of the message. You have to carefully design your messages for that.
Answered by Viktor Latypov
The answer by Yacoby can be extended further.
I believe serialization can be implemented in a way similar to managed languages if one actually implements a reflection system.
For years we've been using the automated approach.
I was one of the implementors of the working C++ postprocessor and the Reflection library: LSDC tool and Linderdaum Engine Core (iObject + RTTI + Linker/Loader). See the source at http://www.linderdaum.com
The class factory abstracts the process of class instantiation.
To initialize specific members, you might add some intrusive RTTI and autogenerate the load/save procedures for them.
Suppose you have the iObject class at the top of your hierarchy.
#include <cstddef>
#include <map>
#include <string>
#include <vector>
using namespace std;

class iMetaClass;
class iProperty;

// Base class with intrusive RTTI
class iObject
{
public:
    iObject(): FMetaClass(NULL) {}
    /// Virtual destructor makes the hierarchy polymorphic (needed for dynamic_cast)
    virtual ~iObject() {}
    /// Const pointer, because Construct() below is a const method
    const iMetaClass* FMetaClass;
};

// The iMetaClass stores the list of properties and provides the Construct() method
class iMetaClass: public iObject
{
public:
    virtual iObject* Construct() const = 0;
    /// List of all the properties (excluding the ones from base class)
    vector<iProperty*> FProperties;
    /// Support the hierarchy
    iMetaClass* FSuperClass;
    /// Name of the class
    string FName;
};

// The NativeMetaClass<T> template implements the Construct() method.
template <class T> class NativeMetaClass: public iMetaClass
{
public:
    virtual iObject* Construct() const
    {
        iObject* Res = new T();
        Res->FMetaClass = this;
        return Res;
    }
};

// mlNode is the representation of the markup language: xml, json or whatever else.
// The hierarchy might have come from the XML file or JSON or some custom script
class mlNode {
public:
    string FName;
    string FValue;
    vector<mlNode*> FChildren;
};

class iProperty: public iObject {
public:
    /// Load the property from internal tree representation
    virtual void Load( iObject* TheObject, mlNode* Node ) const = 0;
    /// Serialize the property to some internal representation
    virtual mlNode* Save( iObject* TheObject ) const = 0;
    /// Name of the property (set by the registration code)
    string FName;
};

/// function to save a single field
typedef mlNode* ( *SaveFunction_t )( iObject* Obj );
/// function to load a single field from mlNode
typedef void ( *LoadFunction_t )( iObject* Obj, mlNode* Node );

// The implementation for a scalar/iObject field
// The array-based property requires somewhat different implementation
// Load/Save functions are autogenerated by some tool.
class clFieldProperty : public iProperty {
public:
    clFieldProperty(): FLoadFunction(NULL), FSaveFunction(NULL) {}
    virtual ~clFieldProperty() {}
    /// Load single field of an object
    virtual void Load( iObject* TheObject, mlNode* Node ) const {
        FLoadFunction(TheObject, Node);
    }
    /// Save single field of an object
    virtual mlNode* Save( iObject* TheObject ) const {
        return FSaveFunction(TheObject);
    }
public:
    // these pointers are set in property registration code
    LoadFunction_t FLoadFunction;
    SaveFunction_t FSaveFunction;
};

// The Loader class stores the list of metaclasses
class Loader: public iObject {
public:
    void RegisterMetaclass(iMetaClass* C) { FClasses[C->FName] = C; }
    /// No error checking: the class name must have been registered
    iObject* CreateByName(const string& ClassName) { return FClasses[ClassName]->Construct(); }
    /// The implementation is an almost trivial iteration of all the properties
    /// in the metaclass and calling the iProperty's Load/Save methods for each field
    void LoadFromNode(mlNode* Source, iObject** Result);
    /// Create the tree-based representation of the object
    mlNode* Save(iObject* Source);
    map<string, iMetaClass*> FClasses;
};
When you define the ConcreteClass derived from iObject, you use some extension and the code generator tool to produce the list of save/load procedures and the registration code for it.
Let us see the code for this sample.
Somewhere in the framework we have an empty formal define
#define PROPERTY(...)

/// vec3 is a custom type with implementation omitted for brevity
/// ConcreteClass2 is also omitted
class ConcreteClass: public iObject {
public:
    ConcreteClass(): FInt(10), FString("Default"), FEmbedded(NULL) {}
    /// Inform the tool about our properties
    PROPERTY(Name=Int, Type=int, FieldName=FInt)
    /// We can also provide get/set accessors
    PROPERTY(Name=Pos, Type=vec3, Getter=GetPos, Setter=SetPos)
    /// And the other field
    PROPERTY(Name=Str, Type=string, FieldName=FString)
    /// And the embedded object
    PROPERTY(Name=Embedded, Type=ConcreteClass2, FieldName=FEmbedded)
    /// public field
    int FInt;
    /// public field
    string FString;
    /// public embedded object
    ConcreteClass2* FEmbedded;
    /// Getter
    vec3 GetPos() const { return FPos; }
    /// Setter
    void SetPos(const vec3& Pos) { FPos = Pos; }
private:
    vec3 FPos;
};
The autogenerated registration code would be:
// Forward declarations of the autogenerated field accessors defined below
void Load_ConcreteClass_FInt_Field(iObject* Dest, mlNode* Val);
mlNode* Save_ConcreteClass_FInt_Field(iObject* Obj);

/// Call this to add everything to the loader
void Register_ConcreteClass(Loader* L) {
    iMetaClass* C = new NativeMetaClass<ConcreteClass>();
    C->FName = "ConcreteClass";
    clFieldProperty* P;
    P = new clFieldProperty();
    P->FName = "Int";
    P->FLoadFunction = &Load_ConcreteClass_FInt_Field;
    P->FSaveFunction = &Save_ConcreteClass_FInt_Field;
    C->FProperties.push_back(P);
    // ... same for FString and GetPos/SetPos
    C->FSuperClass = L->FClasses["iObject"];
    L->RegisterMetaclass(C);
}

// The autogenerated loaders (no error checking for brevity).
// Str2Int/Int2Str are conversion helpers that are not shown here.
void Load_ConcreteClass_FInt_Field(iObject* Dest, mlNode* Val) {
    dynamic_cast<ConcreteClass*>(Dest)->FInt = Str2Int(Val->FValue);
}

mlNode* Save_ConcreteClass_FInt_Field(iObject* Obj) {
    mlNode* Res = new mlNode();
    Res->FValue = Int2Str( dynamic_cast<ConcreteClass*>(Obj)->FInt );
    return Res;
}
/// similar code for FString and GetPos/SetPos pair with obvious changes
Now, if you have a JSON-like hierarchical script
Object("ConcreteClass") {
    Int 50
    Str 10
    Pos 1.5 2.2 3.3
    Embedded("ConcreteClass2") {
        SomeProp Value
    }
}
The Loader object would resolve all the classes and properties in its Save/Load methods.
Sorry for the long post; the implementation grows even larger when all the error handling comes in.
Answered by Charles Salvia
Serialization is unfortunately never going to be completely painless in C++, at least not for the foreseeable future, simply because C++ lacks the critical language feature that makes easy serialization possible in other languages: reflection. That is, if you create a class Foo, C++ has no mechanism to inspect the class programmatically at runtime to determine what member variables it contains.
Therefore, there is no way to create generalized serialization functions. One way or another, you have to implement a special serialization function for each class. Boost.Serialization is no different; it simply provides you with a convenient framework and a nice set of tools which help you do this.
Answered by Chris Cleeland
Perhaps I am not clever, but I think that ultimately the same kind of code that you have written gets written, simply because C++ doesn't have the runtime mechanisms to do anything different. The question is whether it will be written bespoke by a developer, generated via template metaprogramming (which is what I suspect that boost.serialization does), or generated via some external tool like an IDL compiler/code generator.
Which of those three mechanisms to use (and maybe there are other possibilities, too) is something that should be evaluated on a per-project basis.
Answered by Björn Pollex
I guess the closest thing to a standard way would be Boost.Serialization. I would like to hear where and in what context you heard that thing about the class IDs. In the case of serialization I can really think of no other way (unless of course, you know the type you expect when deserializing). And also, one size does not fit all.