Fast serialization/deserialization of structs

Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute the content to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/9944994/

Time: 2020-08-09 11:19:37  Source: igfitidea

c# performance serialization struct

Asked by user256890

I have a huge amount of geographic data represented in a simple object structure consisting only of structs. All of my fields are of value type.

public struct Child
{
   readonly float X;
   readonly float Y;
   readonly int myField;
}

public struct Parent
{
   readonly int id;
   readonly int field1;
   readonly int field2;
   readonly Child[] children;
}

The data is chunked up nicely into small Parent[] arrays. Each array contains a few thousand Parent instances. I have far too much data to keep it all in memory, so I need to swap these chunks to disk and back. (One file would be approx. 200-300 KB.)

What would be the most efficient way of serializing/deserializing the Parent[] to a byte[] for dumping to disk and reading back? Concerning speed, I am particularly interested in fast deserialization; write speed is not that critical.

Would a simple BinarySerializer be good enough? Or should I hack around with StructLayout (see accepted answer)? I am not sure whether that would work with the array field Parent.children.

UPDATE: Response to comments - Yes, the objects are immutable (code updated), and indeed the children field is not a value type. 300 KB does not sound like much, but I have zillions of files like that, so speed does matter.

Accepted answer by usr

BinarySerializer is a very general serializer. It will not perform as well as a custom implementation.

Fortunately for you, your data consists of structs only. This means that you will be able to fix a StructLayout for Child and just bit-copy the children array using unsafe code from a byte[] you have read from disk.

For the parents it is not that easy because you need to treat the children separately. I recommend you use unsafe code to copy the bit-copyable fields from the byte[] you read and deserialize the children separately.
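
One way the "fixed part plus children" split could look is sketched below. This is not the answerer's code: it uses the safe MemoryMarshal API (available from .NET Core 2.1 / .NET Standard 2.1) instead of unsafe pointers, and the names ParentHeader, ParentCodec, ChildCount and ReadFully, the public fields, and the Pack = 1 layout are all assumptions made for illustration. The raw bytes are in the machine's own byte order, so files are only portable between machines with the same endianness.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Child
{
    public float X;
    public float Y;
    public int MyField;
}

// Hypothetical fixed-size part of a Parent: everything except the array itself.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct ParentHeader
{
    public int Id;
    public int Field1;
    public int Field2;
    public int ChildCount; // written so the reader knows how many children follow
}

public static class ParentCodec
{
    public static void Write(Stream stream, ParentHeader header, Child[] children)
    {
        header.ChildCount = children.Length;
        // Bit-copy the header, then the whole children array, one call each.
        stream.Write(MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref header, 1)));
        stream.Write(MemoryMarshal.AsBytes(children.AsSpan()));
    }

    public static (ParentHeader Header, Child[] Children) Read(Stream stream)
    {
        var header = default(ParentHeader);
        ReadFully(stream, MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref header, 1)));

        // The children are deserialized separately, straight into their array.
        var children = new Child[header.ChildCount];
        ReadFully(stream, MemoryMarshal.AsBytes(children.AsSpan()));
        return (header, children);
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static void ReadFully(Stream stream, Span<byte> buffer)
    {
        while (buffer.Length > 0)
        {
            int n = stream.Read(buffer);
            if (n == 0) throw new EndOfStreamException();
            buffer = buffer.Slice(n);
        }
    }
}
```

With this layout a parent costs 16 bytes plus 12 bytes per child on disk, and reading it back needs no per-field calls.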

Did you consider mapping all the children into memory using memory-mapped files? You could then reuse the operating system's cache facility and not deal with reading and writing at all.
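
A minimal sketch of the memory-mapped idea, assuming the file holds nothing but raw Child records laid out back to back. The MappedChildren class, the ReadRange helper, the public fields and the Pack = 1 layout are hypothetical; only MemoryMappedFile, CreateViewAccessor and ReadArray are real framework calls. Note that CreateFromFile(path, FileMode.Open) maps the file with read/write access, so the file has to be writable.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Child
{
    public float X;
    public float Y;
    public int MyField;
}

public static class MappedChildren
{
    // Reads `count` records starting at record index `first` (hypothetical helper).
    public static Child[] ReadRange(string path, long first, int count)
    {
        int size = Marshal.SizeOf<Child>(); // 12 bytes with the layout above

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var accessor = mmf.CreateViewAccessor())
        {
            var result = new Child[count];
            // Copies straight from the mapped pages into the array;
            // the OS page cache decides what actually hits the disk.
            accessor.ReadArray(first * size, result, 0, count);
            return result;
        }
    }
}
```

For real use the mapping would probably be kept open across many reads rather than recreated per call; it is opened and closed here only to keep the sketch short.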

Zero-copy-deserializing a Child[] looks like this:

byte[] bytes = GetFromDisk();
unsafe // requires compiling with /unsafe (AllowUnsafeBlocks)
{
    fixed (byte* bytePtr = bytes)
    {
        Child* childPtr = (Child*)bytePtr;
        // Now treat childPtr as an array:
        var x123 = childPtr[123].X;

        // If we need a real array that can be passed around, we need to copy:
        int length = GetLengthOfDeserializedData();
        var childArray = new Child[length];
        for (int i = 0; i < length; i++)
        {
            childArray[i] = childPtr[i];
        }
    }
}
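
On newer runtimes (.NET Core 2.1 or later) the same reinterpretation can be sketched without unsafe code, using MemoryMarshal.Cast. This swaps in a different technique from the pointer cast above; the ChildBytes class, its method names, the public fields and the Pack = 1 layout are assumptions for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Child
{
    public float X;
    public float Y;
    public int MyField;
}

public static class ChildBytes
{
    // Views the bytes as Child records without copying anything.
    public static ReadOnlySpan<Child> View(byte[] bytes)
    {
        return MemoryMarshal.Cast<byte, Child>(new ReadOnlySpan<byte>(bytes));
    }

    // Copies into a real array when one has to be passed around.
    public static Child[] ToArray(byte[] bytes)
    {
        return View(bytes).ToArray();
    }

    // The reverse direction: the raw bytes of a Child[] ready to dump to disk.
    public static byte[] ToBytes(Child[] children)
    {
        return MemoryMarshal.AsBytes(children.AsSpan()).ToArray();
    }
}
```

View(bytes)[123].X then plays the role of childPtr[123].X, with bounds checking and no pinning.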

Answer by markmuetz

If you don't fancy going down the write-your-own-serializer route, you can use the protobuf.net serializer. Here's the output from a small test program:

Using 3000 parents, each with 5 children
BinaryFormatter Serialized in: 00:00:00.1250000
Memory stream 486218 B
BinaryFormatter Deserialized in: 00:00:00.1718750

ProfoBuf Serialized in: 00:00:00.1406250
Memory stream 318247 B
ProfoBuf Deserialized in: 00:00:00.0312500

It should be fairly self-explanatory. This was just for one run, but was fairly indicative of the speed up I saw (3-5x).

To make your structs serializable (with protobuf.net), just add the following attributes:

[ProtoContract]
[Serializable]
public struct Child
{
    [ProtoMember(1)] public float X;
    [ProtoMember(2)] public float Y;
    [ProtoMember(3)] public int myField;
}

[ProtoContract]
[Serializable]
public struct Parent
{
    [ProtoMember(1)] public int id;
    [ProtoMember(2)] public int field1;
    [ProtoMember(3)] public int field2;
    [ProtoMember(4)] public Child[] children;
}

UPDATE:

Actually, writing a custom serializer is pretty easy. Here is a bare-bones implementation:

class CustSerializer
{
    public void Serialize(Stream stream, Parent[] parents, int childCount)
    {
        BinaryWriter sw = new BinaryWriter(stream);
        foreach (var parent in parents)
        {
            sw.Write(parent.id);
            sw.Write(parent.field1);
            sw.Write(parent.field2);

            foreach (var child in parent.children)
            {
                sw.Write(child.myField);
                sw.Write(child.X);
                sw.Write(child.Y);
            }
        }
    }

    public Parent[] Deserialize(Stream stream, int parentCount, int childCount)
    {
        BinaryReader br = new BinaryReader(stream);
        Parent[] parents = new Parent[parentCount];

        for (int i = 0; i < parentCount; i++)
        {
            var parent = new Parent();
            parent.id = br.ReadInt32();
            parent.field1 = br.ReadInt32();
            parent.field2 = br.ReadInt32();
            parent.children = new Child[childCount];

            for (int j = 0; j < childCount; j++)
            {
                var child = new Child();
                child.myField = br.ReadInt32();
                child.X = br.ReadSingle();
                child.Y = br.ReadSingle();
                parent.children[j] = child;
            }

            parents[i] = parent;
        }
        return parents;
    }
}
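
The serializer above relies on the caller knowing parentCount and childCount, and on every parent having the same number of children. A possible variation, sketched here under the assumption that a few extra bytes per parent are acceptable, writes the counts into the stream. PrefixedSerializer is a hypothetical name, and the structs repeat the public-field versions used in this answer.

```csharp
using System.IO;
using System.Text;

public struct Child { public float X; public float Y; public int myField; }
public struct Parent { public int id; public int field1; public int field2; public Child[] children; }

public static class PrefixedSerializer
{
    public static void Serialize(Stream stream, Parent[] parents)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var w = new BinaryWriter(stream, Encoding.UTF8, true))
        {
            w.Write(parents.Length); // array length prefix
            foreach (var p in parents)
            {
                w.Write(p.id);
                w.Write(p.field1);
                w.Write(p.field2);
                w.Write(p.children.Length); // per-parent child count
                foreach (var c in p.children)
                {
                    w.Write(c.myField);
                    w.Write(c.X);
                    w.Write(c.Y);
                }
            }
        }
    }

    public static Parent[] Deserialize(Stream stream)
    {
        using (var r = new BinaryReader(stream, Encoding.UTF8, true))
        {
            var parents = new Parent[r.ReadInt32()];
            for (int i = 0; i < parents.Length; i++)
            {
                var p = new Parent { id = r.ReadInt32(), field1 = r.ReadInt32(), field2 = r.ReadInt32() };
                p.children = new Child[r.ReadInt32()];
                for (int j = 0; j < p.children.Length; j++)
                {
                    // Same field order as in Serialize: myField, X, Y.
                    var c = new Child { myField = r.ReadInt32() };
                    c.X = r.ReadSingle();
                    c.Y = r.ReadSingle();
                    p.children[j] = c;
                }
                parents[i] = p;
            }
            return parents;
        }
    }
}
```

For 3000 parents with 5 children each this comes to 4 + 3000 * (16 + 60) = 228004 bytes, slightly more than the fixed-count version, in exchange for self-describing files.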

And here is its output when run in a simple speed test:

Custom Serialized in: 00:00:00 
Memory stream 216000 B 
Custom Deserialized in: 00:00:00.0156250

Obviously, it's a lot less flexible than the other approaches, but if speed really is that important it's about 2-3x faster than the protobuf method. It produces minimal file sizes as well, so writing to disk should be faster.