Java object serialization performance tips

Question

Asked by Mario Ortegón

I must serialize a huge tree of objects (7,000) to disk. Originally we kept this tree in a database with Kodo, but loading it into memory took thousands upon thousands of queries, and it would take a good part of the local universe's available time.

I tried serialization for this and indeed I get a performance improvement. However, I get the feeling that I could improve this by writing my own, custom serialization code. I need to make loading this serialized object as fast as possible.

On my machine, serializing / deserializing these objects takes about 15 seconds. When loading them from the database, it takes around 40 seconds.

Any tips on what I could do to improve this performance, taking into consideration that, because the objects are in a tree, they reference each other?

Answer 1

Answered by dogbane

Don't forget to use the transient keyword for instance variables that don't have to be serialized. This gives you a performance boost because you are no longer reading/writing unnecessary data.
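
A minimal sketch of the idea, assuming a hypothetical CachedNode class whose cachedPath field can be recomputed after loading (neither name comes from the question). The transient field is skipped on write and comes back as null on read:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical node class, for illustration only.
class CachedNode implements Serializable {
    private static final long serialVersionUID = 1L;

    String name;
    transient String cachedPath; // derived data: skipped by default serialization

    CachedNode(String name) {
        this.name = name;
        this.cachedPath = "/" + name;
    }
}

class TransientDemo {
    // Serializes an object into an in-memory byte array.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Reads back a single object written by toBytes.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

After a round trip through TransientDemo.toBytes and TransientDemo.fromBytes, name is restored and cachedPath is null, so anything transient has to be rebuilt lazily or in a readObject() method.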

Answer 2

Answered by Esko Luontola

One optimization is customizing the class descriptors, so that you store the class descriptors in a different database and in the object stream you only refer to them by ID. This reduces the space needed by the serialized data. See for example how in one project the classes SerialUtil and ClassesTable do it.
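
A rough sketch of the class-descriptor idea, not the SerialUtil/ClassesTable code from that project. It assumes the writer and the reader share the same ordered list of classes (here passed in as a plain List; in the answer's setup it would live in a separate store). The stream classes, GridPoint and ClassTableDemo are hypothetical names; writeClassDescriptor() and readClassDescriptor() are the standard hooks on ObjectOutputStream and ObjectInputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InvalidClassException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.List;

// Writes a 2-byte table index in place of each full class descriptor.
class TableObjectOutputStream extends ObjectOutputStream {
    private final List<Class<?>> table;

    TableObjectOutputStream(OutputStream out, List<Class<?>> table) throws IOException {
        super(out);
        this.table = table;
    }

    @Override
    protected void writeClassDescriptor(ObjectStreamClass desc) throws IOException {
        int id = table.indexOf(desc.forClass());
        if (id < 0) {
            throw new InvalidClassException(desc.getName(), "class is not in the shared table");
        }
        writeShort(id);
    }
}

// Reads the index back and rebuilds the descriptor from the local class.
class TableObjectInputStream extends ObjectInputStream {
    private final List<Class<?>> table;

    TableObjectInputStream(InputStream in, List<Class<?>> table) throws IOException {
        super(in);
        this.table = table;
    }

    @Override
    protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
        return ObjectStreamClass.lookup(table.get(readShort()));
    }
}

// Hypothetical payload class.
class GridPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    GridPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

class ClassTableDemo {
    static byte[] write(Object obj, List<Class<?>> table) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new TableObjectOutputStream(bytes, table)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    static Object read(byte[] data, List<Class<?>> table) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new TableObjectInputStream(new ByteArrayInputStream(data), table)) {
            return in.readObject();
        }
    }
}
```

The table must contain every serializable class that appears in the stream (including container classes such as ArrayList and any serializable superclasses), in the same order on both sides. The saving is in stream size; whether it also speeds up loading is something to measure.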

Making classes Externalizable instead of Serializable can give some performance benefits. The downside is that it requires lots of manual work.
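
As an illustration of the manual work involved, here is a small sketch with a hypothetical Item class. With Externalizable, the class writes and reads its own fields, and it needs a public no-argument constructor for deserialization:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical class, for illustration only.
class Item implements Externalizable {
    private String name;
    private int weight;

    public Item() {
        // Required: deserialization calls this, then readExternal().
    }

    Item(String name, int weight) {
        this.name = name;
        this.weight = weight;
    }

    String name() { return name; }
    int weight() { return weight; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        // Field order here defines the format; readExternal must match it.
        out.writeUTF(name);
        out.writeInt(weight);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        name = in.readUTF();
        weight = in.readInt();
    }
}

class ExternalizableDemo {
    // Writes the object to memory and reads it back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

Every field added to the class later has to be added to both methods by hand, which is the maintenance cost mentioned above.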

Then there are other serialization libraries, for example jserial, which can give better performance than Java's default serialization. Also, if the object graph does not include cycles, then it can be serialized a little bit faster, because the serializer does not need to keep track of objects it has seen (see "How does it work?" in jserial's FAQ).

Answer 3

Answered by Andrey Vityuk

I would recommend implementing custom writeObject() and readObject() methods. That way you can avoid writing the child nodes for each node in the tree. When you use default serialization, each node is serialized together with all its children.

For example, the writeObject() of a Tree class should iterate through all the nodes of the tree and write only each node's data (not the Node objects themselves), along with markers that identify the tree level.

You can look at LinkedList to see how these methods are implemented there. It uses the same approach to avoid writing the prev and next entries for every single entry.
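
One possible shape for this, as a sketch rather than a definitive implementation. Tree and TreeNode are hypothetical classes, the node payload is assumed to be a single non-null String, and the "marker" is simply each node's depth written before its data, with -1 as an end marker:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Deliberately not Serializable: nodes are never written as objects.
class TreeNode {
    final String data;
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String data) { this.data = data; }

    TreeNode add(String childData) {
        TreeNode child = new TreeNode(childData);
        children.add(child);
        return child;
    }
}

class Tree implements Serializable {
    private static final long serialVersionUID = 1L;

    transient TreeNode root; // written by hand in writeObject()

    Tree(TreeNode root) { this.root = root; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Iterative pre-order walk, so deep trees do not overflow the call stack.
        Deque<TreeNode> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        if (root != null) {
            nodes.push(root);
            depths.push(0);
        }
        while (!nodes.isEmpty()) {
            TreeNode node = nodes.pop();
            int depth = depths.pop();
            out.writeInt(depth);     // level marker
            out.writeUTF(node.data); // node data only (writeUTF is limited to 65535 encoded bytes)
            // Push children in reverse so the first child is written first.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                nodes.push(node.children.get(i));
                depths.push(depth + 1);
            }
        }
        out.writeInt(-1); // end marker
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // path.get(d) holds the most recently read node at depth d.
        List<TreeNode> path = new ArrayList<>();
        for (int depth = in.readInt(); depth >= 0; depth = in.readInt()) {
            TreeNode node = new TreeNode(in.readUTF());
            if (depth == 0) {
                root = node;
            } else {
                path.get(depth - 1).children.add(node);
            }
            if (depth < path.size()) {
                path.set(depth, node);
            } else {
                path.add(node);
            }
        }
    }

    // Convenience helper for trying the format out in memory.
    static Tree roundTrip(Tree tree) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(tree);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Tree) in.readObject();
        }
    }
}
```

Because the nodes are written as flat (depth, data) records, the stream carries no per-node class or handle information, and neither writing nor reading recurses.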

Answer 4

Answered by Rich

To avoid having to write your own serialization code, give Google Protocol Buffers a try. According to their site:

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages – Java, C++, or Python

I've not used it, but have heard a lot of positive things about it. Plus, I have to maintain some custom serialization code, and it can be an absolute nightmare to do (let alone tracking down bugs), so getting someone else to do it for you is always a Good Thing.

Answer 5

Answered by Maurice Perry

Have you tried compressing the stream (GZIPOutputStream)?
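
For reference, wrapping the object stream in GZIP could look like the sketch below. GzipSerialization is a hypothetical helper name. Compression trades CPU time for less I/O, so whether it is faster for a given tree and disk has to be measured:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipSerialization {
    // Object stream -> GZIP -> buffer -> target (for example a FileOutputStream).
    static void save(Object obj, OutputStream target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new BufferedOutputStream(target)))) {
            out.writeObject(obj);
        } // closing the outer stream finishes the GZIP trailer and closes the target
    }

    // Mirror image of save(): source -> buffer -> GZIP -> object stream.
    static Object load(InputStream source) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new BufferedInputStream(source)))) {
            return in.readObject();
        }
    }
}
```

Even without compression, putting a BufferedOutputStream / BufferedInputStream between the object stream and the file is worth trying, since unbuffered file streams can make many small reads and writes.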

Answer 6

Answered by thr

This is how I would do it, off the top of my head:

Serialization

1. Serialize each object individually.
2. Assign each object a unique key.
3. When an object holds a reference to another object, put the unique key for that object in the object's place in the serialization. (I would use a UUID converted to binary.)
4. Save each object into a file/database/storage using the unique key.

Unserialization

1. Start from an arbitrary object (usually the root, I suspect), unserialize it, put it in a map with its unique key as the index, and return it.
2. When you come across an object key in the serialization stream, first check whether that object is already unserialized by looking up its unique key in the map. If it is, just grab it from there. If not, put in a lazy-loading proxy (which repeats these two steps for that object) in place of the real object; the proxy has hooks to load the right object when you need it.

Edit: you might need to use two-pass serialization and unserialization if you have circular references in there. It complicates things a bit, but not that much.
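
A simplified sketch of the key-based scheme, with several substitutions that are assumptions rather than what the answer describes: sequential int ids stand in for UUIDs, everything goes into one stream instead of one record per key, an array indexed by id stands in for the map, and references are resolved eagerly in a second pass instead of through lazy-loading proxies. GraphNode and KeyedGraphIO are hypothetical names:

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical object type; links may form cycles.
class GraphNode {
    final String label;
    final List<GraphNode> links = new ArrayList<>();

    GraphNode(String label) { this.label = label; }
}

class KeyedGraphIO {
    static void write(GraphNode start, DataOutputStream out) throws IOException {
        // Pass 1: give every reachable object an id (identity-based, not equals-based).
        Map<GraphNode, Integer> ids = new IdentityHashMap<>();
        List<GraphNode> order = new ArrayList<>();
        Deque<GraphNode> pending = new ArrayDeque<>();
        ids.put(start, 0);
        order.add(start);
        pending.push(start);
        while (!pending.isEmpty()) {
            for (GraphNode next : pending.pop().links) {
                if (!ids.containsKey(next)) {
                    ids.put(next, order.size());
                    order.add(next);
                    pending.push(next);
                }
            }
        }
        // Pass 2: write each object's own data, with ids in place of references.
        out.writeInt(order.size());
        for (GraphNode node : order) {
            out.writeUTF(node.label);
            out.writeInt(node.links.size());
            for (GraphNode link : node.links) {
                out.writeInt(ids.get(link));
            }
        }
    }

    static GraphNode read(DataInputStream in) throws IOException {
        int count = in.readInt();
        GraphNode[] byId = new GraphNode[count];
        int[][] linkIds = new int[count][];
        // Pass 1: create every object and remember which ids it refers to.
        for (int id = 0; id < count; id++) {
            byId[id] = new GraphNode(in.readUTF());
            linkIds[id] = new int[in.readInt()];
            for (int j = 0; j < linkIds[id].length; j++) {
                linkIds[id][j] = in.readInt();
            }
        }
        // Pass 2: all objects exist now, so cycles resolve without special cases.
        for (int id = 0; id < count; id++) {
            for (int linkId : linkIds[id]) {
                byId[id].links.add(byId[linkId]);
            }
        }
        return byId[0]; // the start object was given id 0
    }
}
```

The two passes on the read side are what make circular references harmless: no reference is followed until every object has been created.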

Answer 7

Answered by Tom Hawtin - tackline

For performance, I'd suggest not using java.io serialisation at all. Instead, get down to the bytes yourself.
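
As a hedged illustration of what "the bytes yourself" can mean, the sketch below encodes one record with java.nio.ByteBuffer. NodeRecord and its three fields are hypothetical; the layout (int id, double weight, length-prefixed UTF-8 name) is just one possible choice:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical fixed-layout record: [int id][double weight][int nameLength][name bytes]
class NodeRecord {
    final int id;
    final double weight;
    final String name;

    NodeRecord(int id, double weight, String name) {
        this.id = id;
        this.weight = weight;
        this.name = name;
    }

    // Encodes this record into exactly 16 + nameLength bytes.
    byte[] encode() {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 8 + 4 + nameBytes.length);
        buf.putInt(id);
        buf.putDouble(weight);
        buf.putInt(nameBytes.length);
        buf.put(nameBytes);
        return buf.array();
    }

    // Decodes one record starting at the buffer's current position.
    static NodeRecord decode(ByteBuffer buf) {
        int id = buf.getInt();
        double weight = buf.getDouble();
        byte[] nameBytes = new byte[buf.getInt()];
        buf.get(nameBytes);
        return new NodeRecord(id, weight, new String(nameBytes, StandardCharsets.UTF_8));
    }
}
```

Since decode() advances the buffer position, several records can be read back to back from one buffer. A hand-written format like this carries no class metadata or versioning, so any change to the layout has to be handled explicitly.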

If you are going to use java.io to serialise the tree, you might need to make sure your recursion doesn't get too deep, either by flattening (as, say, TreeSet does) or by arranging to serialise the deepest nodes first (so you have back references rather than nested readObject calls).

I would be surprised if there wasn't a way in Kodo to read the entire tree in one go (or a few).

Answer 8

Answered by cherouvim

Also, have a look at XStream, a library to serialize objects to XML and back again.

Answer 9

Answered by Pascal de Kloe

You can use Colfer to generate the beans, and Java's standard serialization performance will get a 10-1000x boost. Unless the size reaches over a GB, chances are you'll be well below a second.
