How does Java's serialization work, and when should it be used instead of some other persistence technique?

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/352117/

Date: 2020-08-11 13:43:17  Source: igfitidea

How does Java's serialization work, and when should it be used instead of some other persistence technique?

Tags: java, serialization

Asked by Esko

I've lately been trying to learn more about Java's serialization and to test it for both work and personal projects, and I must say that the more I know about it, the less I like it. This may be caused by misinformation, though, which is why I'm asking you all these two things:

1: At the byte level, how does serialization know how to match serialized values with some class?

One of my problems here is that I made a small test with an ArrayList containing the values "one", "two", "three". After serialization the byte array took 78 bytes, which seems like an awful lot for such a small amount of information (19+3+3+5 bytes). Granted, there's bound to be some overhead, but this leads to my second question:
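The measurement described above can be reproduced with a few lines. This is only a sketch: the class name `SerializedSize` and the helper `serialize` are made up for illustration, and the exact byte count may vary between JDK versions, so the program prints the size rather than assuming a value.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SerializedSize {
    // Serializes one object with built-in Java serialization and returns the raw bytes.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        List<String> list = new ArrayList<>(Arrays.asList("one", "two", "three"));
        byte[] bytes = serialize(list);
        // The payload itself is only 3 + 3 + 5 = 11 characters; the rest is
        // stream header, class description and field metadata.
        System.out.println("serialized size: " + bytes.length + " bytes");
    }
}
```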

2: Can serialization be considered a good method for persisting objects at all? Obviously, if I used some homemade XML format, the persisted data would be something like this:

<object>
    <class="java.util.ArrayList">
    <!-- Object array inside Arraylist is called elementData -->
    <field name="elementData">
        <value>One</value>
        <value>Two</value>
        <value>Three</value>
    </field>
</object>

which, like XML in general, is a bit bloated and takes 138 bytes (without whitespace, that is). The same in JSON could be

{
    "java.util.ArrayList": {
        "elementData": [
            "one",
            "two",
            "three"
        ]
    }
}

which is 75 bytes, so already slightly smaller than Java's serialization. With these text-based formats it is of course obvious that there has to be a way to represent your basic data as text, numbers or any combination of both.

So to recap: how does serialization work at the byte/bit level, when should it be used and when shouldn't it, and what are the real benefits of serialization besides the fact that it comes standard in Java?

Accepted answer by Jon Skeet

I would personally try to avoid Java's "built-in" serialization:


  • It's not portable to other platforms
  • It's not hugely efficient
  • It's fragile - getting it to cope with multiple versions of a class is somewhat tricky. Even changing compilers can break serialization unless you're careful.

For details of what the actual bytes mean, see the Java Object Serialization Specification.

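As a small taste of what the specification describes, the sketch below dumps the bytes produced for a single String. Every stream starts with the magic number 0xACED and the stream version 5 (both available in `java.io.ObjectStreamConstants`), and a String is written as a one-byte type tag, a two-byte length and the characters in modified UTF-8. The class name `StreamHeader` and its helper methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;

public class StreamHeader {
    // Serializes one object and returns the raw stream bytes.
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Combines two bytes into an unsigned big-endian 16-bit value.
    static int unsignedShort(byte[] b, int offset) {
        return ((b[offset] & 0xFF) << 8) | (b[offset + 1] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize("one");
        StringBuilder hex = new StringBuilder();
        for (byte b : bytes) {
            hex.append(String.format("%02X ", b));
        }
        System.out.println(hex.toString().trim());
        // Layout: magic (2 bytes), version (2), TC_STRING tag (1), length (2), "one" (3).
        System.out.println("magic ok:   "
                + ((short) unsignedShort(bytes, 0) == ObjectStreamConstants.STREAM_MAGIC));
        System.out.println("version ok: "
                + ((short) unsignedShort(bytes, 2) == ObjectStreamConstants.STREAM_VERSION));
        System.out.println("string tag: " + (bytes[4] == ObjectStreamConstants.TC_STRING));
    }
}
```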

There are various alternatives, such as:


(Disclaimer: I work for Google, and I'm doing a port of Protocol Buffers to C# as my 20% project, so clearly I think that's a good bit of technology :)


Cross-platform formats are almost always more restrictive than platform-specific formats for obvious reasons - Protocol Buffers has a pretty limited set of native types, for example - but the interoperability can be incredibly useful. You also need to consider the impact of versioning, with backward and forward compatibility, etc. The text formats are generally hand-editable, but tend to be less efficient in both space and time.


Basically, you need to look at your requirements carefully.


Answered by Yuval Adam

I bumped into this dilemma about a month ago (see the question I asked).


The main lesson I learned from it is to use Java serialization only when necessary and when there's no other option. Like Jon said, it has its downsides, while other serialization techniques are much easier, faster and more portable.

Answered by berlindev

Serializing means that you put the structured data in your objects into a flat sequence of bytes in order to save it.

You should generally use techniques other than the built-in Java mechanism. It is made to work out of the box, but if the contents or the field order of your serialized classes change in the future, you get into trouble because you will not be able to load them correctly.
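One common way to reduce this fragility is to declare an explicit `serialVersionUID`. If a class does not declare one, the runtime computes a value from the details of the class, so an otherwise harmless change can make old data fail to load with an `InvalidClassException`. The sketch below only shows the declaration and how to read the value back; the `Customer` class and its fields are hypothetical.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class VersionedClass {
    // Hypothetical class used only for illustration.
    static class Customer implements Serializable {
        // Fixed explicitly so that compatible changes (for example adding a field)
        // do not change the version the stream is checked against.
        private static final long serialVersionUID = 1L;

        String name;
        // transient fields are skipped by default serialization.
        transient String cachedDisplayName;
    }

    // Returns the version that would be written into the stream for Customer.
    static long declaredVersion() {
        return ObjectStreamClass.lookup(Customer.class).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("serialVersionUID = " + declaredVersion());
    }
}
```

Keeping the value fixed does not make every change safe. It only tells the runtime that you, not the compiler, decide when two versions of the class are compatible.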

Answered by Argelbargel

See the Java Object Serialization Stream Protocol for a description of the file format and grammar used for serialized objects.

Personally, I think the built-in serialization is acceptable for persisting short-lived data (e.g. storing the state of a session object between two HTTP requests) that is not relevant outside your application.

For data that has a longer lifetime or should be used outside your application, I'd either persist it into a database or at least use a more commonly used format...

Answered by Michael Borgwardt

The main advantage of serialization is that it is extremely easy to use, relatively fast, and preserves actual Java object meshes.

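What "preserves object meshes" means in practice is that shared references and even cycles survive a round trip. Here is a minimal sketch, assuming a hypothetical `Node` class that is not part of the original answer:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class GraphRoundTrip {
    // Hypothetical node type; two nodes will point at each other.
    static class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String label;
        Node next;

        Node(String label) {
            this.label = label;
        }
    }

    // Serializes the object to memory and reads it straight back.
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
            return in.readObject();
        }
    }

    // Builds a two-node cycle and checks that the copy has the same shape.
    static boolean cycleSurvives() throws IOException, ClassNotFoundException {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a;
        Node copy = (Node) roundTrip(a);
        return copy != a && copy.next.label.equals("b") && copy.next.next == copy;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("cycle preserved: " + cycleSurvives());
    }
}
```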

But you have to realize that it's not really meant to be used for storing data, but mainly as a way for different JVM instances to communicate over a network using the RMI protocol.


Answered by Michael Borgwardt

The advantage of Java Object Serialization (JOS) is that it just works. There are also tools out there that do the same as JOS, but use an XML format instead of a binary format.


About the length: JOS writes some class information at the start, instead of as part of each instance - e.g. the full field names are recorded once, and an index into that list of names is used for instances of the class. This makes the output longer if you write only one instance of the class, but is more efficient if you write several (different) instances of it. It's not clear to me if your example actually uses a class, but this is the general reason why JOS is longer than one would expect.


BTW: this is incidental, but I don't think JSON records class names (as you have in your example), and so it might not do what you need.


Answered by Tom Hawtin - tackline

The reason why storing a tiny amount of information in serial form is relatively large is that it stores information about the classes of the objects it is serialising. If you store a duplicate of your list, then you'll see that the file hasn't grown by much. Store the same object twice and the difference is tiny.
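This is easy to check by writing several objects into one stream and comparing sizes. The sketch below is illustrative only: the class `RepeatedWrites` and its helpers are made-up names, and it prints the sizes rather than assuming specific values.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RepeatedWrites {
    // Writes all given objects into a single stream and returns the total size in bytes.
    static int sizeOf(Object... objects) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (Object obj : objects) {
                out.writeObject(obj);
            }
        }
        return buffer.size();
    }

    // Builds a fresh list with the values from the question.
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList("one", "two", "three"));
    }

    public static void main(String[] args) throws IOException {
        List<String> list = sample();
        System.out.println("one list:         " + sizeOf(list));
        // Same instance twice: the second write is only a back-reference.
        System.out.println("same list twice:  " + sizeOf(list, list));
        // Equal but distinct list: the class description is not repeated.
        System.out.println("list plus a copy: " + sizeOf(list, sample()));
    }
}
```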

The important pros are: it is relatively easy to use, quite fast, and can evolve (just like XML). However, the data is rather opaque, it is Java-only, it tightly couples data to classes, and untrusted data can easily cause DoS. You should think about the serialised form, rather than just slapping implements Serializable everywhere.
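For the untrusted-data point, Java 9 and later offer `java.io.ObjectInputFilter`, which lets a reader restrict which classes may be deserialized. The sketch below is not from the original answer: the allow-list pattern is just an example, and the class `Unexpected` is hypothetical. A real application would need a pattern that matches its own types, and a filter reduces the risk rather than removing it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

public class FilteredRead {
    // Hypothetical class standing in for a type the reader does not expect.
    static class Unexpected implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }
        return buffer.toByteArray();
    }

    // Reads one object, allowing only classes from java.util and java.lang.
    static Object readFiltered(byte[] bytes) throws IOException, ClassNotFoundException {
        ObjectInputFilter filter =
                ObjectInputFilter.Config.createFilter("java.util.*;java.lang.*;!*");
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            in.setObjectInputFilter(filter);
            return in.readObject();
        }
    }

    // Returns true if the filter refuses to deserialize the given object.
    static boolean isRejected(Object obj) throws IOException, ClassNotFoundException {
        try {
            readFiltered(serialize(obj));
            return false;
        } catch (InvalidClassException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("list rejected:       "
                + isRejected(new ArrayList<>(Arrays.asList("one", "two", "three"))));
        System.out.println("unexpected rejected: " + isRejected(new Unexpected()));
    }
}
```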

Answered by Tom Hawtin - tackline

If you don't have too much data, you can save objects into a java.util.Properties object. An example of a key/value pair would be user_1234_firstname = Peter. Using reflection to save and load objects can make things easier.

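A minimal sketch of the Properties approach, using the key from the example above. The class name `PropertiesPersistence` and the in-memory `StringWriter`/`StringReader` are assumptions made to keep the example self-contained; a real program would more likely write to a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

public class PropertiesPersistence {
    // Writes the properties in the standard key=value text format.
    static String save(Properties props) throws IOException {
        StringWriter writer = new StringWriter();
        props.store(writer, null); // null: no comment line
        return writer.toString();
    }

    // Parses text in the same format back into a Properties object.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("user_1234_firstname", "Peter");

        String text = save(props);
        System.out.println(text);

        Properties restored = load(text);
        System.out.println(restored.getProperty("user_1234_firstname"));
    }
}
```

The stored form is plain text, so it can be read and edited by hand, but every value is a String, and any structure has to be encoded in the key names.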

Answered by Premraj

How does Java's built-in serialization work?

Whenever we want to serialize an object, its class implements the java.io.Serializable interface. The interface has no methods to implement; it only signals to the serialization runtime that instances may be serialized (this is known as a marker interface). The JVM does not add any methods to such a class: by default, ObjectOutputStream and ObjectInputStream write and read the non-static, non-transient fields via reflection. A class that needs to control its own serialized form can declare the following two private methods, which the streams then call instead.

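import java.io.IOException;
import java.io.Serializable;

// Hypothetical wrapper class; only the two methods and the field names come from the answer.
public class Person implements Serializable {
    private String name;
    private String address;

    // Called by ObjectOutputStream instead of the default field-by-field write.
    private void writeObject(java.io.ObjectOutputStream stream)
            throws IOException {
        stream.writeObject(name);    // object property
        stream.writeObject(address); // object property
    }

    // Called by ObjectInputStream; must read values in the order they were written.
    private void readObject(java.io.ObjectInputStream stream)
            throws IOException, ClassNotFoundException {
        name = (String) stream.readObject();    // object property
        address = (String) stream.readObject(); // object property
    }
}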

When should it be used instead of some other persistence technique?

The built-in serialization is useful when the sender and receiver are both Java. If you want to avoid the kinds of problems described above, use XML or JSON with the help of a framework.