Best way to read structured binary files with Java

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/277944/

Date: 2020-08-11 12:26:01  Source: igfitidea

java file binaryfiles

Asked by Daniel Rikowski

I have to read a binary file in a legacy format with Java.

In a nutshell the file has a header consisting of several integers, bytes and fixed-length char arrays, followed by a list of records which also consist of integers and chars.

In any other language I would create structs (C/C++) or records (Pascal/Delphi) which are byte-by-byte representations of the header and the record. Then I'd read sizeof(header) bytes into a header variable and do the same for the records.

Something like this: (Delphi)

type
  THeader = record
    Version: Integer;
    Type: Byte;
    BeginOfData: Integer;
    ID: array[0..15] of Char;
  end;

...

procedure ReadData(S: TStream);
var
  Header: THeader;
begin
  S.ReadBuffer(Header, SizeOf(THeader));
  ...
end;

What is the best way to do something similar with Java? Do I have to read every single value on its own or is there any other way to do this kind of "block-read"?

Accepted answer by Powerlord

To my knowledge, Java forces you to read a file as bytes rather than being able to block read. If you were serializing Java objects, it'd be a different story.

The other examples shown use the DataInputStream class with a File, but you can also use a shortcut, the RandomAccessFile class:

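// Requires java.io.RandomAccessFile; every call below may throw IOException.
RandomAccessFile in = new RandomAccessFile("filename", "r");
int version = in.readInt();      // 4 bytes, read as big-endian
byte type = in.readByte();
int beginOfData = in.readInt();
byte[] tempId = new byte[16];    // the array must be allocated before reading into it
in.readFully(tempId);            // readFully reads exactly 16 bytes or throws EOFException
String id = new String(tempId);  // decodes with the platform default charset
in.close();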

Note that you could turn the response objects into a class, if that would make it easier.
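
One way the values read above could be gathered into a class is sketched here. The class name LegacyHeader, the ISO-8859-1 decoding of the ID field, and the trimming of padding bytes are assumptions for illustration, not part of the original answer. Because the reader takes a java.io.DataInput, it works with both RandomAccessFile and DataInputStream, and like them it assumes big-endian integers.

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical value class mirroring the THeader record from the question.
final class LegacyHeader {
    final int version;
    final byte type;
    final int beginOfData;
    final String id;

    private LegacyHeader(int version, byte type, int beginOfData, String id) {
        this.version = version;
        this.type = type;
        this.beginOfData = beginOfData;
        this.id = id;
    }

    // Reads the fields one by one, in file order.
    // Assumes big-endian integers and a 16-byte, single-byte-encoded ID field.
    static LegacyHeader read(DataInput in) throws IOException {
        int version = in.readInt();
        byte type = in.readByte();
        int beginOfData = in.readInt();
        byte[] rawId = new byte[16];
        in.readFully(rawId);
        // trim() drops trailing NUL or space padding (assumed padding scheme).
        String id = new String(rawId, StandardCharsets.ISO_8859_1).trim();
        return new LegacyHeader(version, type, beginOfData, id);
    }
}
```

With this in place, a call such as LegacyHeader.read(new RandomAccessFile("filename", "r")) would return the whole header as one object.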

Answer by Arvind

I guess FileInputStream lets you read in bytes. So, open the file with FileInputStream and read in sizeof(header) bytes. I am assuming that the header has a fixed format and size. I don't see that mentioned in the initial post, but I am assuming that is the case, as it would get much more complex if the header had optional args and different sizes.

Once you have the bytes, you can assign the contents of the buffer you've already read to the fields of a header class, and then parse the records in a similar fashion.
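
A rough sketch of this idea follows: read the fixed-size header as one block, then pick the fields out of the buffer. The class name HeaderBlockReader, the 25-byte size (4 + 1 + 4 + 16, assuming no padding), the big-endian byte order, and the ISO-8859-1 decoding are all assumptions based on the layout in the question.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads the header as a single block, then decodes fields by offset.
final class HeaderBlockReader {
    // version (4) + type (1) + beginOfData (4) + id (16); assumes no padding.
    static final int HEADER_SIZE = 25;

    // Reads exactly HEADER_SIZE bytes; loops because read() may return fewer than requested.
    static byte[] readHeaderBlock(InputStream in) throws IOException {
        byte[] block = new byte[HEADER_SIZE];
        int offset = 0;
        while (offset < block.length) {
            int n = in.read(block, offset, block.length - offset);
            if (n < 0) {
                throw new EOFException("header truncated after " + offset + " bytes");
            }
            offset += n;
        }
        return block;
    }

    // Field accessors; ByteBuffer is big-endian by default.
    static int versionOf(byte[] block) {
        return ByteBuffer.wrap(block).getInt(0);
    }

    static byte typeOf(byte[] block) {
        return block[4];
    }

    static int beginOfDataOf(byte[] block) {
        return ByteBuffer.wrap(block).getInt(5);
    }

    static String idOf(byte[] block) {
        // trim() drops trailing NUL or space padding (assumed padding scheme).
        return new String(block, 9, 16, StandardCharsets.ISO_8859_1).trim();
    }
}
```

The block would typically come from readHeaderBlock(new FileInputStream("filename")); the records could be handled the same way with their own size constant.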

Answer by Vincent Ramdhanie

You could use the DataInputStream class as follows:

DataInputStream in = new DataInputStream(new BufferedInputStream(
                         new FileInputStream("filename")));
int x = in.readInt();
double y = in.readDouble();

etc.

Once you get these values you can do with them as you please. Look up the java.io.DataInputStream class in the API for more info.

Answer by Darron

In the past I used DataInputStream to read data of arbitrary types in a specified order. This will not allow you to easily account for big-endian/little-endian issues.
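
To illustrate the endianness point: DataInputStream always decodes multi-byte values as big-endian, so a little-endian file needs each value swapped by hand. The helper class name LittleEndianInput below is an assumption for illustration; Integer.reverseBytes, Short.reverseBytes and Long.reverseBytes are standard JDK methods.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical helpers that read little-endian values through a big-endian DataInput.
final class LittleEndianInput {
    static int readIntLE(DataInput in) throws IOException {
        // readInt() assembles the 4 bytes as big-endian; reversing yields the little-endian value.
        return Integer.reverseBytes(in.readInt());
    }

    static short readShortLE(DataInput in) throws IOException {
        return Short.reverseBytes(in.readShort());
    }

    static long readLongLE(DataInput in) throws IOException {
        return Long.reverseBytes(in.readLong());
    }
}
```

This works, but every read site has to remember to use the swapping variant, which is the inconvenience described above.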

As of 1.4 the java.nio.Buffer family might be the way to go, but it seems that your code might actually be more complicated. These classes do have support for handling endian issues.
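
As a minimal sketch of the java.nio approach, a ByteBuffer can be switched to little-endian once with order(), after which every getInt() honours that setting. The record layout used here (a 4-byte key followed by an 8-byte name) and the names RecordParser and Rec are hypothetical; the question does not specify the real record format.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for fixed-size little-endian records: int key + 8-byte name.
final class RecordParser {
    static final int NAME_LENGTH = 8;
    static final int RECORD_SIZE = 4 + NAME_LENGTH;

    static final class Rec {
        final int key;
        final String name;

        Rec(int key, String name) {
            this.key = key;
            this.name = name;
        }
    }

    static List<Rec> parse(byte[] data) {
        // order() sets the byte order for all subsequent multi-byte reads.
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<Rec> records = new ArrayList<>();
        byte[] name = new byte[NAME_LENGTH];
        // Stops at the first incomplete record instead of throwing.
        while (buf.remaining() >= RECORD_SIZE) {
            int key = buf.getInt();
            buf.get(name);
            // trim() drops trailing NUL or space padding (assumed padding scheme).
            records.add(new Rec(key, new String(name, StandardCharsets.ISO_8859_1).trim()));
        }
        return records;
    }
}
```

For a real file, the byte array could be replaced by a buffer filled from a FileChannel; the parsing loop would stay the same.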

Answer by Thomas Jones-Low

A while ago I found this article on using reflection and parsing to read binary data. In this case, the author is using reflection to read the Java binary .class files. But if you are reading the data into a class file, it may be of some help.

Answer by Joe Pineda

I may have misunderstood you, but it seems to me you're creating in-memory structures you hope will be a byte-per-byte accurate representation of what you want to read from hard-disk, then copy the whole stuff onto memory and manipulate thence?

If that's indeed the case, you're playing a very dangerous game. At least in C, the standard doesn't enforce things like padding or aligning of members of a struct. Not to mention things like big/small endianness or parity bits... So even if your code happens to run it's very non-portable and risky - you depend on the compiler's creator not changing its mind on future versions.

It is better to create an automaton that both validates the structure being read from disk (byte by byte) and fills an in-memory structure once the data proves valid. You may lose a few milliseconds (fewer than it might seem, since modern OSes do a lot of disk read caching), but you gain platform and compiler independence. Plus, your code will be easy to port to another language.
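
A very small sketch of validating while reading, in Java, might look like the following. The validation rules (supported version range, beginOfData bounds, printable-ASCII ID) and the name ValidatingHeaderReader are invented for illustration; a real legacy format would define its own constraints.

```java
import java.io.DataInput;
import java.io.IOException;

// Hypothetical reader that checks each header field as soon as it has been read.
final class ValidatingHeaderReader {
    static final int HEADER_SIZE = 25;          // 4 + 1 + 4 + 16, assuming no padding
    static final int MAX_SUPPORTED_VERSION = 3; // assumed limit, for illustration only

    // Returns beginOfData if the header passes all checks; otherwise throws with the reason.
    static int readAndValidate(DataInput in, long fileLength) throws IOException {
        int version = in.readInt();
        if (version < 1 || version > MAX_SUPPORTED_VERSION) {
            throw new IOException("unsupported version: " + version);
        }
        in.readByte(); // type: any value is accepted in this sketch
        int beginOfData = in.readInt();
        if (beginOfData < HEADER_SIZE || beginOfData > fileLength) {
            throw new IOException("beginOfData out of range: " + beginOfData);
        }
        for (int i = 0; i < 16; i++) {
            int b = in.readUnsignedByte();
            // Accept NUL padding and printable ASCII only.
            if (b != 0 && (b < 0x20 || b > 0x7E)) {
                throw new IOException("non-printable byte in ID at index " + i);
            }
        }
        return beginOfData;
    }
}
```

Failing early with a specific message tends to make corrupt or unexpected files much easier to diagnose than a header that was copied into memory unchecked.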

Post Edit: In a way I sympathize with you. In the good-ol' days of DOS/Win3.11, I once created a C program to read BMP files, and used exactly the same technique. Everything was nice until I tried to compile it for Windows - oops!! Int was now 32 bits long, rather than 16! When I tried to compile on Linux, I discovered gcc had very different rules for bit field allocation than Microsoft C (6.0!). I had to resort to macro tricks to make it portable...

Answer by Javamann

Here is a link on reading bytes using a ByteBuffer (Java NIO):

http://exampledepot.com/egs/java.nio/ReadChannel.html

Answer by John Montgomery

As other people mention, DataInputStream and Buffers are probably the low-level APIs you are after for dealing with binary data in Java.

However, you probably want something like Construct (the wiki page has good examples too: http://en.wikipedia.org/wiki/Construct_(python_library)), but for Java.

I don't know of any (Java versions) offhand, but taking that approach (declaratively specifying the struct in code) would probably be the right way to go. With a suitable fluent interface in Java it would probably be quite similar to a DSL.

EDIT: bit of googling reveals this:

http://javolution.org/api/javolution/io/Struct.html

Which might be the kind of thing you are looking for. I have no idea whether it works or is any good, but it looks like a sensible place to start.

Answer by John Montgomery

I've written up a technique to do this sort of thing in Java, similar to the old C-like idiom of reading bit-fields. Note it is just a start but could be expanded upon.

here

Answer by Wilfred Springer

If you were using Preon, then all you would have to do is this:

public class Header {
    @BoundNumber int version;
    @BoundNumber byte type;
    @BoundNumber int beginOfData;
    @BoundString(size="15") String id;
}

Once you have this, you create a Codec using a single line:

Codec<Header> codec = Codecs.create(Header.class);

And you use the Codec like this:

Header header = Codecs.decode(codec, file);