Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow post: http://stackoverflow.com/questions/39399398/

Time: 2020-11-03 04:20:23  Source: igfitidea

Java Reading large files into byte array chunk by chunk

java

Asked by h0lmesxx

So I've been trying to make a small program that inputs a file into a byte array, then it will turn that byte array into hex, then binary. It will then play with the binary values (I haven't thought of what to do when I get to this stage) and then save it as a custom file.

I studied a lot of internet code and I can turn a file into a byte array and into hex, but the problem is I can't turn huge files into byte arrays (out of memory).

This is the code that is not a complete failure

public void rundis(Path pp) {
    byte[] bb = null;

    try {
        bb = Files.readAllBytes(pp); //Files.toByteArray(pathhold);
        System.out.println("byte array made");
    } catch (Exception e) {
        e.printStackTrace();
    }
    // check for null first, otherwise bb.length throws a NullPointerException
    if (bb != null && bb.length != 0) {
        System.out.println("byte array filled");
        //send to method to turn into hex
    } else {
        System.out.println("byte array NOT filled");
    }
}

I know how the process should go, but I don't know how to code that properly.

The process if you are interested:

  • Input the file using File
  • Read the file chunk by chunk into a byte array, e.g. each chunk holds 600 bytes
  • Send that chunk to be turned into a hex value --> Integer.toHexString
  • Send that hex value chunk to be made into a binary value --> Integer.toBinaryString
  • Mess around with the binary value
  • Save to a custom file line by line

Problem: I don't know how to read a huge file into a byte array chunk by chunk so that each chunk can be processed. Any and all help will be appreciated, thank you for reading :)

Answered by Lynx 242

To chunk your input use a FileInputStream:

    Path pp = FileSystems.getDefault().getPath("logs", "access.log");
    final int BUFFER_SIZE = 1024 * 1024; // buffer size in bytes (1 MiB)

    FileInputStream fis = new FileInputStream(pp.toFile());
    byte[] buffer = new byte[BUFFER_SIZE];
    int read = 0;
    // read() returns the number of bytes read, or -1 at end of file
    while ((read = fis.read(buffer)) > 0) {
        // only buffer[0..read) is valid here; call your other methods here...
    }

    fis.close();
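
Note that the snippet above does not close the stream if an exception is thrown inside the loop. A minimal sketch of the same loop with try-with-resources, which closes the stream automatically. The class name `ChunkedReader`, the method `countBytes` and the 1 MiB buffer size are illustrative assumptions, not part of the original answer:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
class ChunkedReader {
    static final int BUFFER_SIZE = 1024 * 1024; // 1 MiB per chunk (arbitrary choice)

    // Reads the file chunk by chunk and returns the total number of bytes seen.
    static long countBytes(Path path) throws IOException {
        long total = 0;
        byte[] buffer = new byte[BUFFER_SIZE];
        // try-with-resources closes the stream even if the loop throws
        try (InputStream in = Files.newInputStream(path)) {
            int read;
            // read() returns -1 once the end of the file is reached
            while ((read = in.read(buffer)) != -1) {
                // only buffer[0..read) holds valid data for this chunk;
                // a real program would hand that range to its processing step
                total += read;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: ChunkedReader <file>");
            return;
        }
        System.out.println(countBytes(Paths.get(args[0])) + " bytes read");
    }
}
```

Memory use stays bounded by the buffer size no matter how large the file is, which is the point of reading in chunks.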

Answered by tkausl

To stream a file, you need to step away from Files.readAllBytes(). It's a nice utility for small files, but as you noticed not so much for large files.

In pseudocode it would look something like this:

while there are more bytes available
    read some bytes
    process those bytes
    (write the result back to a file, if needed)

In Java, you can use a FileInputStream to read a file byte by byte or chunk by chunk. Let's say we want to write back our processed bytes. First we open the files:

FileInputStream is = new FileInputStream(new File("input.txt"));
FileOutputStream os = new FileOutputStream(new File("output.txt"));

We need the FileOutputStream to write back our results - we don't want to just drop our precious processed data, right? Next we need a buffer which holds a chunk of bytes:

byte[] buf = new byte[4096];

How many bytes is up to you; I kinda like chunks of 4096 bytes. Then we need to actually read some bytes:

int read = is.read(buf);

This will read up to buf.length bytes and store them in buf. It returns the number of bytes actually read, or -1 if the end of the file has been reached. Then we process the bytes:

//Assuming the processing function looks like this:
//byte[] process(byte[] data, int bytes);
byte[] ret = process(buf, read);

process() in the above example is your processing method. It takes a byte array and the number of bytes it should process, and returns the result as a byte array.

Last, we write the result back to a file:

os.write(ret);

We have to execute this in a loop until there are no bytes left in the file, so let's write a loop for it:

int read = 0;
while((read = is.read(buf)) > 0) {
    byte[] ret = process(buf, read);
    os.write(ret);
}
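
The loop above terminates because `read` returns -1 at the end of the stream. Each call returns the number of bytes read by that call, which can be smaller than `buf.length` (typically for the last chunk). A small sketch that makes this visible, using an in-memory `ByteArrayInputStream` in place of a file; the class `ReadReturnDemo` and the method `readSizes` are hypothetical names chosen for this illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, for illustration only.
class ReadReturnDemo {
    // Collects the return value of every read() call, including the final -1.
    static List<Integer> readSizes(InputStream in, int bufferSize) throws IOException {
        List<Integer> sizes = new ArrayList<>();
        byte[] buf = new byte[bufferSize];
        int read;
        do {
            read = in.read(buf);
            sizes.add(read);
        } while (read != -1);
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // 10 bytes of input, read with a 4-byte buffer
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(readSizes(in, 4)); // prints [4, 4, 2, -1]
    }
}
```

This is why `process` is given the `read` count: after the short final read, the rest of the buffer still holds stale bytes from the previous chunk.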

And finally, close the streams:

is.close();
os.close();

And that's it. We processed the file in 4096-byte chunks and wrote the result back to a file. It's up to you what to do with the result: you could send it over TCP, drop it if it's not needed, or even read from TCP instead of a file. The basic logic is the same.

This still needs some proper error handling to deal with missing files or wrong permissions, but that's up to you to implement.
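
A minimal sketch of what that error handling might look like, assuming the streams are opened through `java.nio.file.Files` rather than `FileInputStream`/`FileOutputStream` (the `Files` methods report a missing file as `NoSuchFileException`). The class `SafeChunkProcessor`, the method `processFile` and the pass-through `process` are hypothetical placeholders, not part of the original answer:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.AccessDeniedException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical class, for illustration only.
class SafeChunkProcessor {
    // Placeholder for the real processing step: returns the valid bytes unchanged.
    static byte[] process(byte[] data, int length) {
        return Arrays.copyOf(data, length);
    }

    // Returns true on success, false if a file could not be opened, read or written.
    static boolean processFile(Path input, Path output) {
        byte[] buf = new byte[4096];
        // both streams are closed automatically, even when an exception is thrown
        try (InputStream is = Files.newInputStream(input);
             OutputStream os = Files.newOutputStream(output)) {
            int read;
            while ((read = is.read(buf)) != -1) {
                os.write(process(buf, read));
            }
            return true;
        } catch (NoSuchFileException e) {
            System.err.println("File not found: " + e.getFile());
        } catch (AccessDeniedException e) {
            System.err.println("Permission denied: " + e.getFile());
        } catch (IOException e) {
            System.err.println("I/O error: " + e.getMessage());
        }
        return false;
    }

    public static void main(String[] args) {
        boolean ok = processFile(Paths.get("input.txt"), Paths.get("output.txt"));
        System.out.println(ok ? "done" : "failed");
    }
}
```

Whether to log and return a flag, as here, or to let the exception propagate to the caller is a design choice that depends on the surrounding program.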

An example implementation for the process method:

//returns the hex representation of the bytes, encoded as ASCII bytes
//(needs: import java.nio.charset.StandardCharsets;)
public static byte[] process(byte[] bytes, int length) {
    final byte[] hexchars = "0123456789ABCDEF".getBytes(StandardCharsets.US_ASCII);
    // two hex digits per input byte
    byte[] ret = new byte[length * 2];
    for (int i = 0; i < length; ++i) {
        int b = bytes[i] & 0xFF; // treat the byte as unsigned
        ret[i * 2] = hexchars[b >>> 4];
        ret[i * 2 + 1] = hexchars[b & 0x0F];
    }
    return ret;
}
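
The question also asks for a binary representation. A comparable sketch for that step, built on `Integer.toBinaryString`; the class `BinaryFormatter` and the method `toBinary` are hypothetical names, and returning a `String` rather than a byte array is an assumption made to keep the example short:

```java
// Hypothetical class, for illustration only.
class BinaryFormatter {
    // Returns the first `length` bytes as concatenated 8-character binary strings.
    static String toBinary(byte[] bytes, int length) {
        StringBuilder sb = new StringBuilder(length * 8);
        for (int i = 0; i < length; i++) {
            // & 0xFF treats the byte as unsigned; | 0x100 sets a ninth bit so that
            // toBinaryString yields 9 digits, and dropping the first one keeps
            // the leading zeros of the remaining 8
            String bits = Integer.toBinaryString((bytes[i] & 0xFF) | 0x100).substring(1);
            sb.append(bits);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {0x0A, (byte) 0xFF};
        System.out.println(toBinary(sample, sample.length)); // prints 0000101011111111
    }
}
```

Without the `& 0xFF` mask, negative bytes would be sign-extended to 32 bits and `Integer.toBinaryString` would return 32 digits for them.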