Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18600331/

Time: 2020-08-12 09:23:33  Source: igfitidea

Why is using BufferedInputStream to read a file byte by byte faster than using FileInputStream?

Tags: java, file-io, inputstream, fileinputstream

Asked by ZimZim

I was trying to read a file into an array by using FileInputStream, and an ~800KB file took about 3 seconds to read into memory. I then tried the same code except with the FileInputStream wrapped into a BufferedInputStream and it took about 76 milliseconds. Why is reading a file byte by byte done so much faster with a BufferedInputStream even though I'm still reading it byte by byte? Here's the code (the rest of the code is entirely irrelevant). Note that this is the "fast" code. You can just remove the BufferedInputStream if you want the "slow" code:

    int[] fileArr = new int[(int) file.length()];

    // "Fast" version: drop the BufferedInputStream wrapper to get the "slow" one.
    try (InputStream is = new BufferedInputStream(new FileInputStream(file))) {
        int temp;
        for (int i = 0; (temp = is.read()) != -1; i++) {
            fileArr[i] = temp;
        }
    }

BufferedInputStream is over 30 times faster (roughly 3 seconds versus 76 milliseconds). So why is this, and is it possible to make this code more efficient (without using any external libraries)?

Accepted answer by Sotirios Delimanolis

In FileInputStream, the read() method reads a single byte. From the source code:

/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return     the next byte of data, or <code>-1</code> if the end of the
 *             file is reached.
 * @exception  IOException  if an I/O error occurs.
 */
public native int read() throws IOException;

This is a native call into the OS, so every read() pays for a system call (and possibly a disk access) just to fetch a single byte. This is a heavy operation.

With a BufferedInputStream, the method delegates to an overloaded read() method that reads up to 8192 bytes at a time and buffers them until they are needed. It still returns only a single byte (but keeps the others in reserve). This way the BufferedInputStream makes fewer native calls to the OS to read from the file.

For example, suppose your file is 32768 bytes long. To get all the bytes into memory with a FileInputStream, you will need 32768 native calls to the OS. With a BufferedInputStream, you will need only 4 (32768 / 8192), regardless of the number of read() calls you make (still 32768).
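
The call counts in this example can be checked with a small experiment. The sketch below is not from the original answer: `ReadCallCounter` and `CountingInputStream` are hypothetical names, and an in-memory `ByteArrayInputStream` stands in for the file so that the example runs anywhere. It counts how many read requests reach the underlying stream; with a real FileInputStream, each of those requests would be a native call.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ReadCallCounter {

    /** Hypothetical helper: counts every read request that reaches the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return in.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return in.read(b, off, len);
        }
    }

    /** Reads one byte at a time until end of stream, like the loop in the question. */
    private static void drain(InputStream in) throws IOException {
        while (in.read() != -1) {
            // discard the byte; only the number of underlying calls matters here
        }
    }

    /** Returns how many read requests hit the underlying stream for a source of fileSize bytes. */
    static int countCalls(int fileSize, boolean buffered) {
        // Assumption: an in-memory source stands in for the file.
        CountingInputStream counting =
                new CountingInputStream(new ByteArrayInputStream(new byte[fileSize]));
        try {
            drain(buffered ? new BufferedInputStream(counting, 8192) : counting);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counting.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered calls: " + countCalls(32768, false));
        System.out.println("buffered calls:   " + countCalls(32768, true));
    }
}
```

Both totals include the final request that reports end of stream, so the unbuffered count should come out one higher than the file size, and the buffered count should be a handful (the 8192-byte fills plus the end-of-stream probe).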

As to how to make it faster, you might want to consider the NIO FileChannel class (which predates Java 7; it has been in java.nio.channels since Java 1.4), but I have no evidence to support this.
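
For readers who want to try the FileChannel route, here is a minimal sketch. It is an illustration rather than a benchmarked recommendation: the class name `ChannelRead` and the `roundTrip` self-check are made up for this example, it assumes the file fits in a single array, and it uses `FileChannel.open`, which requires Java 7 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class ChannelRead {

    /** Reads the whole file into memory through a FileChannel. */
    static byte[] readAll(Path path) throws IOException {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for a single array");
            }
            ByteBuffer buffer = ByteBuffer.allocate((int) size);
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // keep reading until the buffer is full or end of file is reached
            }
            // Trim in case the file shrank between size() and the reads.
            return Arrays.copyOf(buffer.array(), buffer.position());
        }
    }

    /** Demo helper (hypothetical): writes size bytes to a temp file and checks they read back unchanged. */
    static boolean roundTrip(int size) {
        try {
            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) i;
            }
            Path tmp = Files.createTempFile("channel-read", ".bin");
            try {
                Files.write(tmp, data);
                return Arrays.equals(data, readAll(tmp));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrip(800 * 1024));
    }
}
```

If the goal is simply to load a whole file, `Files.readAllBytes(path)` from the same Java 7 API is a shorter standard-library option. Whether either is faster than a buffered stream for a given file would need measuring.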

Note: if you used FileInputStream's read(byte[], int, int) method directly instead, with a byte[>8192], you wouldn't need a BufferedInputStream wrapping it.
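
A minimal sketch of that approach follows. The names `BulkRead` and `roundTrip` are invented for illustration, and the 16 KB chunk size is an arbitrary assumption (anything above 8192 fits the note). The point is that each call to `read(byte[], int, int)` can return many bytes at once, so no wrapper is needed.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class BulkRead {

    /** Reads the whole file with read(byte[], int, int) and a caller-owned buffer. */
    static byte[] readAll(Path path) throws IOException {
        try (FileInputStream in = new FileInputStream(path.toFile());
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[16 * 1024]; // assumed size, chosen for illustration
            int n;
            // Each call may return up to chunk.length bytes; -1 signals end of file.
            while ((n = in.read(chunk, 0, chunk.length)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Demo helper (hypothetical): writes size bytes to a temp file and checks they read back unchanged. */
    static boolean roundTrip(int size) {
        try {
            byte[] data = new byte[size];
            for (int i = 0; i < size; i++) {
                data[i] = (byte) i;
            }
            Path tmp = Files.createTempFile("bulk-read", ".bin");
            try {
                Files.write(tmp, data);
                return Arrays.equals(data, readAll(tmp));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTrip(800 * 1024));
    }
}
```

Note that the loop uses the returned count `n` rather than assuming the buffer was filled, since a read may return fewer bytes than requested.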

Answer by usha

A BufferedInputStream wrapped around a FileInputStream will request data from the FileInputStream in big chunks (8192 bytes by default in the standard JDK implementation). Thus if you read 1000 bytes one at a time, the FileInputStream only has to fetch data from the OS once rather than 1000 times. This will be much faster!

Answer by huseyin

It is because of the cost of disk access. Let's assume you have a file whose size is 8 KB. Without BufferedInputStream, reading this file byte by byte requires 8*1024 separate read requests.

At this point, BufferedInputStream comes onto the scene and acts as a middleman between your code and the FileInputStream that reads the file.

In one shot, it pulls a chunk of bytes (8 KB by default) from the FileInputStream into memory, and your code then reads its bytes from this middleman. This reduces the time the operation takes.
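
The middleman idea can be sketched as a tiny wrapper class. This is a simplified illustration, not the JDK's actual BufferedInputStream code: `TinyBufferedStream` and its `drain` helper are hypothetical names, and features such as mark/reset and thread safety are left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Illustrative only: a stripped-down stand-in for what a buffering wrapper does. */
class TinyBufferedStream extends InputStream {
    private final InputStream source;
    private final byte[] buf;
    private int pos = 0;   // index of the next byte to hand out
    private int count = 0; // number of valid bytes currently in buf

    TinyBufferedStream(InputStream source, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("buffer size must be positive");
        }
        this.source = source;
        this.buf = new byte[size];
    }

    @Override
    public int read() throws IOException {
        if (pos >= count) {
            // Buffer exhausted: one bulk request to the source refills it.
            count = source.read(buf, 0, buf.length);
            pos = 0;
            if (count <= 0) {
                count = 0;
                return -1; // end of stream
            }
        }
        // Served from memory; mask so the byte is returned as 0..255.
        return buf[pos++] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        source.close();
    }

    /** Demo helper (hypothetical): reads data byte by byte through the wrapper. */
    static int[] drain(byte[] data, int bufferSize) {
        int[] out = new int[data.length];
        try (InputStream in = new TinyBufferedStream(new ByteArrayInputStream(data), bufferSize)) {
            int b;
            int i = 0;
            while ((b = in.read()) != -1) {
                out[i++] = b;
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Most `read()` calls here never touch the source at all; only the refill does, which is where the saving comes from.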

    private void exercise1WithBufferedStream() {
        long start = System.currentTimeMillis();
        try (FileInputStream myFile = new FileInputStream("anyFile.txt")) {
            BufferedInputStream bufferedInputStream = new BufferedInputStream(myFile);
            boolean eof = false;
            while (!eof) {
                int inByteValue = bufferedInputStream.read();
                if (inByteValue == -1) eof = true;
            }
        } catch (IOException e) {
            System.out.println("Could not read the stream...");
            e.printStackTrace();
        }
        System.out.println("time passed with buffered:" + (System.currentTimeMillis() - start));
    }


    private void exercise1() {
        long start = System.currentTimeMillis();
        try (FileInputStream myFile = new FileInputStream("anyFile.txt")) {
            boolean eof = false;
            while (!eof) {
                int inByteValue = myFile.read();
                if (inByteValue == -1) eof = true;
            }
        } catch (IOException e) {
            System.out.println("Could not read the stream...");
            e.printStackTrace();
        }
        System.out.println("time passed without buffered:" + (System.currentTimeMillis() - start));
    }