Java Large File Disk I/O Performance

Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/964332/

Date: 2020-08-11 21:39:18 · Source: igfitidea

Java Large Files Disk IO Performance

Tags: java, performance, comparison, stream, nio

Asked by Peter Kofler

I have two (2GB each) files on my harddisk and want to compare them with each other:


  • Copying the original files with Windows Explorer takes approx. 2-4 minutes (that is reading and writing, on the same physical and logical disk).
  • Reading with java.io.FileInputStream twice and comparing the byte arrays on a byte-per-byte basis takes 20+ minutes.
  • The java.io.BufferedInputStream buffer is 64 kB; the files are read in chunks and then compared.
  • Comparison is done in a tight loop like

    int numRead = Math.min(numRead[0], numRead[1]);
    for (int k = 0; k < numRead; k++)
    {
       if (buffer[1][k] != buffer[0][k])
       {
          return buffer[0][k] - buffer[1][k];
       }
    }
    
    

What can I do to speed this up? Is NIO supposed to be faster than plain streams? Is Java unable to use DMA/SATA technologies, making slow OS API calls instead?


EDIT:
Thanks for the answers. I did some experiments based on them. As Andreas showed


stream or nio approaches do not differ much.
More important is the correct buffer size.


This is confirmed by my own experiments. As the files are read in big chunks, even additional buffers (BufferedInputStream) do not gain anything. Optimising the comparison is possible, and I got the best results with 32-fold unrolling, but the time spent in comparison is small compared to the disk read, so the speedup is small. Looks like there is nothing I can do ;-(

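For illustration, a sketch of what such an unrolled comparison might look like; 4-fold is shown for brevity (the experiment above used 32-fold), and this is an assumption rather than the exact code used:

```java
// Hypothetical sketch of a manually unrolled byte comparison (4-fold for
// brevity; the experiment above used 32-fold). Assumes both arrays hold
// at least numRead valid bytes.
static int compareUnrolled(byte[] a, byte[] b, int numRead) {
    int k = 0;
    for (; k + 3 < numRead; k += 4) {
        if (a[k]     != b[k])     return a[k]     - b[k];
        if (a[k + 1] != b[k + 1]) return a[k + 1] - b[k + 1];
        if (a[k + 2] != b[k + 2]) return a[k + 2] - b[k + 2];
        if (a[k + 3] != b[k + 3]) return a[k + 3] - b[k + 3];
    }
    for (; k < numRead; k++) {            // remaining tail bytes
        if (a[k] != b[k]) return a[k] - b[k];
    }
    return 0;
}
```

Unrolling reduces loop-counter overhead per byte compared, which is why it helps only while the comparison (not the disk) is the bottleneck.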

Accepted answer by Andreas Petersson

I tried out three different methods of comparing two identical 3.8 GB files with buffer sizes between 8 kB and 1 MB. The first method used just two buffered input streams.


The second approach uses a thread pool that reads in two different threads and compares in a third one. This got slightly higher throughput at the expense of high CPU utilisation. Managing the thread pool incurs a lot of overhead with those short-running tasks.


The third approach uses nio, as posted by laginimaineb.


As you can see, the general approach does not differ much. More important is the correct buffer size.


What is strange is that I read 1 byte less using threads. I could not spot the error though.


comparing just with two streams
I was equal, even after 3684070360 bytes and reading for 704813 ms (4,98MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 578563 ms (6,07MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 515422 ms (6,82MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 534532 ms (6,57MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 422953 ms (8,31MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 793359 ms (4,43MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 746344 ms (4,71MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 669969 ms (5,24MB/sec * 2) with a buffer size of 1024 kB
comparing with threads
I was equal, even after 3684070359 bytes and reading for 602391 ms (5,83MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070359 bytes and reading for 523156 ms (6,72MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070359 bytes and reading for 527547 ms (6,66MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070359 bytes and reading for 276750 ms (12,69MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070359 bytes and reading for 493172 ms (7,12MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070359 bytes and reading for 696781 ms (5,04MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070359 bytes and reading for 727953 ms (4,83MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070359 bytes and reading for 741000 ms (4,74MB/sec * 2) with a buffer size of 1024 kB
comparing with nio
I was equal, even after 3684070360 bytes and reading for 661313 ms (5,31MB/sec * 2) with a buffer size of 8 kB
I was equal, even after 3684070360 bytes and reading for 656156 ms (5,35MB/sec * 2) with a buffer size of 16 kB
I was equal, even after 3684070360 bytes and reading for 491781 ms (7,14MB/sec * 2) with a buffer size of 32 kB
I was equal, even after 3684070360 bytes and reading for 317360 ms (11,07MB/sec * 2) with a buffer size of 64 kB
I was equal, even after 3684070360 bytes and reading for 643078 ms (5,46MB/sec * 2) with a buffer size of 128 kB
I was equal, even after 3684070360 bytes and reading for 865016 ms (4,06MB/sec * 2) with a buffer size of 256 kB
I was equal, even after 3684070360 bytes and reading for 716796 ms (4,90MB/sec * 2) with a buffer size of 512 kB
I was equal, even after 3684070360 bytes and reading for 652016 ms (5,39MB/sec * 2) with a buffer size of 1024 kB

The code used:


import junit.framework.Assert;
import org.junit.Before;
import org.junit.Test;

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.Arrays;
import java.util.concurrent.*;

public class FileCompare {

    private static final int MIN_BUFFER_SIZE = 1024 * 8;
    private static final int MAX_BUFFER_SIZE = 1024 * 1024;
    private String fileName1;
    private String fileName2;
    private long start;
    private long totalbytes;

    @Before
    public void createInputStream() {
        fileName1 = "bigFile.1";
        fileName2 = "bigFile.2";
    }

    @Test
    public void compareTwoFiles() throws IOException {
        System.out.println("comparing just with two streams");
        int currentBufferSize = MIN_BUFFER_SIZE;
        while (currentBufferSize <= MAX_BUFFER_SIZE) {
            compareWithBufferSize(currentBufferSize);
            currentBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesFutures() 
            throws IOException, ExecutionException, InterruptedException {
        System.out.println("comparing with threads");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            compareWithBufferSizeFutures(myBufferSize);
            myBufferSize *= 2;
        }
    }

    @Test
    public void compareTwoFilesNio() throws IOException {
        System.out.println("comparing with nio");
        int myBufferSize = MIN_BUFFER_SIZE;
        while (myBufferSize <= MAX_BUFFER_SIZE) {
            start = System.currentTimeMillis();
            totalbytes = 0;
            boolean wasEqual = isEqualsNio(myBufferSize);

            if (wasEqual) {
                printAfterEquals(myBufferSize);
            } else {
                Assert.fail("files were not equal");
            }

            myBufferSize *= 2;
        }

    }

    private void compareWithBufferSize(int myBufferSize) throws IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName1)),
                        myBufferSize);
        byte[] buff1 = new byte[myBufferSize];
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(new File(fileName2)),
                        myBufferSize);
        byte[] buff2 = new byte[myBufferSize];
        int read1;

        start = System.currentTimeMillis();
        totalbytes = 0;
        while ((read1 = inputStream1.read(buff1)) != -1) {
            totalbytes += read1;
            int read2 = inputStream2.read(buff2);
            if (read1 != read2) {
                break;
            }
            if (!Arrays.equals(buff1, buff2)) {
                break;
            }
        }
        if (read1 == -1) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    private void compareWithBufferSizeFutures(int myBufferSize)
            throws ExecutionException, InterruptedException, IOException {
        final BufferedInputStream inputStream1 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName1)),
                        myBufferSize);
        final BufferedInputStream inputStream2 =
                new BufferedInputStream(
                        new FileInputStream(
                                new File(fileName2)),
                        myBufferSize);

        final boolean wasEqual = isEqualsParallel(myBufferSize, inputStream1, inputStream2);

        if (wasEqual) {
            printAfterEquals(myBufferSize);
        } else {
            Assert.fail("files were not equal");
        }
        inputStream1.close();
        inputStream2.close();
    }

    private boolean isEqualsParallel(int myBufferSize
            , final BufferedInputStream inputStream1
            , final BufferedInputStream inputStream2)
            throws InterruptedException, ExecutionException {
        final byte[] buff1Even = new byte[myBufferSize];
        final byte[] buff1Odd = new byte[myBufferSize];
        final byte[] buff2Even = new byte[myBufferSize];
        final byte[] buff2Odd = new byte[myBufferSize];
        final Callable<Integer> read1Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Even);
            }
        };
        final Callable<Integer> read2Even = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Even);
            }
        };
        final Callable<Integer> read1Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream1.read(buff1Odd);
            }
        };
        final Callable<Integer> read2Odd = new Callable<Integer>() {
            public Integer call() throws Exception {
                return inputStream2.read(buff2Odd);
            }
        };
        final Callable<Boolean> oddEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Odd, buff2Odd);
            }
        };
        final Callable<Boolean> evenEqualsArray = new Callable<Boolean>() {
            public Boolean call() throws Exception {
                return Arrays.equals(buff1Even, buff2Even);
            }
        };

        ExecutorService executor = Executors.newCachedThreadPool();
        boolean isEven = true;
        Future<Integer> read1 = null;
        Future<Integer> read2 = null;
        Future<Boolean> isEqual = null;
        int lastSize = 0;
        while (true) {
            if (isEqual != null) {
                if (!isEqual.get()) {
                    return false;
                } else if (lastSize == -1) {
                    return true;
                }
            }
            if (read1 != null) {
                lastSize = read1.get();
                // note: at end-of-file read() returns -1, so this line
                // decrements totalbytes by one - the likely cause of the
                // "1 byte less" observed above
                totalbytes += lastSize;
                final int size2 = read2.get();
                if (lastSize != size2) {
                    return false;
                }
            }
            isEven = !isEven;
            if (isEven) {
                if (read1 != null) {
                    isEqual = executor.submit(oddEqualsArray);
                }
                read1 = executor.submit(read1Even);
                read2 = executor.submit(read2Even);
            } else {
                if (read1 != null) {
                    isEqual = executor.submit(evenEqualsArray);
                }
                read1 = executor.submit(read1Odd);
                read2 = executor.submit(read2Odd);
            }
        }
    }

    private boolean isEqualsNio(int myBufferSize) throws IOException {
        FileChannel first = null, seconde = null;
        try {
            first = new FileInputStream(fileName1).getChannel();
            seconde = new FileInputStream(fileName2).getChannel();
            if (first.size() != seconde.size()) {
                return false;
            }
            ByteBuffer firstBuffer = ByteBuffer.allocateDirect(myBufferSize);
            ByteBuffer secondBuffer = ByteBuffer.allocateDirect(myBufferSize);
            int firstRead, secondRead;
            while (first.position() < first.size()) {
                firstRead = first.read(firstBuffer);
                totalbytes += firstRead;
                secondRead = seconde.read(secondBuffer);
                if (firstRead != secondRead) {
                    return false;
                }
                if (!nioBuffersEqual(firstBuffer, secondBuffer, firstRead)) {
                    return false;
                }
            }
            return true;
        } finally {
            if (first != null) {
                first.close();
            }
            if (seconde != null) {
                seconde.close();
            }
        }
    }

    private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
        if (first.limit() != second.limit() || length > first.limit()) {
            return false;
        }
        first.rewind();
        second.rewind();
        for (int i = 0; i < length; i++) {
            if (first.get() != second.get()) {
                return false;
            }
        }
        return true;
    }

    private void printAfterEquals(int myBufferSize) {
        NumberFormat nf = new DecimalFormat("#.00");
        final long dur = System.currentTimeMillis() - start;
        double seconds = dur / 1000d;
        double megabytes = totalbytes / 1024.0 / 1024.0;
        double rate = (megabytes) / seconds;
        System.out.println("I was equal, even after " + totalbytes
                + " bytes and reading for " + dur
                + " ms (" + nf.format(rate) + "MB/sec * 2)" +
                " with a buffer size of " + myBufferSize / 1024 + " kB");
    }
}

Answered by Stu Thompson

With such large files, you are going to get MUCH better performance with java.nio.


Additionally, reading single bytes with Java streams can be very slow. Using a byte array (2-6K elements from my own experience; YMMV, as it seems platform/application specific) will dramatically improve your read performance with streams.

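As a hedged illustration (not from the original answer), the difference between per-byte and array reads looks like this; the InputStream source is arbitrary:

```java
// Sketch: consuming a stream one read() call per byte vs. one call per
// chunk. With a FileInputStream the per-byte version can issue one OS
// read per byte; the array version amortises that cost over each 4 kB
// chunk, which is where the dramatic speedup comes from.
static long countByByte(java.io.InputStream in) throws java.io.IOException {
    long n = 0;
    while (in.read() != -1) {
        n++;                          // one stream call per byte - slow
    }
    return n;
}

static long countByChunk(java.io.InputStream in) throws java.io.IOException {
    byte[] buf = new byte[4 * 1024];  // mid-sized array, per the answer
    long total = 0;
    int n;
    while ((n = in.read(buf)) != -1) {
        total += n;                   // one stream call per chunk - fast
    }
    return total;
}
```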

Answered by laginimaineb

Reading and writing the files with Java can be just as fast. You can use FileChannels. As for comparing the files, obviously comparing byte by byte will take a lot of time. Here's an example using FileChannels and ByteBuffers (could be further optimized):


public static boolean compare(String firstPath, String secondPath, final int BUFFER_SIZE) throws IOException {
    FileChannel firstIn = null, secondIn = null;
    try {
        firstIn = new FileInputStream(firstPath).getChannel();
        secondIn = new FileInputStream(secondPath).getChannel();
        if (firstIn.size() != secondIn.size())
            return false;
        ByteBuffer firstBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ByteBuffer secondBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        int firstRead, secondRead;
        while (firstIn.position() < firstIn.size()) {
            firstRead = firstIn.read(firstBuffer);
            secondRead = secondIn.read(secondBuffer);
            if (firstRead != secondRead)
                return false;
            if (!buffersEqual(firstBuffer, secondBuffer, firstRead))
                return false;
        }
        return true;
    } finally {
        if (firstIn != null) firstIn.close();
        if (secondIn != null) secondIn.close();
    }
}

private static boolean buffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit())
        return false;
    if (length > first.limit())
        return false;
    first.rewind(); second.rewind();
    for (int i=0; i<length; i++)
        if (first.get() != second.get())
            return false;
    return true;
}

Answered by alamar

DMA/SATA are hardware/low-level technologies and aren't visible to any programming language whatsoever.


For memory-mapped input/output you should use java.nio, I believe.


Are you sure that you aren't reading those files one byte at a time? That would be wasteful; I'd recommend doing it block by block, and each block should be something like 64 megabytes to minimize seeking.

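A hedged sketch of what a memory-mapped, block-by-block comparison could look like; the class and method names are illustrative, the sketch uses the NIO.2 `FileChannel.open` from Java 7, and it is not code the answer itself posted:

```java
// Sketch: compare two files by memory-mapping them in 64 MB windows,
// as the answer suggests, instead of issuing explicit read() calls.
// A single mapping is limited to Integer.MAX_VALUE bytes, so larger
// files must be walked window by window anyway.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MappedCompare {
    public static boolean equal(String path1, String path2) throws IOException {
        try (FileChannel c1 = FileChannel.open(Paths.get(path1), StandardOpenOption.READ);
             FileChannel c2 = FileChannel.open(Paths.get(path2), StandardOpenOption.READ)) {
            if (c1.size() != c2.size()) {
                return false;               // different length, cannot be equal
            }
            final long WINDOW = 64L * 1024 * 1024; // 64 MB blocks
            long size = c1.size();
            long pos = 0;
            while (pos < size) {
                long len = Math.min(WINDOW, size - pos);
                MappedByteBuffer m1 = c1.map(FileChannel.MapMode.READ_ONLY, pos, len);
                MappedByteBuffer m2 = c2.map(FileChannel.MapMode.READ_ONLY, pos, len);
                if (!m1.equals(m2)) {       // ByteBuffer.equals compares contents
                    return false;
                }
                pos += len;
            }
            return true;
        }
    }
}
```

The OS pages the mapped regions in as they are touched, so the comparison loop never copies the data into Java-managed buffers at all.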

Answered by Kosi2801

You can have a look at Sun's article on I/O tuning (although already a bit dated); maybe you can find similarities between the examples there and your code. Also have a look at the java.nio package, which contains faster I/O elements than java.io. Dr. Dobb's Journal has a quite nice article on high-performance I/O using java.nio.


If so, there are further examples and tuning tips available there which should be able to help you to speed up your code.


Furthermore, the Arrays class has built-in methods for comparing byte arrays; maybe these can also be used to make things faster and clean up your loop a bit.


Answered by Gareth Davis

The following is a good article on the relative merits of the different ways to read a file in Java. It may be of some use:


How to read files quickly


Answered by Peter Lawrey

For a better comparison, try copying two files at once. A hard drive can read one file much more efficiently than reading two (as the head has to move back and forth to read). One way to reduce this is to use larger buffers, e.g. 16 MB, with ByteBuffer.


With ByteBuffer you can compare 8 bytes at a time by comparing long values with getLong().


If your Java is efficient, most of the work is done in the disk/OS for reading and writing, so it shouldn't be much slower than using any other language (as the disk/OS is the bottleneck).


Don't assume Java is slow until you have determined it's not a bug in your code.


Answered by Peter Lawrey

After modifying your NIO compare function I get the following results.


I was equal, even after 4294967296 bytes and reading for 304594 ms (13.45MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 225078 ms (18.20MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 221351 ms (18.50MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 37 MB/s


Running the same thing on a faster drive


I was equal, even after 4294967296 bytes and reading for 178087 ms (23.00MB/sec * 2) with a buffer size of 1024 kB
I was equal, even after 4294967296 bytes and reading for 119084 ms (34.40MB/sec * 2) with a buffer size of 4096 kB
I was equal, even after 4294967296 bytes and reading for 109549 ms (37.39MB/sec * 2) with a buffer size of 16384 kB

Note: this means the files are being read at a rate of 74.8 MB/s


private static boolean nioBuffersEqual(ByteBuffer first, ByteBuffer second, final int length) {
    if (first.limit() != second.limit() || length > first.limit()) {
        return false;
    }
    first.rewind();
    second.rewind();
    int i;
    for (i = 0; i < length-7; i+=8) {
        if (first.getLong() != second.getLong()) {
            return false;
        }
    }
    for (; i < length; i++) {
        if (first.get() != second.get()) {
            return false;
        }
    }
    return true;
}

Answered by Thorbjørn Ravn Andersen

Try setting the buffer on the input stream up to several megabytes.

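In code, that suggestion is just the second constructor argument; the file path below is a placeholder:

```java
// Sketch: a BufferedInputStream with a 4 MB internal buffer instead of
// the 8 kB default. The constructor's second argument is the buffer
// size in bytes.
static java.io.BufferedInputStream openWithBigBuffer(String path)
        throws java.io.IOException {
    return new java.io.BufferedInputStream(
            new java.io.FileInputStream(path), 4 * 1024 * 1024);
}
```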

Answered by RickHigh

I found that a lot of the articles linked to in this post are really outdated (though there is also some very insightful stuff). There are some articles linked from 2001, and the information is questionable at best. Martin Thompson of Mechanical Sympathy wrote quite a bit about this in 2011. Please refer to what he wrote for the background and theory of this.


I have found that NIO or not-NIO has very little to do with the performance. It is much more about the size of your output buffers (read: the byte array size). NIO is no magic make-it-go-fast web-scale sauce.


I was able to take Martin's examples, use the 1.0-era OutputStream, and make it scream. NIO is fast too, but the biggest indicator is just the size of the output buffer, not whether or not you use NIO; unless of course you are using memory-mapped NIO, then it matters. :)


If you want up to date authoritative information on this, see Martin's blog:


http://mechanical-sympathy.blogspot.com/2011/12/java-sequential-io-performance.html


If you want to see how NIO does not make that much of a difference (as I was able to write examples using regular IO that were faster), see this:


http://www.dzone.com/links/fast_java_io_nio_is_always_faster_than_fileoutput.html


I have tested my assumption on a new Windows laptop with a fast hard disk, my MacBook Pro with SSD, an EC2 xlarge, and an EC2 4x large with maxed-out IOPS/high-speed I/O (and soon on a large-disk NAS fibre disk array), so it holds up (there are some issues with it for smaller EC2 instances, but if you care about performance... are you going to use a small EC2 instance?). If you use real hardware, in my tests so far, traditional IO always wins. If you use high-I/O EC2, then this is also a clear winner. If you use underpowered EC2 instances, NIO can win.


There is no substitute for benchmarking.


Anyway, I am no expert; I just did some empirical testing using the framework that Sir Martin Thompson wrote up in his blog post.


I took this to the next step and used Files.newInputStream (from JDK 7) with TransferQueue to create a recipe for making Java I/O scream (even on small EC2 instances). The recipe can be found at the bottom of this documentation for Boon (https://github.com/RichardHightower/boon/wiki/Auto-Growable-Byte-Buffer-like-a-ByteBuilder). This allows me to use a traditional OutputStream but with something that works well on smaller EC2 instances. (I am the main author of Boon. But I am accepting new authors. The pay sucks: $0 per hour. But the good news is, I can double your pay whenever you like.)


My 2 cents.


See this to see why TransferQueue is important: http://php.sabscape.com/blog/?p=557


Key learnings:


  1. If you care about performance, never, ever, ever use BufferedOutputStream.
  2. NIO does not always equal performance.
  3. Buffer size matters most.
  4. Recycling buffers for high-speed writes is critical.
  5. GC can/will/does implode your performance for high-speed writes.
  6. You have to have some mechanism to reuse spent buffers.
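Points 4-6 can be sketched as a small buffer pool; all names here are illustrative, and this is not Boon's actual recipe:

```java
// Sketch: a fixed pool of reusable byte buffers handed between a
// producer and a writer thread, so no buffer is allocated (and later
// garbage-collected) per write. Recycling spent buffers keeps the GC
// out of the hot write path.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BufferPool {
    private final BlockingQueue<byte[]> free;

    public BufferPool(int buffers, int size) {
        free = new ArrayBlockingQueue<>(buffers);
        for (int i = 0; i < buffers; i++) {
            free.add(new byte[size]);   // allocate once, up front
        }
    }

    public byte[] take() throws InterruptedException {
        return free.take();             // blocks if all buffers are in flight
    }

    public void recycle(byte[] buf) {
        free.offer(buf);                // return a spent buffer for reuse
    }
}
```

A writer thread would `take()` a filled buffer, push its contents to the OutputStream, and `recycle()` it; the bounded queue also provides back-pressure when the disk falls behind.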