Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13301076/

Time: 2020-10-31 12:11:57  Source: igfitidea

How to clone an inputstream in java in minimal time

Tags: java, clone, inputstream, bufferedinputstream

Asked by Classified

Can someone tell me how to clone an inputstream, taking as little creation time as possible? I need to clone an inputstream multiple times for multiple methods to process the IS. I've tried three ways and things don't work for one reason or another.

Method #1: Thanks to the stackoverflow community, I found the following link helpful and have incorporated the code snippet in my program.

How to clone an InputStream?

However, using this code can take up to one minute (for a 10MB file) to create the cloned inputstreams and my program needs to be as fast as possible.

    int read = 0;
    byte[] bytes = new byte[1024*1024*2];

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    while ((read = is.read(bytes)) != -1)
        bos.write(bytes,0,read);
    byte[] ba = bos.toByteArray();

    InputStream is1 = new ByteArrayInputStream(ba);
    InputStream is2 = new ByteArrayInputStream(ba);
    InputStream is3 = new ByteArrayInputStream(ba);

Method #2: I also tried using BufferedInputStream to clone the IS. This was fast (slowest creation time == 1ms. fastest == 0ms). However, after I sent is1 to be processed, the methods processing is2 and is3 threw an error saying there was nothing to process, almost like all 3 variables below referenced the same IS.

    is = getFileFromBucket(path,filename);
    ...
    ...
    InputStream is1 = new BufferedInputStream(is);
    InputStream is2 = new BufferedInputStream(is);
    InputStream is3 = new BufferedInputStream(is);

Method #3: I think the compiler is lying to me. I checked markSupported() for is1 for the two examples above. It returned true so I thought I could run

    is1.mark() 
    is1.reset()

or just

    is1.reset();

before passing the IS to my respective methods. In both of the above examples, I get an error saying it's an invalid mark.
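
For reference, here is a minimal sketch of how mark/reset is meant to be used on a BufferedInputStream: mark(int) takes a read limit and has to be called before the bytes are consumed, and calling reset() without a valid mark fails with an IOException. The class name MarkResetDemo, the helper names and the 5000-byte payload are assumptions made for illustration. This only helps when an upper bound on the stream size is known, which is not the case in the question, and the buffered stream keeps up to that many bytes in memory.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class MarkResetDemo {

    /** Reads the stream to its end and returns the number of bytes seen. */
    static int drain(InputStream in) throws IOException {
        byte[] chunk = new byte[1024];
        int total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }

    /** Reads the same stream twice via mark/reset and returns both byte counts. */
    static int[] readTwice(InputStream raw, int readLimit) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(readLimit);      // set the mark BEFORE reading, with an explicit limit
        int first = drain(in);
        in.reset();              // rewind to the marked position
        int second = drain(in);
        return new int[] {first, second};
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[5000]; // hypothetical payload
        // One byte of headroom so the read that hits end of stream stays within the limit.
        int[] counts = readTwice(new ByteArrayInputStream(data), data.length + 1);
        System.out.println(counts[0] + " " + counts[1]);
    }
}
```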

I'm out of ideas now so thanks in advance for any help you can give me.

P.S. From the comments I've received from people, I need to clarify a couple things regarding my situation: 1) This program is running on a VM 2) The inputstream is being passed into me from another method. I'm not reading from a local file 3) The size of the inputstream is not known

Answered by BalusC

how to clone an inputstream, taking as little creation time as possible? I need to clone an inputstream multiple times for multiple methods to process the IS

You could just create some kind of custom ReusableInputStream class wherein you immediately also write to an internal ByteArrayOutputStream on the 1st full read, then wrap it in a ByteBuffer when the last byte is read, and finally reuse the very same ByteBuffer on the subsequent full reads, which gets automatically flipped when the limit is reached. This saves you from one full read as in your 1st attempt.

Here's a basic kickoff example:

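import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class ReusableInputStream extends InputStream {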

    private InputStream input;
    private ByteArrayOutputStream output;
    private ByteBuffer buffer;

    public ReusableInputStream(InputStream input) throws IOException {
        this.input = input;
        this.output = new ByteArrayOutputStream(input.available()); // Note: it's resizable anyway.
    }

    @Override
    public int read() throws IOException {
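        byte[] b = new byte[1];
        int n = read(b, 0, 1);
        // Report end of stream, and mask to 0-255 so bytes >= 0x80 are not returned as negatives.
        return (n == -1) ? -1 : (b[0] & 0xFF);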
    }

    @Override
    public int read(byte[] bytes) throws IOException {
        return read(bytes, 0, bytes.length);
    }

    @Override
    public int read(byte[] bytes, int offset, int length) throws IOException {
        if (buffer == null) {
            int read = input.read(bytes, offset, length);

            if (read <= 0) {
                input.close();
                input = null;
                buffer = ByteBuffer.wrap(output.toByteArray());
                output = null;
                return -1;
            } else {
                output.write(bytes, offset, read);
                return read;
            }
        } else {
            int read = Math.min(length, buffer.remaining());

            if (read <= 0) {
                buffer.flip();
                return -1;
            } else {
                buffer.get(bytes, offset, read);
                return read;
            }
        }

    }

    // You might want to @Override flush(), close(), etc to delegate to input.
}

(Note that the actual job is performed in int read(byte[], int, int) instead of in int read(), and thus it's expected to be faster when the caller itself is also streaming using a byte[] buffer.)

You could use it as follows:

// IOUtils comes from Apache Commons IO; each full read below replays the same bytes.
InputStream input = new ReusableInputStream(getFileFromBucket(path, filename));
IOUtils.copy(input, new FileOutputStream("/copy1.ext"));
IOUtils.copy(input, new FileOutputStream("/copy2.ext"));
IOUtils.copy(input, new FileOutputStream("/copy3.ext"));

As to the performance, 1 minute per 10MB is more likely a hardware problem, not a software problem. My 7200rpm laptop harddisk does it in less than 1 second.

Answered by Stephen C

However, using this code can take up to one minute (for a 10MB file) to create the cloned inputstreams and my program needs to be as fast as possible.

Well, copying a stream takes time, and (in general) that is the only way to clone a stream. Unless you tighten the scope of the problem, there is little chance that the performance can be significantly improved.

Here are a couple of circumstances where improvement is possible:

  • If you know the number of bytes in the stream beforehand, you can read directly into the final byte array.

  • If you know that the data is coming from a file, you can create a memory-mapped buffer for the file.

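As a rough sketch of those two special cases: when the length is known, the bytes can be read straight into an array of the final size, and when the source is a file, it can be memory-mapped instead of copied. The class and method names (KnownSourceCopy, readKnownLength, mapFile) are made up for this illustration, and neither case matches the asker's situation of an unknown-size, non-file stream.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class KnownSourceCopy {

    /** Reads exactly length bytes straight into the final array, with no intermediate buffer. */
    static byte[] readKnownLength(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        // readFully blocks until the array is full and throws EOFException if the stream is shorter.
        new DataInputStream(in).readFully(data);
        return data;
    }

    /** Maps a file read-only; the mapping remains valid after the channel is closed. */
    static ByteBuffer mapFile(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }
}
```

Each consumer can then work on its own view of the data: new ByteArrayInputStream(data) for the array, or buffer.duplicate() for the mapped buffer, both of which share the underlying bytes while keeping an independent read position.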

But the fundamental problem is that moving lots of bytes around takes time. And the fact that it is taking 1 minute for a 10MB file (using the code in your question) suggests that the real bottleneck is not in Java at all.

Answered by Edwin Dalorzo

Regarding your first approach, the one that consists of putting all your bytes in a ByteArrayOutputStream:

  • First, this approach consumes a lot of memory. If you do not make sure that your JVM starts with enough memory allocated, it will need to request memory dynamically while processing your stream, and this is time-consuming.
  • Your ByteArrayOutputStream is initially created with a buffer of 32 bytes. Every time you try to put something in it, if it does not fit in the existing byte array, a new, bigger array is created and the old bytes are copied into it. Since you are writing up to 2MB at a time, you are forcing the ByteArrayOutputStream to copy its data over and over again into bigger arrays as it grows.
  • Since the old arrays are garbage, it is probable that their memory is being reclaimed by the garbage collector, which makes your copying process even slower.
  • Perhaps you should create the ByteArrayOutputStream using the constructor that specifies an initial buffer size. The more accurately you set the size, the faster the process should be, because fewer intermediate copies will be required.
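
The last suggestion could look roughly like the sketch below. The class name PresizedCopy and the expectedSize parameter are assumptions for illustration; the size is only a hint, so the buffer still grows if the guess is too small, and toByteArray() makes one final copy either way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class PresizedCopy {

    /** Reads the whole stream into a byte array, starting from a caller-supplied size hint. */
    static byte[] readAll(InputStream in, int expectedSize) throws IOException {
        // A good hint avoids most of the grow-and-copy cycles of the default 32-byte buffer.
        ByteArrayOutputStream bos = new ByteArrayOutputStream(Math.max(expectedSize, 32));
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            bos.write(chunk, 0, n);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[100_000]; // hypothetical payload
        byte[] data = readAll(new ByteArrayInputStream(source), source.length);
        // Any number of independent streams can share the same backing array.
        InputStream is1 = new ByteArrayInputStream(data);
        InputStream is2 = new ByteArrayInputStream(data);
        System.out.println(is1.available() + " " + is2.available());
    }
}
```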

Your second approach is bogus: you cannot decorate the same input stream with several other streams and expect things to work. As the bytes are consumed by one stream, the inner stream is exhausted as well and cannot provide the other streams with accurate data.
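
A small sketch of that effect, using an in-memory source as a stand-in for the real stream (the class name SharedSourceDemo and the 5000-byte payload are assumptions for illustration): the first wrapper drains the shared source, so the second one has nothing left to read.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class SharedSourceDemo {

    /** Reads the stream to its end and returns the number of bytes seen. */
    static int drain(InputStream in) throws IOException {
        byte[] chunk = new byte[1024];
        int total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(new byte[5000]);
        // Both wrappers pull from the SAME underlying stream; they are not copies.
        InputStream is1 = new BufferedInputStream(source);
        InputStream is2 = new BufferedInputStream(source);
        System.out.println(drain(is1)); // prints 5000
        System.out.println(drain(is2)); // prints 0: the shared source is already exhausted
    }
}
```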

Before I extend my answer, let me ask: are your other methods expecting to receive copies of the input stream while running on separate threads? Because if so, this sounds like a job for PipedOutputStream and PipedInputStream.

Answered by coyotesqrl

Do you intend the separate methods to run in parallel or sequentially? If sequentially, I see no reason to clone the input stream, so I have to assume you're planning to spin off threads to manage each stream.

I'm not near a computer right now to test this, but I'm thinking you'd be better off reading the input in chunks of, say, 1024 bytes, and then pushing those chunks (or array copies of the chunks) onto your output streams, with input streams attached to their thread ends. Have your readers block if there's no data available, etc.
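
One way this chunked fan-out might be sketched is with PipedOutputStream/PipedInputStream pairs, one per consumer thread. Everything named here (PipedFanOut, fanOut, countBytes, the 64 KB pipe size) is an assumption for illustration, and countBytes merely stands in for the real processing methods. Nothing is buffered in full: the writer blocks whenever a reader's pipe is full, so the slowest consumer sets the pace.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PipedFanOut {

    /** Stand-in for a real processing method: consumes the stream and counts its bytes. */
    static int countBytes(InputStream in) throws IOException {
        try (InputStream stream = in) {
            byte[] buf = new byte[1024];
            int total = 0;
            int n;
            while ((n = stream.read(buf)) != -1) {
                total += n;
            }
            return total;
        }
    }

    /** Copies source into one pipe per consumer; returns the byte count each consumer saw. */
    static List<Integer> fanOut(InputStream source, int consumers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        try {
            List<PipedOutputStream> sinks = new ArrayList<>();
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < consumers; i++) {
                PipedOutputStream sink = new PipedOutputStream();
                PipedInputStream in = new PipedInputStream(sink, 64 * 1024);
                sinks.add(sink);
                Callable<Integer> task = () -> countBytes(in);
                results.add(pool.submit(task)); // each reader runs on its own thread
            }
            byte[] chunk = new byte[1024];
            int n;
            while ((n = source.read(chunk)) != -1) {
                for (PipedOutputStream sink : sinks) {
                    sink.write(chunk, 0, n); // blocks while that reader's pipe is full
                }
            }
            for (PipedOutputStream sink : sinks) {
                sink.close(); // signals end of stream to the reader
            }
            List<Integer> totals = new ArrayList<>();
            for (Future<Integer> result : results) {
                totals.add(result.get());
            }
            return totals;
        } finally {
            pool.shutdownNow(); // also interrupts readers if the copy loop failed
        }
    }
}
```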
