Java: How to cache an InputStream for multiple use

Question

Asked by Azder

I have an InputStream of a file, and I use Apache POI components to read from it like this:

POIFSFileSystem fileSystem = new POIFSFileSystem(inputStream);

The problem is that I need to use the same stream multiple times, and POIFSFileSystem closes the stream after use.

What is the best way to cache the data from the input stream and then serve more input streams to different POIFSFileSystem instances?

EDIT 1:

By "cache" I meant storing the data for later use, not a way to speed up the application. Also, is it better to just read the input stream into an array or a string and then create a new input stream for each use?

EDIT 2:

Sorry to reopen the question, but the conditions are somewhat different when working inside a desktop application versus a web application. First of all, the InputStream I get from org.apache.commons.fileupload.FileItem in my Tomcat web app doesn't support marking, so it cannot be reset.

Second, I'd like to keep the file in memory for faster access and fewer I/O problems when dealing with files.

Answer 1

Accepted answer by dfa

You can decorate the InputStream being passed to POIFSFileSystem with a version that responds to close() by calling reset():

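import java.io.IOException;
import java.io.InputStream;

class ResetOnCloseInputStream extends InputStream {

    private final InputStream decorated;

    public ResetOnCloseInputStream(InputStream anInputStream) {
        if (!anInputStream.markSupported()) {
            throw new IllegalArgumentException("marking not supported");
        }

        // magic constant: BEWARE. reset() may fail once more than 1 << 24 bytes (16 MiB) have been read
        anInputStream.mark(1 << 24);
        decorated = anInputStream;
    }

    // Instead of closing, rewind to the mark so the stream can be consumed again
    @Override
    public void close() throws IOException {
        decorated.reset();
    }

    @Override
    public int read() throws IOException {
        return decorated.read();
    }

    // Delegate bulk reads too, so consumers are not forced into byte-by-byte reads
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return decorated.read(b, off, len);
    }
}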

Test case:

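import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ResetOnCloseDemo {

    // Reads the stream to the end, printing each byte, then closes it
    static void closeAfterInputStreamIsConsumed(InputStream is)
            throws IOException {
        int r;

        while ((r = is.read()) != -1) {
            System.out.println(r);
        }

        is.close(); // on the decorator this triggers reset() instead of a real close
        System.out.println("=========");
    }

    public static void main(String[] args) throws IOException {
        InputStream is = new ByteArrayInputStream("sample".getBytes());
        ResetOnCloseInputStream decoratedIs = new ResetOnCloseInputStream(is);
        closeAfterInputStreamIsConsumed(decoratedIs); // first pass
        closeAfterInputStreamIsConsumed(decoratedIs); // second pass: the data is available again
        closeAfterInputStreamIsConsumed(is);          // the underlying stream was reset as well
    }
}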

EDIT 2

You can read the entire file into a byte[] (slurp mode) and then pass it to a ByteArrayInputStream.
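
A minimal sketch of the slurp approach, assuming Java 9+ for InputStream.readAllBytes(). The class name SlurpedContent is hypothetical. The data is read once, and every call to newStream() returns an independent ByteArrayInputStream, so it does not matter if a consumer closes the stream it was given.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads a stream into memory once and hands out fresh streams on demand.
final class SlurpedContent {
    private final byte[] data;

    SlurpedContent(InputStream source) throws IOException {
        // readAllBytes() (Java 9+) reads until end of stream; the source is closed afterwards
        try (InputStream in = source) {
            this.data = in.readAllBytes();
        }
    }

    // Each call returns a new, independent stream positioned at the first byte.
    InputStream newStream() {
        return new ByteArrayInputStream(data);
    }

    int size() {
        return data.length;
    }
}
```

Each POI consumer can then get its own stream, e.g. new POIFSFileSystem(content.newStream()), without affecting the others. This keeps the whole file in memory, which matches the asker's second edit but is only sensible for reasonably small files.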

Answer 2

Answer by Aaron Digulla

If the file is not that big, read it into a byte[] array and give POI a ByteArrayInputStream created from that array.

If the file is big, you shouldn't bother, since the OS will do the caching for you as well as it can.

[EDIT] Use Apache commons-io to read the file into a byte array efficiently. Do not use int read(), since it reads the file byte by byte, which is very slow!

If you want to do it yourself, use a File object to get the length, create the array, and then write a loop that reads bytes from the file. You must loop, because read(byte[], int offset, int len) can read fewer than len bytes (and often does).
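
The do-it-yourself version could look like the following sketch. The names FileSlurper and readFully are hypothetical. It sizes the array from File.length() and keeps calling read(byte[], int, int) until the array is full, since a single call may return fewer bytes than requested. (On Java 7+, java.nio.file.Files.readAllBytes does the same job in one call.)

```java
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

final class FileSlurper {
    // Hypothetical helper: reads a whole file into a byte array using bulk reads.
    static byte[] readFully(File file) throws IOException {
        long length = file.length();
        if (length > Integer.MAX_VALUE) {
            throw new IOException("File too large for a single array: " + file);
        }
        byte[] data = new byte[(int) length];
        try (InputStream in = new FileInputStream(file)) {
            int offset = 0;
            // read() may return fewer bytes than asked for, so loop until the array is full
            while (offset < data.length) {
                int n = in.read(data, offset, data.length - offset);
                if (n == -1) {
                    throw new EOFException("File shrank while reading: " + file);
                }
                offset += n;
            }
        }
        return data;
    }
}
```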

Answer 3

Answer by Michael Borgwardt

What exactly do you mean by "cache"? Do you want the different POIFSFileSystem instances to start at the beginning of the stream? If so, there's absolutely no point in caching anything in your Java code; the OS will do that, so just open a new stream.
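
For data that lives in a file, "just open a new stream" could be sketched as below, leaving caching to the operating system as the answer suggests. The interface StreamSource and its factory ofFile are hypothetical names (Java 8+ for the static interface method); each consumer receives its own freshly opened stream, so one consumer closing its stream does not affect another.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical abstraction: something that can open a new stream over the same data on demand.
@FunctionalInterface
interface StreamSource {
    InputStream open() throws IOException;

    // Opens a fresh stream over the file for every use.
    static StreamSource ofFile(Path path) {
        return () -> Files.newInputStream(path);
    }
}
```

Each POIFSFileSystem would then be built from its own source.open() call. Note this does not help with the asker's FileItem stream unless the upload is first written to a file.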

Or do you want to continue reading at the point where the first POIFSFileSystem stopped? That's not caching, and it's very difficult to do. If you can't avoid the stream getting closed, the only way I can think of is to write a thin wrapper that counts how many bytes have been read, then open a new stream and skip that many bytes. But that could fail if POIFSFileSystem internally uses something like a BufferedInputStream.
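
A rough sketch of the "thin wrapper" idea, assuming the data can be reopened (e.g. from a file). CountingInputStream and skipFully are hypothetical names written for this illustration. As the answer warns, the count only reflects bytes handed to the consumer; if the consumer buffers ahead internally, it may have pulled more bytes than it actually processed, and resuming at that offset would then be wrong.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper that counts how many bytes have been read through it.
class CountingInputStream extends FilterInputStream {
    private long count;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates here, so array reads are counted exactly once.
    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int n = super.read(b, off, len);
        if (n > 0) {
            count += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped;
        return skipped;
    }

    // mark/reset would make the count meaningless, so they are disabled.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public synchronized void mark(int readlimit) {
        // ignored, see markSupported()
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }

    long getCount() {
        return count;
    }

    // Skips exactly n bytes of a newly opened stream; a single skip() may skip fewer.
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() made no progress: read one byte instead, or fail at end of stream
                if (in.read() == -1) {
                    throw new EOFException("stream ended before the skip target");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```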

Answer 4

Answer by adrian.tarau

This is how I would implement it, so that it can be used safely with any InputStream:

Write your own InputStream wrapper that creates a temporary file to mirror the original stream's content.
Dump everything read from the original input stream into this temporary file.
When the stream has been completely read, all the data will be mirrored in the temporary file.
Use InputStream.reset to switch (reinitialize) the internal stream to a FileInputStream(mirrored_content_file).
From then on you can drop the reference to the original stream (so it can be garbage-collected).
Add a new method release() that removes the temporary file and releases any open stream.
You can even call release() from finalize to make sure the temporary file is released in case you forget to call release() (most of the time you should avoid using finalize; always call a method to release object resources). See Why would you ever implement finalize()?
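
The steps above might be sketched as follows. MirroringInputStream and the "mirror" temp-file prefix are hypothetical names, and InputStream.nullInputStream() assumes Java 11+. The sketch leaves out the finalize() fallback (Object.finalize() has been deprecated since Java 9) and relies on an explicit release() call instead; close() is made a no-op so that a consumer such as POIFSFileSystem closing the stream does not destroy the data.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: copies everything read from the original stream into a temp file;
// once the original is exhausted, reset() replays the content from that file.
class MirroringInputStream extends InputStream {
    private final Path mirror;       // temporary file holding a copy of every byte read
    private InputStream current;    // the original stream at first, later the mirror file
    private OutputStream mirrorOut; // non-null only while the original is still being read

    MirroringInputStream(InputStream original) throws IOException {
        this.mirror = Files.createTempFile("mirror", ".tmp");
        this.mirrorOut = new BufferedOutputStream(Files.newOutputStream(mirror));
        this.current = original;
    }

    @Override
    public int read() throws IOException {
        int b = current.read();
        if (mirrorOut != null) {
            if (b == -1) {
                finishMirroring();
            } else {
                mirrorOut.write(b);
            }
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = current.read(buf, off, len);
        if (mirrorOut != null) {
            if (n == -1) {
                finishMirroring();
            } else {
                mirrorOut.write(buf, off, n);
            }
        }
        return n;
    }

    // Original fully read: flush the copy and drop the original so it can be collected.
    private void finishMirroring() throws IOException {
        mirrorOut.close();
        mirrorOut = null;
        current.close();
        current = InputStream.nullInputStream();
    }

    // Switches to a fresh stream over the mirrored copy; only valid after the original hit EOF.
    @Override
    public synchronized void reset() throws IOException {
        if (mirrorOut != null) {
            throw new IOException("original stream has not been fully read yet");
        }
        current.close();
        current = Files.newInputStream(mirror);
    }

    // Deliberately a no-op, so a consumer that closes the stream does not destroy the data.
    @Override
    public void close() {
    }

    // Releases any open stream and deletes the temporary file; call this when done.
    public void release() throws IOException {
        try {
            if (mirrorOut != null) {
                mirrorOut.close();
            }
            current.close();
        } finally {
            Files.deleteIfExists(mirror);
        }
    }
}
```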

Answer 5

Answer by Tomasz

Try BufferedInputStream, which adds mark and reset functionality to another input stream, and just override its close method:

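import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class UnclosableBufferedInputStream extends BufferedInputStream {

    public UnclosableBufferedInputStream(InputStream in) {
        super(in);
        // Mark the start with an unlimited read limit: everything read stays buffered in memory
        super.mark(Integer.MAX_VALUE);
    }

    // Rewind to the start instead of closing (the underlying stream is never closed)
    @Override
    public void close() throws IOException {
        super.reset();
    }
}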

So:

UnclosableBufferedInputStream bis = new UnclosableBufferedInputStream(inputStream);

and use bis wherever inputStream was used before.

Answer 6

Answer by Daniel Kaplan

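import java.io.BufferedInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.IOUtils;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        BufferedInputStream inputStream = new BufferedInputStream(
                IOUtils.toInputStream("Foobar", StandardCharsets.UTF_8));
        inputStream.mark(Integer.MAX_VALUE); // remember the start; the buffer grows to hold what is read
        System.out.println(IOUtils.toString(inputStream, StandardCharsets.UTF_8));
        inputStream.reset();                 // rewind to the mark
        System.out.println(IOUtils.toString(inputStream, StandardCharsets.UTF_8));
    }
}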

This works. IOUtils is part of Apache Commons IO.

Answer 7

Answer by Kaba Aboubacar

This works correctly:

byte[] bytes = getBytes(inputStream);
POIFSFileSystem fileSystem = new POIFSFileSystem(new ByteArrayInputStream(bytes));

where getBytes is like this:

private static byte[] getBytes(InputStream is) throws IOException {
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
    int n;

    // Copy the stream in 8 KiB chunks until end of stream
    while ((n = is.read(buffer, 0, buffer.length)) != -1) {
        baos.write(buffer, 0, n);
    }

    return baos.toByteArray();
}

Answer 8

Answer by FuePi

I'm adding my solution here because it works for me. It is basically a combination of the top two answers :)

private String convertStreamToString(InputStream is) {
    Writer w = new StringWriter();
    char[] buf = new char[1024];
    Reader r;
    is.mark(1 << 24); // requires a stream that supports marking; reset() may fail after 16 MiB
    try {
        r = new BufferedReader(new InputStreamReader(is, "UTF-8"));
        int n;
        while ((n = r.read(buf)) != -1) {
            w.write(buf, 0, n);
        }
        is.reset(); // rewind so the caller can read the stream again
    } catch (IOException e) { // also covers UnsupportedEncodingException
        Logger.debug(this.getClass(), "Cannot convert stream to string.", e);
    }
    return w.toString();
}

Answer 9

Answer by user2807207

Use the implementation below if you need more control, such as a fixed number of reuses:

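import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReusableBufferedInputStream extends BufferedInputStream
{

    private int totalUse; // how many times the stream may be consumed in total
    private int used;     // how many uses have started so far, including the current one

    public ReusableBufferedInputStream(InputStream in, int totalUse)
    {
        super(in);
        if (totalUse > 1)
        {
            // Only mark (and therefore buffer everything read) when reuse is actually needed
            super.mark(Integer.MAX_VALUE);
            this.totalUse = totalUse;
            this.used = 1;
        }
        else
        {
            this.totalUse = 1;
            this.used = 1;
        }
    }

    // Rewind for the next use until the allowed number of uses is reached, then really close
    @Override
    public void close() throws IOException
    {
        if (used < totalUse)
        {
            super.reset();
            ++used;
        }
        else
        {
            super.close();
        }
    }
}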

Answer 10

Answer by Brice

This answer iterates on previous ones [1|2] based on BufferedInputStream. The main changes are that it allows unlimited reuse and that it takes care of closing the original source input stream to free up system resources. Your OS limits how many of those you can hold open, and you don't want the program to run out of file handles (that's also why you should always "consume" responses, e.g. with Apache's EntityUtils.consumeQuietly()). EDIT: Updated the code to handle greedy consumers that use read(buffer, offset, length); in that case BufferedInputStream may try hard to read from the source again, and this code protects against that.

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CachingInputStream extends BufferedInputStream {
    public CachingInputStream(InputStream source) {
        super(new PostCloseProtection(source));
        super.mark(Integer.MAX_VALUE);
    }

    @Override
    public synchronized void close() throws IOException {
        if (!((PostCloseProtection) in).decoratedClosed) {
            in.close();
        }
        super.reset();
    }

    private static class PostCloseProtection extends InputStream {
        private volatile boolean decoratedClosed = false;
        private final InputStream source;

        public PostCloseProtection(InputStream source) {
            this.source = source;
        }

        @Override
        public int read() throws IOException {
            return decoratedClosed ? -1 : source.read();
        }

        @Override
        public int read(byte[] b) throws IOException {
            return decoratedClosed ? -1 : source.read(b);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return decoratedClosed ? -1 : source.read(b, off, len);
        }

        @Override
        public long skip(long n) throws IOException {
            return decoratedClosed ? 0 : source.skip(n);
        }

        @Override
        public int available() throws IOException {
            return source.available();
        }

        @Override
        public void close() throws IOException {
            decoratedClosed = true;
            source.close();
        }

        @Override
        public void mark(int readLimit) {
            source.mark(readLimit);
        }

        @Override
        public void reset() throws IOException {
            source.reset();
        }

        @Override
        public boolean markSupported() {
            return source.markSupported();
        }
    }
}

To reuse it, just close it first if it hasn't been closed already.

One limitation, though, is that if the stream is closed before the whole content of the original stream has been read, this decorator will have incomplete data, so make sure the whole stream is read before closing it.
