Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/1082320/
In what order should I use GZIPOutputStream and BufferedOutputStream in Java?
Asked by sanity
Can anyone recommend whether I should do something like:
os = new GZIPOutputStream(new BufferedOutputStream(...));
or
os = new BufferedOutputStream(new GZIPOutputStream(...));
Which is more efficient? Should I use BufferedOutputStream at all?
Accepted answer by Gray
"What order should I use GZIPOutputStream and BufferedOutputStream"
For object streams, I found that wrapping the buffered stream around the gzip stream, for both input and output, was almost always significantly faster. The smaller the objects, the better this did. In all cases it was better than or the same as using no buffered stream.
ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(fis)));
oos = new ObjectOutputStream(new BufferedOutputStream(new GZIPOutputStream(fos)));
However, for text and straight byte streams, I found that it was a toss-up, with the gzip stream around the buffered stream being only slightly better. But in all cases it was better than no buffered stream.
reader = new InputStreamReader(new GZIPInputStream(new BufferedInputStream(fis)));
writer = new OutputStreamWriter(new GZIPOutputStream(new BufferedOutputStream(fos)));
I ran each version 20 times, discarded the first run, and averaged the rest. I also tried buffered-gzip-buffered, which was slightly better for objects and worse for text. I did not experiment with buffer sizes at all.
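The measurement procedure described above (repeat, drop the first run, average the rest) could be sketched roughly as follows. This is not the original benchmark code: the class name `StreamOrderTiming`, the `TimedTask` interface, and the in-memory `Integer` workload are all assumptions made for illustration, and the numbers it prints will not match the figures quoted in this answer.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamOrderTiming {

    /** A unit of work to time; hypothetical helper type for this sketch. */
    public interface TimedTask {
        void run() throws Exception;
    }

    /** Runs the task `runs` times, discards the first run as warm-up, and returns the mean of the rest in ms. */
    public static double averageMillis(int runs, TimedTask task) throws Exception {
        if (runs < 2) {
            throw new IllegalArgumentException("need at least 2 runs");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = System.nanoTime() - start;
            if (i > 0) {            // skip the first run
                totalNanos += elapsed;
            }
        }
        return totalNanos / 1e6 / (runs - 1);
    }

    /** Serializes many small objects with the buffered stream wrapped around the gzip stream. */
    public static byte[] writeObjects(int count) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(
                new BufferedOutputStream(new GZIPOutputStream(sink)))) {
            for (int i = 0; i < count; i++) {
                oos.writeObject(Integer.valueOf(i));
            }
        }
        return sink.toByteArray();
    }

    /** Reads the objects back through the mirrored input chain and returns their sum. */
    public static long readObjects(byte[] data, int count) throws IOException, ClassNotFoundException {
        long sum = 0;
        try (ObjectInputStream ois = new ObjectInputStream(
                new BufferedInputStream(new GZIPInputStream(new ByteArrayInputStream(data))))) {
            for (int i = 0; i < count; i++) {
                sum += (Integer) ois.readObject();
            }
        }
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int count = 100_000;
        byte[] data = writeObjects(count);
        double writeMs = averageMillis(20, () -> writeObjects(count));
        double readMs = averageMillis(20, () -> readObjects(data, count));
        System.out.printf("write: %.1f ms, read: %.1f ms%n", writeMs, readMs);
    }
}
```

To compare stream orders, swap the chain inside `writeObjects` and `readObjects` and rerun; writing to real files rather than byte arrays would be closer to the setup described here.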
For the object streams, I tested 2 serialized object files in the tens of megabytes. For the larger file (38 MB), it was 85% faster for reading (0.7 versus 5.6 seconds) but actually slightly slower for writing (5.9 versus 5.7 seconds). These objects had some large arrays in them, which may have meant larger writes.
method  crc       date    time   compressed  uncompressed  ratio
defla   eb338650  May 19  16:59    14027543      38366001  63.4%
For the smaller file (18 MB), it was 75% faster for reading (1.6 versus 6.1 seconds) and 40% faster for writing (2.8 versus 4.7 seconds). It contained a large number of small objects.
method  crc       date    time   compressed  uncompressed  ratio
defla   92c9d529  May 19  16:56     6676006      17890857  62.7%
For the text reader/writer I used a 64 MB CSV text file. The gzip stream around the buffered stream was 11% faster for reading (950 versus 1070 milliseconds) and slightly faster when writing (7.9 versus 8.1 seconds).
method  crc       date    time   compressed  uncompressed  ratio
defla   c6b72e34  May 20  09:16    22560860      63465800  64.5%
Answer by barfuin
GZIPOutputStream already comes with a built-in buffer. So, there is no need to put a BufferedOutputStream right next to it in the chain. gojomo's excellent answer already provides some guidance on where to place the buffer.
The default buffer size for GZIPOutputStream is only 512 bytes, so you will want to increase it to 8K or even 64K via the constructor parameter. The default buffer size for BufferedOutputStream is 8K, which is why you can measure an advantage when combining the default GZIPOutputStream and BufferedOutputStream. That advantage can also be achieved by properly sizing the GZIPOutputStream's built-in buffer.
So, to answer your question: "Should I use BufferedOutputStream at all?" → No, in your case you should not use it; instead, set the GZIPOutputStream's buffer to at least 8K.
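As a minimal sketch of that suggestion, the two-argument constructors `GZIPOutputStream(OutputStream, int)` and `GZIPInputStream(InputStream, int)` accept the internal buffer size. The class name `SizedGzipBuffer`, the 8K constant, and the in-memory byte arrays standing in for the real destination are assumptions for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class SizedGzipBuffer {

    // Assumed size for this sketch; tune it against your own destination.
    static final int BUFFER_SIZE = 8 * 1024;

    /** Compresses the input using a gzip stream whose built-in buffer is BUFFER_SIZE bytes. */
    public static byte[] compress(byte[] input) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink, BUFFER_SIZE)) {
            gzip.write(input);
        }
        return sink.toByteArray();
    }

    /** Decompresses using a gzip input stream with the same enlarged buffer. */
    public static byte[] decompress(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER_SIZE)) {
            byte[] chunk = new byte[BUFFER_SIZE];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello, gzip".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```

Whether 8K, 64K, or some other size works best is something to measure against the actual destination.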
Answer by gojomo
Buffering helps when the ultimate destination of the data is best read or written in larger chunks than your code would otherwise push to it. So you generally want the buffering to be as close as possible to the place that wants larger chunks. In your examples, that's the elided "...", so wrap the BufferedOutputStream with the GZIPOutputStream. Then tune the BufferedOutputStream buffer size to match what testing shows works best with the destination.
I doubt the BufferedOutputStream on the outside would help much, if at all, over no explicit buffering. Why not? The GZIPOutputStream will do its write()s to "..." in the same-sized chunks whether the outside buffering is present or not. So there's no optimizing for "..." possible; you're stuck with whatever sizes GZIPOutputStream write()s.
Note also that you're using memory more efficiently by buffering the compressed data rather than the uncompressed data. If your data often achieves 6X compression, the 'inside' buffer is equivalent to an 'outside' buffer 6X as big.
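One way to check the reasoning above is to put a counting sink where the elided "..." would be and see how large and how frequent the writes reaching it are. The sketch below is only an illustration: `WriteChunkCounter`, `CountingSink`, the 100-byte slice size, and the random payload are all assumed names and values, not anything from the question.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class WriteChunkCounter {

    /** Hypothetical stand-in for the destination: discards data but records each write. */
    public static class CountingSink extends OutputStream {
        public int calls;
        public int largestChunk;
        public long bytes;

        @Override
        public void write(int b) {
            record(1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            record(len);
        }

        private void record(int len) {
            calls++;
            bytes += len;
            largestChunk = Math.max(largestChunk, len);
        }
    }

    /** Poorly compressible payload, so that plenty of compressed bytes reach the sink. */
    public static byte[] randomPayload(int size) {
        byte[] data = new byte[size];
        new Random(42).nextBytes(data);
        return data;
    }

    /** Pushes the payload through the chain in 100-byte slices, mimicking small application writes. */
    static void pump(OutputStream chain, byte[] payload) throws IOException {
        try (OutputStream out = chain) {
            for (int off = 0; off < payload.length; off += 100) {
                out.write(payload, off, Math.min(100, payload.length - off));
            }
        }
    }

    /** gzip directly on the sink, no explicit buffering. */
    public static CountingSink gzipOnly(byte[] payload) throws IOException {
        CountingSink sink = new CountingSink();
        pump(new GZIPOutputStream(sink), payload);
        return sink;
    }

    /** Buffer on the outside: new BufferedOutputStream(new GZIPOutputStream(...)). */
    public static CountingSink bufferedOutside(byte[] payload) throws IOException {
        CountingSink sink = new CountingSink();
        pump(new BufferedOutputStream(new GZIPOutputStream(sink)), payload);
        return sink;
    }

    /** Buffer next to the sink: new GZIPOutputStream(new BufferedOutputStream(...)). */
    public static CountingSink bufferedInside(byte[] payload) throws IOException {
        CountingSink sink = new CountingSink();
        pump(new GZIPOutputStream(new BufferedOutputStream(sink)), payload);
        return sink;
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = randomPayload(100_000);
        CountingSink[] results = {gzipOnly(payload), bufferedOutside(payload), bufferedInside(payload)};
        String[] names = {"gzip only", "buffered outside", "buffered inside"};
        for (int i = 0; i < results.length; i++) {
            System.out.printf("%-17s calls=%d largestChunk=%d bytes=%d%n",
                    names[i], results[i].calls, results[i].largestChunk, results[i].bytes);
        }
    }
}
```

With the default constructors, the largest write reaching the sink should stay within the gzip stream's own buffer size in the first two variants, while the third variant reaches the sink in far fewer, larger writes.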
Answer by roozbeh
Normally you want a buffer close to your FileOutputStream (assuming that's what ... represents) to avoid too many calls into the OS and frequent disk access. However, if you're writing a lot of small chunks to the GZIPOutputStream, you might benefit from a buffer around the GZIPOutputStream as well. The reason is that the write method in GZIPOutputStream is synchronized and also leads to a few other synchronized calls and a couple of native (JNI) calls (to update the CRC32 and do the actual compression). These all add extra overhead per call. So in that case I'd say you'll benefit from both buffers.
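A chain with both buffers might look like the following sketch, where many tiny writes (one `int` at a time) go through an outer buffer before reaching the gzip stream, and a second buffer sits next to the file. The class name `BothBuffers`, the `int` workload, and the temp file are assumptions for illustration; default buffer sizes are used throughout.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BothBuffers {

    /** Writes `count` ints, one small write at a time, through buffer -> gzip -> buffer -> file. */
    public static void writeInts(Path target, int count) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(               // absorbs the many tiny writes
                        new GZIPOutputStream(
                                new BufferedOutputStream(   // batches writes to the file
                                        Files.newOutputStream(target)))))) {
            for (int i = 0; i < count; i++) {
                out.writeInt(i);
            }
        }
    }

    /** Reads the ints back through the mirrored input chain and returns their sum. */
    public static long sumInts(Path source, int count) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(
                        new GZIPInputStream(
                                new BufferedInputStream(Files.newInputStream(source)))))) {
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += in.readInt();
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("both-buffers", ".gz");
        try {
            writeInts(tmp, 1_000_000);
            System.out.println("sum = " + sumInts(tmp, 1_000_000));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Whether the outer buffer pays for itself depends on how small the individual writes are, so it is worth timing with and without it.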
Answer by Peter Lawrey
I suggest you try a simple benchmark to time how long it takes to compress a large file and see if it makes much difference. GZIPOutputStream does have buffering, but its buffer is smaller. I would do the first with a 64K buffer, but you might find that doing both is better.
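A simple benchmark along those lines could be sketched as below. It assumes "the first" means the gzip stream wrapped around a 64K BufferedOutputStream and "both" means a buffer on each side of the gzip stream; the class name `GzipChainBenchmark`, the variant numbering, the synthetic payload, and the 64-byte slice size are all made up for this example. A single timed pass like this is crude, so treat the numbers as a rough indication only.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class GzipChainBenchmark {

    static final int BUF = 64 * 1024;   // the 64K buffer suggested in the answer

    /** Opens one of three chains: 0 = gzip only, 1 = gzip over buffer, 2 = buffer on both sides. */
    static OutputStream open(int variant, Path file) throws IOException {
        if (variant < 0 || variant > 2) {
            throw new IllegalArgumentException("unknown variant " + variant);
        }
        OutputStream raw = Files.newOutputStream(file);
        if (variant == 0) {
            return new GZIPOutputStream(raw);
        }
        OutputStream inner = new GZIPOutputStream(new BufferedOutputStream(raw, BUF));
        return variant == 1 ? inner : new BufferedOutputStream(inner, BUF);
    }

    /** Writes the payload in slices of `sliceSize` bytes to a temp file; returns elapsed nanoseconds. */
    public static long timeVariant(int variant, byte[] payload, int sliceSize) throws IOException {
        Path file = Files.createTempFile("gzip-bench", ".gz");
        try {
            long start = System.nanoTime();
            try (OutputStream out = open(variant, file)) {
                for (int off = 0; off < payload.length; off += sliceSize) {
                    out.write(payload, off, Math.min(sliceSize, payload.length - off));
                }
            }
            return System.nanoTime() - start;
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        // Synthetic, mildly repetitive payload; substitute a real large file for meaningful numbers.
        byte[] payload = new byte[32 * 1024 * 1024];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) ((i * 31 + i / 7) % 251);
        }
        String[] names = {"gzip only", "gzip over 64K buffer", "buffer on both sides"};
        for (int variant = 0; variant < names.length; variant++) {
            long nanos = timeVariant(variant, payload, 64);
            System.out.printf("%-22s %.1f ms%n", names[variant], nanos / 1e6);
        }
    }
}
```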
Answer by mP.
Read the Javadoc and you will see that BufferedInputStream (BIS) is used to buffer bytes read from some underlying source. Once you have the raw bytes, you want to run them through gzip, so you wrap the BIS with a GZIPInputStream (GIS). It makes no sense to buffer the output of the gzip stream instead, because then you have to ask: who is going to do the buffering for the gzip stream itself?
new GZIPInputStream(new BufferedInputStream(new FileInputXXX

