Java: In what order should I use GZIPOutputStream and BufferedOutputStream?

Question

Asked by sanity

Can anyone recommend whether I should do something like:

os = new GZIPOutputStream(new BufferedOutputStream(...));

or

os = new BufferedOutputStream(new GZIPOutputStream(...));

Which is more efficient? Should I use BufferedOutputStream at all?

Answer 1

Accepted answer by Gray

What order should I use GZIPOutputStream and BufferedOutputStream?

For object streams, I found that wrapping the buffered stream around the gzip stream for both input and output was almost always significantly faster. The smaller the objects, the better this did. It was better than or the same as no buffered stream in all cases.

ois = new ObjectInputStream(new BufferedInputStream(new GZIPInputStream(fis)));
oos = new ObjectOutputStream(new BufferedOutputStream(new GZIPOutputStream(fos)));

However, for text and straight byte streams, I found that it was a toss-up, with the gzip stream around the buffered stream being only slightly better. Either arrangement was better in all cases than no buffered stream.

reader = new InputStreamReader(new GZIPInputStream(new BufferedInputStream(fis)));
writer = new OutputStreamWriter(new GZIPOutputStream(new BufferedOutputStream(fos)));

I ran each version 20 times, discarded the first run, and averaged the rest. I also tried buffered-gzip-buffered, which was slightly better for objects and worse for text. I did not experiment with buffer sizes at all.

For the object streams, I tested 2 serialized object files in the tens of megabytes. For the larger file (38 MB), it was 85% faster for reading (0.7 versus 5.6 seconds) but actually slightly slower for writing (5.9 versus 5.7 seconds). These objects had some large arrays in them, which may have meant larger writes.

method       crc     date  time    compressed    uncompressed  ratio
defla   eb338650   May 19 16:59      14027543        38366001  63.4%

For the smaller file (18 MB), it was 75% faster for reading (1.6 versus 6.1 seconds) and 40% faster for writing (2.8 versus 4.7 seconds). It contained a large number of small objects.

method       crc     date  time    compressed    uncompressed  ratio
defla   92c9d529   May 19 16:56       6676006        17890857  62.7%

For the text reader/writer I used a 64 MB CSV text file. The gzip stream around the buffered stream was 11% faster for reading (950 versus 1070 milliseconds) and slightly faster for writing (7.9 versus 8.1 seconds).

method       crc     date  time    compressed    uncompressed  ratio
defla   c6b72e34   May 20 09:16      22560860        63465800  64.5%

Answer 2

Answered by barfuin

GZIPOutputStream already comes with a built-in buffer, so there is no need to put a BufferedOutputStream right next to it in the chain. gojomo's excellent answer already provides some guidance on where to place the buffer.

The default buffer size for GZIPOutputStream is only 512 bytes, so you will want to increase it to 8K or even 64K via the constructor parameter. The default buffer size for BufferedOutputStream is 8K, which is why you can measure an advantage when combining the default GZIPOutputStream and BufferedOutputStream. That advantage can also be achieved by properly sizing the GZIPOutputStream's built-in buffer.

So, to answer your question, "Should I use BufferedOutputStream at all?": No. In your case you should not use it, but instead set the GZIPOutputStream's buffer to at least 8K.
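
The constructor parameter mentioned above could be used roughly as follows. This is a minimal sketch, not a recommendation of a specific size: the class name `GzipBufferSize`, the constant `GZIP_BUFFER_BYTES`, and the in-memory sink are assumptions made for illustration, and `readAllBytes()` requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBufferSize {
    // Assumed value for illustration; measure to find what suits your destination.
    static final int GZIP_BUFFER_BYTES = 8 * 1024;

    /** Compresses data with a GZIPOutputStream whose internal buffer is enlarged. */
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The second constructor argument sets the size of the built-in output buffer.
        try (OutputStream os = new GZIPOutputStream(sink, GZIP_BUFFER_BYTES)) {
            os.write(data);
        }
        return sink.toByteArray();
    }

    /** Decompresses gzip data, again passing an explicit buffer size. */
    static byte[] decompress(byte[] gz) throws IOException {
        try (GZIPInputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(gz), GZIP_BUFFER_BYTES)) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello hello hello".getBytes(StandardCharsets.UTF_8);
        byte[] restored = decompress(compress(original));
        System.out.println(new String(restored, StandardCharsets.UTF_8));
    }
}
```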

Answer 3

Answered by gojomo

Buffering helps when the ultimate destination of the data is best read or written in larger chunks than your code would otherwise push to it. So you generally want the buffering to be as close as possible to the place that wants larger chunks. In your examples, that is the elided "...", so wrap the BufferedOutputStream with the GZIPOutputStream. Then tune the BufferedOutputStream buffer size to whatever testing shows works best with the destination.
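
A sketch of this arrangement, assuming the elided "..." is a file: the class name `BufferNearDestination`, the 64 KB constant, and the use of `Files.newOutputStream` are illustrative choices, not part of the question.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BufferNearDestination {
    // Assumed starting point; tune by testing against the real destination.
    static final int FILE_BUFFER_BYTES = 64 * 1024;

    /** The buffer sits next to the file, so compressed bytes reach it in large chunks. */
    static void writeGzip(Path target, byte[] data) throws IOException {
        try (OutputStream os = new GZIPOutputStream(
                new BufferedOutputStream(Files.newOutputStream(target), FILE_BUFFER_BYTES))) {
            os.write(data);
        }
    }

    /** Mirror image for reading: the buffer is again adjacent to the file. */
    static byte[] readGzip(Path source) throws IOException {
        try (InputStream is = new GZIPInputStream(
                new BufferedInputStream(Files.newInputStream(source), FILE_BUFFER_BYTES))) {
            return is.readAllBytes(); // Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("buffer-near-destination", ".gz");
        try {
            writeGzip(tmp, new byte[100_000]);
            System.out.println("restored bytes: " + readGzip(tmp).length);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```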

I doubt that a BufferedOutputStream on the outside would help much, if at all, compared with no explicit buffering. Why not? The GZIPOutputStream will do its write()s to "..." in the same-sized chunks whether or not the outside buffering is present. So no optimizing for "..." is possible; you are stuck with whatever sizes GZIPOutputStream write()s.
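
One way to check this claim yourself is to put a recording sink in place of "..." and look at the largest write it receives. The sketch below is an assumption-laden experiment, not a benchmark: `GzipWriteSizes` and `RecordingSink` are hypothetical names, and random data is used only so that plenty of compressed output is produced.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class GzipWriteSizes {
    /** Discards all data but remembers the largest single write it received. */
    static final class RecordingSink extends OutputStream {
        int largestWrite = 0;

        @Override
        public void write(int b) {
            largestWrite = Math.max(largestWrite, 1);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            largestWrite = Math.max(largestWrite, len);
        }
    }

    /** Writes data one byte at a time and returns the largest write seen by the sink. */
    static int largestWrite(boolean outerBuffer, byte[] data) throws IOException {
        RecordingSink sink = new RecordingSink();
        OutputStream os = new GZIPOutputStream(sink); // default internal buffer size
        if (outerBuffer) {
            os = new BufferedOutputStream(os); // buffer on the outside of the gzip stream
        }
        try (OutputStream out = os) {
            for (byte b : data) {
                out.write(b);
            }
        }
        return sink.largestWrite;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[200_000];
        new Random(1).nextBytes(data);
        System.out.println("largest write without outer buffer: " + largestWrite(false, data));
        System.out.println("largest write with outer buffer:    " + largestWrite(true, data));
    }
}
```

In both configurations the sink should never see a write larger than the gzip stream's own buffer, which is the point being made: an outer buffer changes how data enters the compressor, not how it leaves.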

Note also that you are using memory more efficiently by buffering the compressed data rather than the uncompressed data. If your data often achieves 6X compression, the 'inside' buffer is equivalent to an 'outside' buffer 6X as big.

Answer 4

Answered by roozbeh

Normally you want a buffer close to your FileOutputStream (assuming that's what ... represents) to avoid too many calls into the OS and frequent disk access. However, if you are writing a lot of small chunks to the GZIPOutputStream, you might benefit from a buffer around the GZIPOutputStream as well. The reason is that the write method in GZIPOutputStream is synchronized and also leads to a few other synchronized calls and a couple of native (JNI) calls (to update the CRC32 and do the actual compression). These all add extra overhead per call. So in that case I'd say you'll benefit from both buffers.
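
The two-buffer arrangement described here might look like the sketch below. It assumes a file destination and default buffer sizes; `DoubleBuffered` and `writeSmallChunks` are hypothetical names, and the single-byte writes stand in for whatever small chunks your code produces.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

public class DoubleBuffered {
    /** Writes count single-byte chunks through buffer -> gzip -> buffer -> file. */
    static void writeSmallChunks(Path target, int count) throws IOException {
        try (OutputStream os =
                 new BufferedOutputStream(          // outer: collects tiny writes before they hit the compressor
                     new GZIPOutputStream(
                         new BufferedOutputStream(  // inner: batches compressed bytes before they hit the OS
                             Files.newOutputStream(target))))) {
            for (int i = 0; i < count; i++) {
                os.write(i & 0xFF); // each call is cheap because it only touches the outer buffer
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("double-buffered", ".gz");
        try {
            writeSmallChunks(tmp, 1_000_000);
            System.out.println("compressed size: " + Files.size(tmp) + " bytes");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```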

Answer 5

Answered by Peter Lawrey

I suggest you try a simple benchmark to time how long it takes to compress a large file and see if it makes much difference. GZIPOutputStream does have buffering, but it is a smaller buffer. I would do the first with a 64K buffer, but you might find that doing both is better.
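
A simple benchmark along these lines could be sketched as follows. Everything specific in it is an assumption: the class and method names, the 16-byte write size, the 8 MB of random test data, and the choice to discard the first run as warm-up. Random data compresses poorly, so substitute your own file contents for meaningful numbers; a harness such as JMH would give more trustworthy results.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class GzipOrderBenchmark {
    /** Builds the stream chain under test on top of the raw file stream. */
    interface StreamFactory {
        OutputStream wrap(OutputStream raw) throws IOException;
    }

    /** Times one full write of data, in small chunks, through the given chain. */
    static long timeOnceNanos(StreamFactory factory, byte[] data, Path target) throws IOException {
        long start = System.nanoTime();
        try (OutputStream os = factory.wrap(Files.newOutputStream(target))) {
            for (int off = 0; off < data.length; off += 16) {
                os.write(data, off, Math.min(16, data.length - off));
            }
        }
        return System.nanoTime() - start;
    }

    /** Runs the chain several times, drops the first run, and averages the rest (runs >= 2). */
    static double averageMillis(StreamFactory factory, byte[] data, int runs) throws IOException {
        Path target = Files.createTempFile("gzip-bench", ".gz");
        try {
            long total = 0;
            for (int run = 0; run < runs; run++) {
                long elapsed = timeOnceNanos(factory, data, target);
                if (run > 0) {
                    total += elapsed; // the first run is treated as warm-up
                }
            }
            return total / 1e6 / (runs - 1);
        } finally {
            Files.deleteIfExists(target);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[8 * 1024 * 1024];
        new Random(42).nextBytes(data);
        int runs = 5;
        System.out.printf("gzip only:              %.1f ms%n",
            averageMillis(raw -> new GZIPOutputStream(raw), data, runs));
        System.out.printf("gzip(buffered(64K)):    %.1f ms%n",
            averageMillis(raw -> new GZIPOutputStream(new BufferedOutputStream(raw, 64 * 1024)), data, runs));
        System.out.printf("buffered(gzip):         %.1f ms%n",
            averageMillis(raw -> new BufferedOutputStream(new GZIPOutputStream(raw)), data, runs));
        System.out.printf("buffered(gzip(buffered)): %.1f ms%n",
            averageMillis(raw -> new BufferedOutputStream(
                new GZIPOutputStream(new BufferedOutputStream(raw, 64 * 1024))), data, runs));
    }
}
```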

Answer 6

Answered by mP.

Read the Javadoc and you will discover that BufferedInputStream (BIS) is used to buffer bytes read from some original source. Once you have the raw bytes you want to decompress them, so you wrap the BIS with a GZIPInputStream (GIS). It makes no sense to buffer the output from a GZIP stream, because then you have to ask who is going to buffer the data going into the GZIP stream.

new GZIPInputStream(new BufferedInputStream(new FileInputXXX(...)))
