java: Force gzip to decompress despite CRC error
Original URL: http://stackoverflow.com/questions/13149751/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverFlow
Force gzip to decompress despite CRC error
Asked by user1777900
I think there's a way to do this but I'm not sure how? Basically, I was writing a compression program that resulted in a crc error when I tried to unzip the compressed data. Normally this means that the decompressor actually recognized my data as being in the right format and decompressed it, but when it compared the result to the expected length as indicated by the CRC, they weren't the same.
However, for comparison reasons, I actually do want to see the output to see if it's just a concatenation issue (which should be relatively obvious if the decompressed output isn't gibberish but just in the wrong order).
Answered by Mark Adler
You said "unzip", but the question says "gzip". Which is it? Those are two different programs that operate on two different formats. I will assume gzip. Also the length is not "indicated by the CRC". The gzip trailer contains a CRC and an uncompressed length (modulo 2^32), which are two different things.
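To make the two trailer fields concrete, here is a minimal Java sketch that reads them from the last 8 bytes of a gzip file. It assumes the whole file is already in memory and holds a single gzip member; the class name GzipTrailer and its fields are hypothetical, chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper: reads the 8-byte trailer of a single-member gzip file. */
final class GzipTrailer {
    final long crc32; // CRC-32 of the uncompressed data
    final long isize; // uncompressed length modulo 2^32

    GzipTrailer(long crc32, long isize) {
        this.crc32 = crc32;
        this.isize = isize;
    }

    static GzipTrailer parse(byte[] gz) {
        // 10 bytes of fixed header plus 8 bytes of trailer is the bare minimum.
        if (gz.length < 18) {
            throw new IllegalArgumentException("too short to be a gzip member");
        }
        // Both trailer fields are stored as little-endian 32-bit integers.
        ByteBuffer buf = ByteBuffer.wrap(gz, gz.length - 8, 8).order(ByteOrder.LITTLE_ENDIAN);
        long crc = buf.getInt() & 0xFFFFFFFFL;   // first 4 bytes: CRC-32
        long isize = buf.getInt() & 0xFFFFFFFFL; // last 4 bytes: length mod 2^32
        return new GzipTrailer(crc, isize);
    }
}
```

Comparing these two values against a java.util.zip.CRC32 computed over your own decompressed output, and against its length, should show which of the two checks is actually failing.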
The gzip command will decompress all valid deflate data and write it out before checking the crc. So if, for example, I take a .gz file and corrupt just the crc (or length) at the end, and do:
gzip -dc < corrupt.gz > result
then result will be the entire, correct uncompressed data stream. There is no need to modify and recompile gzip, nor to write your own ungzipper. gzip will complain about the crc, but all of the data will be written nevertheless.
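Since the question is tagged java, a rough equivalent can be sketched with java.util.zip.GZIPInputStream. This relies on an assumption about its behaviour rather than on anything stated above: that it only validates the trailer after the last decompressed byte has been handed to the caller, so whatever was read before the ZipException is the full output. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.ZipException;

/** Hypothetical helper: keeps whatever was decompressed before a gzip error. */
final class LenientGunzip {
    static byte[] decompressIgnoringTrailer(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n); // data is stored before any trailer check runs
            }
        } catch (ZipException e) {
            // Report the problem, as the gzip command does, but keep the output.
            // Note: a bad header or corrupt deflate data also ends up here,
            // in which case the result is only a prefix (possibly empty).
            System.err.println("gzip reported: " + e.getMessage());
        }
        return out.toByteArray();
    }
}
```

Catching only ZipException keeps genuine I/O failures visible to the caller, since those are still thrown as plain IOException.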
Answered by Neil Coffey
As far as I'm aware, the CRC check is part of the GZIP wrapper, not part of the actual compressed data in DEFLATE format.
So you should be able to take literally just the bytes that are the compressed data stream, ignoring the GZIP header and CRC at the end, and pass it through an Inflater.
In other words, you need to take just the bytes corresponding to those referred to as "compressed blocks" in the GZIP file format specification and try to decompress using a Java Inflater object. A little bit of work but possibly less than re-compiling the GZIP code as Greg suggests (though his option would also work in principle).
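A minimal sketch of that approach, assuming the whole file is in memory and holds a single gzip member. The header is skipped according to the flag bits defined in RFC 1952, and the rest is passed to an Inflater created with nowrap set to true, so that it expects raw DEFLATE data. The class and method names are hypothetical, and malformed headers are not handled gracefully.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical helper: inflates the DEFLATE payload of a gzip member, never checking the trailer. */
final class RawGzipInflate {
    /** Returns the offset of the first compressed byte, just past the gzip header. */
    static int deflateOffset(byte[] gz) {
        if (gz.length < 10 || (gz[0] & 0xFF) != 0x1F || (gz[1] & 0xFF) != 0x8B || gz[2] != 8) {
            throw new IllegalArgumentException("not a gzip member using deflate");
        }
        int flg = gz[3] & 0xFF;
        int pos = 10; // fixed part: magic, method, flags, mtime, xfl, os
        if ((flg & 0x04) != 0) { // FEXTRA: 2-byte little-endian length, then data
            int xlen = (gz[pos] & 0xFF) | ((gz[pos + 1] & 0xFF) << 8);
            pos += 2 + xlen;
        }
        if ((flg & 0x08) != 0) { // FNAME: zero-terminated string
            while (gz[pos++] != 0) { }
        }
        if ((flg & 0x10) != 0) { // FCOMMENT: zero-terminated string
            while (gz[pos++] != 0) { }
        }
        if ((flg & 0x02) != 0) { // FHCRC: 2-byte header checksum
            pos += 2;
        }
        return pos;
    }

    static byte[] inflateIgnoringTrailer(byte[] gz) throws DataFormatException {
        int start = deflateOffset(gz);
        Inflater inflater = new Inflater(true); // true = raw DEFLATE, no zlib wrapper
        // The 8 trailer bytes are passed along too; the inflater stops before them.
        inflater.setInput(gz, start, gz.length - start);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input: return what was decoded so far
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end(); // release native resources
        }
        return out.toByteArray();
    }
}
```

Because the trailer is never read, a wrong CRC or length has no effect on the result; only corrupt DEFLATE data will raise a DataFormatException.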