Decompress GZip string in Java

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me): Stack Overflow. Original question: http://stackoverflow.com/questions/3621750/

Date: 2020-10-30 02:41:52  Source: igfitidea

java gzip

Asked by Matt

I can find plenty of functions that let you decompress a GZip file, but how do I decompress a GZip string?

I'm trying to parse an HTTP response where the response body is compressed with GZip. However, the entire response is simply stored in a string, so part of the string contains binary characters.

I'm attempting to use:

byte responseBodyBytes[] = responseBody.getBytes();
ByteArrayInputStream bais = new ByteArrayInputStream(responseBodyBytes); 
GZIPInputStream gzis = new GZIPInputStream(bais);

But that just throws an exception: java.io.IOException: Not in GZIP format

Answered by Jon Skeet

There's no such thing as a GZip string. GZip is binary, strings are text.
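
To see why treating GZip output as text fails, here is a minimal sketch that pushes compressed bytes through a String and back. The class and method names (GzipStringCorruption, gzip, throughString) are made up for illustration, and UTF-8 is an assumed charset; the question's code uses the platform default, which can behave differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPOutputStream;

public class GzipStringCorruption {
    /** Gzip-compresses the UTF-8 encoding of the given text. */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Pushes raw bytes through a String and back (hypothetical helper). */
    public static byte[] throughString(byte[] raw) {
        // 0x8b, the second gzip magic byte, is not a valid UTF-8 lead byte,
        // so decoding replaces it and the original value is lost.
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = gzip("hello, gzip");
        byte[] mangled = throughString(original);
        System.out.println("second byte before: 0x" + Integer.toHexString(original[1] & 0xff));
        System.out.println("second byte after:  0x" + Integer.toHexString(mangled[1] & 0xff));
        System.out.println("bytes survived the String round trip: " + Arrays.equals(original, mangled));
    }
}
```

Once the magic bytes have been altered like this, GZIPInputStream no longer recognizes the data, which is consistent with the exception in the question.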

If you want to compress a string, you need to convert it into binary first, e.g. with an OutputStreamWriter chained to a compressing OutputStream (e.g. a GZIPOutputStream).

Likewise, to read the data, you can use an InputStreamReader chained to a decompressing InputStream (e.g. a GZIPInputStream).

One way of easily reading from a Reader is to use CharStreams.toString(Readable) from Guava, or a similar library.
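
The chaining described above could look roughly like the following sketch. It uses only JDK classes, so a plain read loop stands in for Guava's CharStreams.toString. The class name GzipText and its two methods are hypothetical, and UTF-8 is an assumed encoding; use whatever charset the data was actually written in.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipText {
    /** Text to bytes: an OutputStreamWriter chained to a GZIPOutputStream. */
    public static byte[] compress(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(new GZIPOutputStream(bos), StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the writer also finishes the gzip stream, so the array is complete here.
        return bos.toByteArray();
    }

    /** Bytes to text: an InputStreamReader chained to a GZIPInputStream. */
    public static String decompress(byte[] compressed) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)), StandardCharsets.UTF_8)) {
            char[] chunk = new char[4096];
            int n;
            // Read until end of stream; one read() call may return only part of the text.
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] compressed = compress("Gr\u00fc\u00dfe from gzip");
        System.out.println(decompress(compressed));
    }
}
```

The compressed data stays a byte[] the whole time; it only becomes a String again after decompression and decoding.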

Answered by gb96

Ideally you should use a high-level library to handle this stuff for you. That way whenever a new version of HTTP is released, the library maintainer hopefully does all the hard work for you and you just need the updated version of the library.

That aside, it is a nice exercise to try doing it yourself.

Let's assume you are reading an HTTP response as a stream of bytes from a TCP socket. If there was no gzip encoding, then putting the whole response into a String could work. However, the presence of a "Content-Encoding: gzip" header means the response body will (as you noted) be binary.

You can identify the start of the response body as the first byte following the first occurrence of the String sequence "\r\n\r\n" (or the 4 bytes 0x0d, 0x0a, 0x0d, 0x0a).
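
As a rough sketch of that search, the helper below scans a byte array for the first 0x0d, 0x0a, 0x0d, 0x0a sequence and returns the index just past it. HttpBodyLocator and findBodyStart are illustrative names, and the sample response in main is made up; the sketch also assumes the whole response is already in memory.

```java
import java.nio.charset.StandardCharsets;

public class HttpBodyLocator {
    /**
     * Returns the index of the first body byte, i.e. the byte following the
     * first CR LF CR LF sequence, or -1 if the sequence does not occur.
     */
    public static int findBodyStart(byte[] response) {
        for (int i = 0; i + 3 < response.length; i++) {
            if (response[i] == 0x0d && response[i + 1] == 0x0a
                    && response[i + 2] == 0x0d && response[i + 3] == 0x0a) {
                return i + 4;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up response: ASCII headers followed by the first three bytes of a gzip body.
        byte[] headers = "HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] response = new byte[headers.length + 3];
        System.arraycopy(headers, 0, response, 0, headers.length);
        response[headers.length] = 0x1f;
        response[headers.length + 1] = (byte) 0x8b;
        response[headers.length + 2] = 0x08;

        System.out.println("body starts at index " + findBodyStart(response));
    }
}
```

Working on the byte array directly avoids converting the binary body to a String just to locate the header boundary.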

The gzip encoding has a special header (the magic bytes 0x1f 0x8b, followed by 0x08 for the deflate compression method), and you should test the first 3 body bytes for that:

                // needs java.io.ByteArrayInputStream, java.io.ByteArrayOutputStream,
                // java.io.IOException and java.util.zip.GZIPInputStream
                byte[] buf;  // from the HTTP Response stream
                // ... insert code here to populate buf from HTTP Response stream
                // ...
                int bodyLen = 1234;  // populate this value from 'Content-length' header
                int bodyStart = 123; // index of byte buffer where body starts
                // Test for the gzip magic bytes (0x1f, 0x8b) and the deflate method byte (0x08)
                if (bodyLen > 4 && buf[bodyStart] == 0x1f && buf[bodyStart + 1] == (byte) 0x8b && buf[bodyStart + 2] == 0x08) {
                    // gzip compressed body: wrap only the body bytes, not the headers
                    ByteArrayInputStream bais = new ByteArrayInputStream(buf, bodyStart, bodyLen);

                    // Decompress the bytes. A single read() may return only part of the
                    // data and the decompressed size is not known up front, so loop until EOF.
                    ByteArrayOutputStream decompressed = new ByteArrayOutputStream();
                    try (GZIPInputStream gzis = new GZIPInputStream(bais)) {
                        byte[] chunk = new byte[8192];
                        int n;
                        while ((n = gzis.read(chunk)) != -1) {
                            decompressed.write(chunk, 0, n);
                        }
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                    byte[] decompressedBytes = decompressed.toByteArray();
                }

GZIPInputStream produces the "Not in GZIP format" error when the data does not begin with the magic GZIP header bytes (0x1f, 0x8b), so testing for these will help resolve your particular issue.

There is also a CRC checksum within the GZIP format; however, if that is missing or incorrect, you should see a different error.

Answered by Abbin Varghese

Maybe this helps:

try (final GZIPInputStream gzipInput = new GZIPInputStream(new ByteArrayInputStream(compressedByteArray));
     final StringWriter stringWriter = new StringWriter()) {
    // Decompress and decode the bytes as UTF-8 (Apache Commons IO)
    org.apache.commons.io.IOUtils.copy(gzipInput, stringWriter, java.nio.charset.StandardCharsets.UTF_8);
    String decodedString = stringWriter.toString();
} catch (IOException e) {
    throw new UncheckedIOException("Error while decompression!", e);
}