Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2474193/

Time: 2020-08-13 08:09:43  Source: igfitidea

Uncompress GZIPed HTTP Response in Java

Tags: java, gzip, httpresponse, compression

Asked by bill0ute

I'm trying to uncompress a GZIPed HTTP response using GZIPInputStream. However, I always get the same exception when I try to read the stream: java.util.zip.ZipException: invalid bit length repeat

My HTTP request headers:

GET www.myurl.com HTTP/1.0\r\n
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; fr; rv:1.9.2) Gecko/20100115 Firefox/3.6\r\n
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8\r\n
Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3\r\n
Accept-Encoding: gzip,deflate\r\n
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7\r\n
Keep-Alive: 115\r\n
Connection: keep-alive\r\n
X-Requested-With: XMLHttpRequest\r\n
Cookie: Some Cookies\r\n\r\n

At the end of the HTTP response headers, I get path=/Content-Encoding: gzip, followed by the gzipped response.

I tried two similar pieces of code to uncompress it:

UPDATE: In the following code, tBytes = (the string after 'path=/Content-Encoding: gzip').getBytes ();

GZIPInputStream  gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));

StringBuffer  szBuffer = new StringBuffer ();

byte  tByte [] = new byte [1024];

while (true)
{
    int  iLength = gzip.read (tByte, 0, 1024); // <-- Error comes here

    if (iLength < 0)
        break;

    szBuffer.append (new String (tByte, 0, iLength));
}

And this one, which I found on this forum:

InputStream     gzipStream = new GZIPInputStream   (new ByteArrayInputStream (tBytes));
Reader          decoder    = new InputStreamReader (gzipStream, "UTF-8");//<- I tried ISO-8859-1 and get the same exception
BufferedReader  buffered   = new BufferedReader    (decoder);

I guess this is an encoding error.

Best regards,

bill0ute

Accepted answer by Wim Coenen

You don't show how you get the tBytes that you use to set up the gzip stream here:

GZIPInputStream  gzip = new GZIPInputStream (new ByteArrayInputStream (tBytes));

One explanation is that you are including the entire HTTP response in tBytes. Instead, it should contain only the content after the HTTP headers.

Another explanation is that the response is chunked (sent with Transfer-Encoding: chunked), in which case the body must be de-chunked before it is decompressed.
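To illustrate the chunked case, here is a minimal sketch of undoing the chunked transfer coding before handing the bytes to GZIPInputStream. The class name ChunkedDecoder and its method are hypothetical, not something from the question, and the sketch assumes the whole chunked body is already in memory; it ignores chunk extensions and trailer fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration only.
final class ChunkedDecoder {

    /** Concatenates the data of all chunks, stopping at the zero-size chunk. */
    static byte[] decode(byte[] chunkedBody) {
        try {
            InputStream in = new ByteArrayInputStream(chunkedBody);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            while (true) {
                String sizeLine = readLine(in);
                if (sizeLine.isEmpty()) {
                    throw new IOException("Missing chunk size");
                }
                // Drop optional chunk extensions after ';'.
                int semi = sizeLine.indexOf(';');
                if (semi >= 0) {
                    sizeLine = sizeLine.substring(0, semi);
                }
                int size = Integer.parseInt(sizeLine.trim(), 16); // size is hexadecimal
                if (size == 0) {
                    break; // last chunk; any trailer fields are ignored
                }
                byte[] chunk = new byte[size];
                int read = 0;
                while (read < size) {
                    int n = in.read(chunk, read, size - read);
                    if (n < 0) {
                        throw new IOException("Truncated chunk");
                    }
                    read += n;
                }
                out.write(chunk, 0, size);
                readLine(in); // consume the CRLF that follows the chunk data
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads bytes up to the next LF and returns them without CR or LF. */
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                break;
            }
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}
```

The decoded bytes, not the raw chunked bytes, are what should be wrapped in a ByteArrayInputStream and passed to GZIPInputStream.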

edit: You are taking the data after the Content-Encoding line as the message body. However, according to the HTTP 1.1 specification, header fields do not come in any particular order, so this is very dangerous.

As explained in this part of the HTTP specification, the message body of a request or response doesn't come after a particular header field but after the first empty line:

Request (section 5) and Response (section 6) messages use the generic message format of RFC 822 [9] for transferring entities (the payload of the message). Both types of message consist of a start-line, zero or more header fields (also known as "headers"), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.

You still haven't shown exactly how you compose tBytes, but at this point I think you're erroneously including the empty line in the data that you try to decompress. The message body starts after the CRLF characters of the empty line.
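As a rough illustration of that rule, the sketch below locates the first empty line in the raw response and decompresses only what follows it. The class HttpBodyExtractor and its methods are hypothetical names. The sketch assumes the complete, non-chunked response is available as a byte array that was never converted to a String (a String round trip can corrupt the binary gzip data), and that the decompressed text is UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only.
final class HttpBodyExtractor {

    /**
     * Returns the bytes that follow the first empty line (CRLF CRLF),
     * or null if the response contains no empty line.
     */
    static byte[] extractBody(byte[] rawResponse) {
        for (int i = 0; i + 3 < rawResponse.length; i++) {
            if (rawResponse[i] == '\r' && rawResponse[i + 1] == '\n'
                    && rawResponse[i + 2] == '\r' && rawResponse[i + 3] == '\n') {
                // Skip the four bytes of the terminator itself.
                return Arrays.copyOfRange(rawResponse, i + 4, rawResponse.length);
            }
        }
        return null;
    }

    /** Decompresses a gzip body and decodes it as UTF-8 (an assumption). */
    static String gunzip(byte[] body) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With these helpers, something like HttpBodyExtractor.gunzip(HttpBodyExtractor.extractBody(rawResponse)) replaces the approach of searching for the Content-Encoding line, so the order of the header fields no longer matters.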

May I suggest that you use the httpclient library instead to extract the message body?
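The answer names the httpclient library; as a stand-in that needs no extra dependency, the sketch below uses the JDK's own HttpURLConnection instead, which parses the headers and handles chunking itself. The class GzipFetcher and its methods are hypothetical names, the URL is supplied by the caller, and decoding the body as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only.
final class GzipFetcher {

    /** Reads the body, wrapping it in GZIPInputStream only when the server says it is gzipped. */
    static String readBody(InputStream raw, String contentEncoding) {
        try {
            InputStream body = "gzip".equalsIgnoreCase(contentEncoding)
                    ? new GZIPInputStream(raw)
                    : raw;
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = body.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // UTF-8 is assumed here; a real client would honor the Content-Type charset.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Requests the URL with Accept-Encoding: gzip and returns the decoded body. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        try (InputStream raw = conn.getInputStream()) {
            return readBody(raw, conn.getContentEncoding());
        } finally {
            conn.disconnect();
        }
    }
}
```

The point of the design is that the application code never has to find the header boundary by hand: the stream returned by getInputStream() already starts at the message body.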

Answer by Thusitha Nuwan

Well, here is the problem I can see:

int  iLength = gzip.read (tByte, 0, 1024);

Use the following to fix it:

byte[] buff = new byte[1024];
byte[] emptyBuff = new byte[1024];
StringBuffer unGzipRes = new StringBuffer();

int byteCount = 0;
while ((byteCount = gzip.read(buff, 0, 1024)) > 0) {
    // only append the buff elements that contain data
    unGzipRes.append(new String(Arrays.copyOf(buff, byteCount), "utf-8"));

    // empty the buff for re-usability and prevent dirty data
    // attached at the end of the buff
    System.arraycopy(emptyBuff, 0, buff, 0, 1024);
}
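One caveat with decoding each 1024-byte buffer separately is that a multi-byte UTF-8 character may straddle two reads and be decoded incorrectly. As a possible alternative, the sketch below lets an InputStreamReader do the decoding across buffer boundaries. The class name GzipText is hypothetical, and the charset is passed in by the caller rather than known from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only.
final class GzipText {

    /** Decompresses gzip data and decodes it as text in the given charset. */
    static String decompress(byte[] gzipped, Charset charset) {
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)), charset)) {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            // The reader keeps partial byte sequences between reads,
            // so characters are never split at a buffer boundary.
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
            return text.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Appending only the first n characters of the buffer also makes clearing the buffer between reads unnecessary.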