
Disclaimer: this page is a translated copy of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/8364640/

Date: 2020-09-09 21:16:46  Source: igfitidea

How to properly handle a gzipped page when using curl?

bash, curl, gzip

Asked by BryanH

I wrote a bash script that gets output from a website using curl and does a bunch of string manipulation on the html output. The problem is when I run it against a site that is returning its output gzipped. Going to the site in a browser works fine.

When I run curl by hand, I get gzipped output:

$ curl "http://example.com"

Here's the header from that particular site:

HTTP/1.1 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
X-Powered-By: PHP/5.2.17
Last-Modified: Sat, 03 Dec 2011 00:07:57 GMT
ETag: "6c38e1154f32dbd9ba211db8ad189b27"
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: must-revalidate
Content-Encoding: gzip
Content-Length: 7796
Date: Sat, 03 Dec 2011 00:46:22 GMT
X-Varnish: 1509870407 1509810501
Age: 504
Via: 1.1 varnish
Connection: keep-alive
X-Cache-Svr: p2137050.pubip.peer1.net
X-Cache: HIT
X-Cache-Hits: 425

I know the returned data is gzipped, because this returns html, as expected:

$ curl "http://example.com" | gunzip

I don't want to pipe the output through gunzip, because the script works as-is on other sites, and piping through gzip would break that functionality.

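
One way to keep a single code path without a curl flag would be to sniff the gzip magic bytes (0x1f 0x8b) and decompress only when they are present. This is only a sketch of that idea, not something from the original question; the gzipped sample here stands in for a saved curl response:

```shell
# Sketch: gunzip only when the response actually starts with the gzip
# magic bytes 0x1f 0x8b (a gzipped sample stands in for curl output).
resp=$(mktemp)
printf '<html>hello</html>' | gzip > "$resp"

# Read the first two bytes as hex.
magic=$(head -c2 "$resp" | od -An -tx1 | tr -d ' \n')

if [ "$magic" = "1f8b" ]; then
  body=$(gunzip -c "$resp")   # gzipped: decode it
else
  body=$(cat "$resp")         # plain: use as-is
fi
echo "$body"
rm -f "$resp"
```

The same test works on plain responses, so sites that return uncompressed HTML pass through untouched.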
What I've tried

  1. changing the user-agent (I tried the same string my browser sends, "Mozilla/4.0", etc)
  2. man curl
  3. google search
  4. searching stackoverflow

Everything came up empty

Any ideas?

Answered by Martin

curl will automatically decompress the response if you set the --compressed flag:

curl --compressed "http://example.com"

--compressed (HTTP) Request a compressed response using one of the algorithms libcurl supports, and save the uncompressed document. If this option is used and the server sends an unsupported encoding, curl will report an error.

gzip is most likely supported, but you can check this by running curl -V and looking for libz somewhere in the "Features" line:

$ curl -V
...
Protocols: ...
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz 
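
That check can also be scripted, for instance to fail fast before the rest of a scraping script runs. This is a hedged sketch, not part of the original answer:

```shell
# Sketch: report whether this curl build has zlib (libz) support; without
# it, --compressed cannot decode gzip responses.
if curl -V 2>/dev/null | grep -q 'libz'; then
  echo "gzip decoding supported"
else
  echo "gzip decoding NOT supported"
fi
```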


Note that it's really the website in question that is at fault here. If curl did not pass an Accept-Encoding: gzip request header, the server should not have sent a compressed response.

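
That contract can be illustrated with a toy stand-in for the server's decision; the serve function and header strings below are made up for illustration, not real server code:

```shell
# Toy sketch of the rule above: a well-behaved server should only set
# Content-Encoding: gzip when the request's Accept-Encoding allows it.
serve() {
  accept_encoding="$1"   # stand-in for the request's Accept-Encoding header
  case "$accept_encoding" in
    *gzip*) echo "Content-Encoding: gzip" ;;
    *)      echo "(no Content-Encoding: identity body)" ;;
  esac
}

serve "gzip, deflate"   # roughly what curl --compressed sends
serve ""                # what plain curl sends (no Accept-Encoding)
```

The site in the question compresses unconditionally, which is why plain curl received gzip bytes it never asked for.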