
Disclaimer: this page is a translated copy of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/8364640/

Date: 2020-09-09 21:16:46  Source: igfitidea

How to properly handle a gzipped page when using curl?

bash, curl, gzip

Asked by BryanH

I wrote a bash script that gets output from a website using curl and does a bunch of string manipulation on the html output. The problem is when I run it against a site that is returning its output gzipped. Going to the site in a browser works fine.

When I run curl by hand, I get gzipped output:

$ curl "http://example.com"

Here's the header from that particular site:

HTTP/1.1 200 OK
Server: nginx
Content-Type: text/html; charset=utf-8
X-Powered-By: PHP/5.2.17
Last-Modified: Sat, 03 Dec 2011 00:07:57 GMT
ETag: "6c38e1154f32dbd9ba211db8ad189b27"
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: must-revalidate
Content-Encoding: gzip
Content-Length: 7796
Date: Sat, 03 Dec 2011 00:46:22 GMT
X-Varnish: 1509870407 1509810501
Age: 504
Via: 1.1 varnish
Connection: keep-alive
X-Cache-Svr: p2137050.pubip.peer1.net
X-Cache: HIT
X-Cache-Hits: 425

I know the returned data is gzipped, because this returns html, as expected:

$ curl "http://example.com" | gunzip

I don't want to pipe the output through gunzip, because the script works as-is on other sites, and piping through gzip would break that functionality.

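
One way to keep a single code path without a curl flag would be to sniff the gzip magic bytes (0x1f 0x8b) and decompress only when they are present. This is only a sketch of that idea, not something from the original question; the gzipped sample here stands in for a saved curl response:

```shell
# Sketch: gunzip only when the response actually starts with the gzip
# magic bytes 0x1f 0x8b (a gzipped sample stands in for curl output).
resp=$(mktemp)
printf '<html>hello</html>' | gzip > "$resp"

# Read the first two bytes as hex.
magic=$(head -c2 "$resp" | od -An -tx1 | tr -d ' \n')

if [ "$magic" = "1f8b" ]; then
  body=$(gunzip -c "$resp")   # gzipped: decode it
else
  body=$(cat "$resp")         # plain: use as-is
fi
echo "$body"
rm -f "$resp"
```

The same test works on plain responses, so sites that return uncompressed HTML pass through untouched.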
What I've tried

  1. changing the user-agent (I tried the same string my browser sends, "Mozilla/4.0", etc)
  2. man curl
  3. google search
  4. searching stackoverflow

Everything came up empty

Any ideas?

Answered by Martin

curl will automatically decompress the response if you set the --compressed flag:

curl --compressed "http://example.com"

--compressed (HTTP) Request a compressed response using one of the algorithms libcurl supports, and save the uncompressed document. If this option is used and the server sends an unsupported encoding, curl will report an error.

gzip is most likely supported, but you can check this by running curl -V and looking for libz somewhere in the "Features" line:

$ curl -V
...
Protocols: ...
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz 
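
That check can also be scripted, for instance to fail fast before the rest of a scraping script runs. This is a hedged sketch, not part of the original answer:

```shell
# Sketch: report whether this curl build has zlib (libz) support; without
# it, --compressed cannot decode gzip responses.
if curl -V 2>/dev/null | grep -q 'libz'; then
  echo "gzip decoding supported"
else
  echo "gzip decoding NOT supported"
fi
```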


Note that it's really the website in question that is at fault here. If curl did not pass an Accept-Encoding: gzip request header, the server should not have sent a compressed response.

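
That contract can be illustrated with a toy stand-in for the server's decision; the serve function and header strings below are made up for illustration, not real server code:

```shell
# Toy sketch of the rule above: a well-behaved server should only set
# Content-Encoding: gzip when the request's Accept-Encoding allows it.
serve() {
  accept_encoding="$1"   # stand-in for the request's Accept-Encoding header
  case "$accept_encoding" in
    *gzip*) echo "Content-Encoding: gzip" ;;
    *)      echo "(no Content-Encoding: identity body)" ;;
  esac
}

serve "gzip, deflate"   # roughly what curl --compressed sends
serve ""                # what plain curl sends (no Accept-Encoding)
```

The site in the question compresses unconditionally, which is why plain curl received gzip bytes it never asked for.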