Is there a curl/wget option that says not to save the file on HTTP errors?

Date: 2020-03-06 14:19:57  Source: igfitidea

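I want to download a lot of URLs in a script, but I do not want to save the ones that result in HTTP errors.

As far as I can tell from the man pages, neither curl nor wget provides such functionality.
Does anyone know of another downloader that does?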

Solution

I think curl's -f option does what you want:

-f, --fail
    (HTTP) Fail silently (no output at all) on server errors. This is mostly done to better
    enable scripts etc to better deal with failed attempts. In normal cases when an HTTP
    server fails to deliver a document, it returns an HTML document stating so (which often
    also describes why and more). This flag will prevent curl from outputting that and
    return error 22. [...]

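However, if the response is actually a 301 or 302 redirect, the redirect page itself still gets saved, even if its destination would result in an error: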

$ curl -fO http://google.com/aoeu
$ cat aoeu
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>301 Moved</TITLE></HEAD><BODY>
<H1>301 Moved</H1>
The document has moved
<A HREF="http://www.google.com/aoeu">here</A>.
</BODY></HTML>

To follow the redirect through to its dead end, also pass the -L option:

-L, --location
    (HTTP/HTTPS) If the server reports that the requested page has moved to a different
    location (indicated with a Location: header and a 3XX response code), this option will
    make curl redo the request on the new place. [...]
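
Putting -f and -L together for a whole list of URLs might look like the sketch below. The function name fetch_list, the CURL override variable, and the urls.txt file are made up for illustration and do not come from the original answer. The sketch assumes one URL per line and that each URL ends in a usable file name for -O.

```shell
#!/bin/sh
# Hypothetical helper: download every URL listed in a file (one per line),
# keeping only the responses that succeed.
# CURL may be overridden (for example with a stub when testing); defaults to curl.
fetch_list() {
    list=$1
    failed=0
    while IFS= read -r url; do
        # Skip blank lines.
        [ -n "$url" ] || continue
        # -f   fail on HTTP errors (exit code 22) instead of saving the error page
        # -L   follow redirects, so a 301/302 that leads to an error also fails
        # -sS  hide the progress meter but still print error messages
        # -O   save under the remote file name
        if ! ${CURL:-curl} -fsSL -O "$url"; then
            echo "failed: $url" >&2
            failed=$((failed + 1))
        fi
    done < "$list"
    # Return non-zero if any download failed.
    [ "$failed" -eq 0 ]
}
```

Called as `fetch_list urls.txt`, this prints one "failed:" line on stderr for each URL that curl rejected, and the function's exit status tells the calling script whether everything succeeded.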

I have a workaround one-liner set up for just this purpose:

(It only works with a single file, but it might be useful for others.)

A=$$; ( wget -q "http://foo.com/pipo.txt" -O "$A.d" && mv "$A.d" pipo.txt ) || ( rm -f "$A.d"; echo "Removing temp file" )

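This tries to download the file from the remote host into a temporary file named after the shell's PID. If wget reports an error, the temporary file is removed and nothing is kept. Otherwise the file is kept and renamed to its final name.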
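
The same idea can be wrapped in a function so that it works for more than one file. This is only a sketch: the name fetch_or_discard and the WGET override variable are invented here, and it assumes mktemp is available (it is not part of POSIX, though most systems ship it).

```shell
#!/bin/sh
# Hypothetical helper: download URL to DEST, keeping DEST only if wget succeeds.
# WGET may be overridden (for example with a stub when testing); defaults to wget.
fetch_or_discard() {
    url=$1
    dest=$2
    # Download into a temporary file next to the destination,
    # so the final mv is a plain rename on the same file system.
    tmp=$(mktemp "$dest.XXXXXX") || return 1
    if ${WGET:-wget} -q "$url" -O "$tmp"; then
        mv "$tmp" "$dest"
    else
        status=$?          # exit status of the failed wget
        rm -f "$tmp"       # discard whatever was written
        echo "download failed, nothing saved: $url" >&2
        return "$status"
    fi
}
```

For example, `fetch_or_discard "http://foo.com/pipo.txt" pipo.txt` leaves pipo.txt in place only when the download succeeded. An existing pipo.txt is not touched on failure, because the download never writes to the final name directly.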