Linux wget: don't follow redirects
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, include the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2662943/
wget: don't follow redirects
Asked by flybywire
How do I prevent wget from following redirects?
Answered by Matt
--max-redirect 0

I haven't tried this; it will either allow none or allow infinite redirects.
Answered by Paused until further notice.
Use curl without -L instead of wget. Omitting that option when using curl prevents the redirect from being followed.
If you use curl -I <URL> then you'll get the headers instead of the redirect HTML.
If you use curl -IL <URL> then you'll get the headers for the URL, plus those for the URL you're redirected to.
Answered by Tim McNamara
wget follows up to 20 redirects by default. However, it does not span hosts. If you have asked wget to download example.com, it will not touch any resources at www.example.com. wget will detect this as a request to span to another host and decide against it.
In short, you should probably be executing:
wget --mirror www.example.com
Rather than
wget --mirror example.com
Now let's say the owner of www.example.com has several subdomains at example.com and we are interested in all of them. How to proceed?
Try this:
wget --mirror --domains=example.com example.com
wget will now visit all subdomains of example.com, including m.example.com and www.example.com.
Answered by Mike Nakis
In general, it is not a good idea to depend on a specific number of redirects.
For example, in order to download IntelliJ IDEA, the URL that is promised to always resolve to the latest version of Community Edition for Linux is something like https://download.jetbrains.com/product?code=IIC&latest&distribution=linux, but if you visit that URL nowadays, you are going to be redirected twice before you reach the actual downloadable file. In the future you might be redirected three times, or not at all.
The way to solve this problem is to use the HTTP HEAD verb. Here is how I solved it in the case of IntelliJ IDEA:
# This is the starting URL.
URL="https://download.jetbrains.com/product?code=IIC&latest&distribution=linux"
echo "URL: $URL"
# Issue HEAD requests until the actual target is found.
# The result contains the target location, among some irrelevant stuff.
LOC=$(wget --no-verbose --method=HEAD --output-file=- "$URL")
echo "LOC: $LOC"
# Extract the URL from the result, stripping the irrelevant stuff.
URL=$(cut --delimiter=' ' --fields=4 <<< "$LOC")
echo "URL: $URL"
# Optional: download the actual file.
wget "$URL"