Linux wget: don't follow redirects

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2662943/

Date: 2020-08-03 19:57:23  Source: igfitidea

wget: don't follow redirects

Tags: linux, http, bash, redirect, wget

Asked by flybywire

How do I prevent wget from following redirects?

Answered by Matt

--max-redirect 0

I haven't tried this; it will either allow none or allow infinitely many.
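As a minimal sketch of how the option would be invoked (the URL is invented for illustration, and the wrapper function is hypothetical, not part of wget):

```shell
# Hypothetical wrapper: fetch a URL but refuse to follow any redirect.
# If the server answers with a 3xx, wget reports it and exits non-zero
# instead of fetching the Location target.
fetch_no_redirect() {
    wget --max-redirect 0 --quiet "$1"
}

# Usage (commented out because it needs network access;
# the URL is made up):
#   fetch_no_redirect "http://example.org/old-page" \
#       || echo "stopped: server replied with a redirect or an error"
```

Per the wget manual, the default for --max-redirect is 20.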

Answered by Pekka

Some versions of wget have a --max-redirect option: see here

Answered by Paused until further notice.

Use curl without -L instead of wget. Omitting that option when using curl prevents the redirect from being followed.

If you use curl -I <URL>, then you'll get the headers instead of the redirect HTML.

If you use curl -IL <URL>, then you'll get the headers for the URL, plus those for the URL you're redirected to.
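Building on that, you can extract the redirect target yourself from those headers. A sketch using a canned response (the header values are invented; in real use you would pipe `curl -sI "$URL"` through the same filter):

```shell
# Sample of what `curl -I` prints for a redirecting URL
# (values invented for illustration).
HEADERS='HTTP/1.1 301 Moved Permanently
Location: https://www.example.com/
Content-Length: 0'

# Real servers terminate header lines with CRLF, so strip the \r first;
# header names are case-insensitive, so match "Location:" accordingly.
TARGET=$(printf '%s\n' "$HEADERS" | tr -d '\r' \
    | awk 'tolower($1) == "location:" { print $2 }')
echo "$TARGET"   # prints https://www.example.com/
```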

Answered by Tim McNamara

wget follows up to 20 redirects by default. However, it does not span hosts. If you have asked wget to download example.com, it will not touch any resources at www.example.com. wget will detect this as a request to span to another host and decide against it.

In short, you should probably be executing:

wget --mirror www.example.com

Rather than

wget --mirror example.com

Now let's say the owner of www.example.com has several subdomains at example.com and we are interested in all of them. How to proceed?

Try this:

wget --mirror --domains=example.com example.com

wget will now visit all subdomains of example.com, including m.example.com and www.example.com.

Answered by Mike Nakis

In general, it is not a good idea to depend on a specific number of redirects.

For example, in order to download IntelliJ IDEA, the URL that is promised to always resolve to the latest version of the Community Edition for Linux is something like https://download.jetbrains.com/product?code=IIC&latest&distribution=linux, but if you visit that URL nowadays, you are going to be redirected twice before you reach the actual downloadable file. In the future you might be redirected three times, or not at all.

The way to solve this problem is to use the HTTP HEAD verb. Here is how I solved it in the case of IntelliJ IDEA:

# This is the starting URL.
URL="https://download.jetbrains.com/product?code=IIC&latest&distribution=linux"
echo "URL: $URL"

# Issue HEAD requests until the actual target is found.
# The result contains the target location, among some irrelevant stuff.
LOC=$(wget --no-verbose --method=HEAD --output-file=- "$URL")
echo "LOC: $LOC"

# Extract the URL from the result, stripping the irrelevant stuff.
URL=$(cut --delimiter=' ' --fields=4 <<< "$LOC")
echo "URL: $URL"

# Optional: download the actual file.
wget "$URL"
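As an aside, curl can do the same resolution in a single call via its standard --write-out variable %{url_effective}, which prints the URL curl finally arrived at after following redirects (same starting URL as above):

```shell
# -s silences progress, -I sends HEAD requests, -L follows redirects,
# -o /dev/null discards the headers, and --write-out prints the final
# URL once all redirects have been followed.
URL="https://download.jetbrains.com/product?code=IIC&latest&distribution=linux"
FINAL=$(curl -sIL -o /dev/null --write-out '%{url_effective}' "$URL")
echo "FINAL: $FINAL"
```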