multiple cURL downloading with 404 detection

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/24966975/

Asked by natehome
I'm trying to write a shell script that downloads files using cURL and detects if a url leads to a 404 error. If a url is a 404 then I want to save the url link or filename to a text file.
The url format is http://server.com/somefile[00-31].txt
I've been messing around with what I've found on google and currently have the following code:
#!/bin/bash
if curl --fail -O "http://server.com/somefile[00-31].mp4"
then
    echo "$var" >> "files.txt"
else
    echo "Saved!"
fi
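As an aside, curl can report the numeric status code directly via `--write-out '%{http_code}'`, which avoids parsing error text. A minimal sketch under that approach (the server URL is a placeholder, and `404s.txt` is an assumed log-file name):

```shell
# Sketch: fetch one URL; log it to 404s.txt if the server answers 404,
# otherwise download it under its remote name with -O.
download_or_log() {
    status=$(curl -s -o /dev/null -w '%{http_code}' "$1")
    if [ "$status" = "404" ]; then
        echo "$1" >> 404s.txt    # remember the missing file
    else
        curl -s -O "$1"          # save it under its remote name
    fi
}
```

The per-file loop would then be `for url in http://server.com/somefile{00..31}.txt; do download_or_log "$url"; done`.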
Answered by denisvm
#!/bin/bash
URLFORMAT="http://server.com/somefile%02d.txt"
for num in {0..31}; do
    # build the URL and try to download it
    url=$(printf "$URLFORMAT" "$num")
    CURL=$(curl --fail -O "$url" 2>&1)
    # on failure, check whether it was a 404 and record the URL
    if [ $? -ne 0 ]; then
        echo "$CURL" | grep --quiet 'The requested URL returned error: 404'
        [ $? -eq 0 ] && echo "$url" >> "files.txt"
    fi
done
Here is a version that takes the URLs as command-line arguments:
#!/bin/sh
for url in "$@"; do
    CURL=$(curl --fail -O "$url" 2>&1)
    # on failure, check whether it was a 404 and record the URL
    if [ $? -ne 0 ]; then
        echo "$CURL" | grep --quiet 'The requested URL returned error: 404'
        [ $? -eq 0 ] && echo "$url" >> "files.txt"
    fi
done
You can use it like this: sh download-script.sh http://server.com/files{00..21}.png http://server.com/otherfiles{00..12}.gif
The brace-range expansion works in bash shells; the calling shell expands the ranges into individual URLs before the script runs.
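To see what the script actually receives, you can expand the braces in bash yourself (the server name is a placeholder):

```shell
# Brace expansion happens in the calling shell, not inside the script;
# {00..03} expands with zero padding in bash.
urls=(http://server.com/files{00..03}.png)
printf '%s\n' "${urls[@]}"
# http://server.com/files00.png
# http://server.com/files01.png
# http://server.com/files02.png
# http://server.com/files03.png
```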
Answered by Stephen Quan
You can use the -D, --dump-header <file> option to capture all the headers, including Content-Type and the HTTP status code. Note that the headers may be terminated with DOS newlines (CR LF), so you may want to strip the CR character.
curl -s -D headers.txt -o out.dat "http://server.com/somefile[00-31].mp4"
httpStatus=$(head -1 headers.txt | awk '{print $2}')
contentType=$(grep "Content-Type:" headers.txt | tr -d '\r')
contentType=${contentType#*: }

if [ "$httpStatus" != "200" ]; then
    echo "FAILED - HTTP STATUS $httpStatus"
else
    echo "SAVED"
fi
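To check the parsing without a network round trip, the same extraction can be run against a hand-made header dump (the header block below is simulated):

```shell
# Simulate what -D would write, including DOS line endings (CR LF):
printf 'HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n' > headers.txt

httpStatus=$(head -1 headers.txt | awk '{print $2}')          # second field of the status line
contentType=$(grep "Content-Type:" headers.txt | tr -d '\r')  # strip the trailing CR
contentType=${contentType#*: }                                # drop the "Content-Type: " prefix

echo "$httpStatus"    # 404
echo "$contentType"   # text/html
```

Note that the status code is the second whitespace-separated field, so the trailing CR stays attached to the reason phrase and never reaches `$httpStatus`; the Content-Type line, by contrast, needs the explicit `tr -d '\r'`.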