bash: multiple cURL downloads with 404 detection

Disclaimer: the content below is from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/24966975/

Date: 2020-09-18 10:58:02  Source: igfitidea

multiple cURL downloading with 404 detection

Tags: bash, shell, curl, http-status-code-404

Asked by natehome

I'm trying to write a shell script that downloads files with cURL and detects whether a URL returns a 404 error. If a URL returns 404, I want to save the URL or filename to a text file.


The URL format is http://server.com/somefile[00-31].txt

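For context (not part of the original question): the [00-31] part is curl's own URL globbing syntax, so a single curl invocation expands it into 32 requests. The same zero-padded list can be generated in plain sh, which is what the first answer below effectively does with printf:

```shell
# The [00-31] range is curl's URL globbing: one curl call expands it into
# 32 requests. The equivalent zero-padded list can be built with seq -f:
for num in $(seq -f "%02g" 0 31); do
    echo "http://server.com/somefile${num}.txt"
done
# first line printed: http://server.com/somefile00.txt
```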

I've been experimenting with what I've found on Google and currently have the following code:


#!/bin/bash

if curl --fail -O "http://server.com/somefile[00-31].mp4"
then
  echo "$var" >> "files.txt"    
else
  echo "Saved!"
fi

Answered by denisvm

#!/bin/bash

URLFORMAT="http://server.com/somefile%02d.txt"

for num in {0..31}; do
    # build the URL and try to download it
    url=$(printf "$URLFORMAT" "$num")
    CURL=$(curl --fail -O "$url" 2>&1)

    # on error, check for a 404 and record the URL in files.txt
    if [ $? -ne 0 ]; then
        echo "$CURL" | grep --quiet 'The requested URL returned error: 404'
        [ $? -eq 0 ] && echo "$url" >> "files.txt"
    fi

done

Here is a version that takes the URLs as command-line arguments:


#!/bin/sh

for url in "$@"; do
    CURL=$(curl --fail -O "$url" 2>&1)
    # on error, check for a 404 and record the URL in files.txt
    if [ $? -ne 0 ]; then
        echo "$CURL" | grep --quiet 'The requested URL returned error: 404'
        [ $? -eq 0 ] && echo "$url" >> "files.txt"
    fi
done

You can use this like: sh download-script.sh http://server.com/files{00..21}.png http://server.com/otherfiles{00..12}.gif


The brace range expansion ({00..21}) works in bash, but not in plain sh.

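An alternative sketch (mine, not from the original answer): read the status code directly with curl's -w '%{http_code}' write-out variable instead of pattern-matching the error text printed by --fail. The logging step is split into a function so the bookkeeping can be exercised without a network:

```shell
#!/bin/sh
# Sketch: use curl's -w '%{http_code}' to obtain the status code directly,
# instead of grepping the error message that --fail prints.

log_if_404() {
    # $1 = status code, $2 = URL; record the URL when the server returned 404
    if [ "$1" = "404" ]; then
        echo "$2" >> files.txt
    fi
}

for url in "$@"; do
    # -o saves the body under the remote file name; -w prints only the status
    status=$(curl -s -o "$(basename "$url")" -w '%{http_code}' "$url")
    log_if_404 "$status" "$url"
done
```

Usage is the same as the answer's second script, e.g. sh download-script.sh http://server.com/files{00..21}.png (with the brace expansion done by bash).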

Answered by Stephen Quan

You can use the -D, --dump-header <file> option to capture all of the headers, including Content-Type and the HTTP status code. Note that the header lines may be terminated with DOS line endings (CR LF), so you may want to strip the CR character.


curl -s -D headers.txt -o out.dat "http://server.com/somefile[00-31].mp4"
httpStatus=$(head -1 headers.txt | awk '{print $2}')
contentType=$(grep "Content-Type:" headers.txt | tr -d '\r')
contentType=${contentType#*: }
if [ "$httpStatus" != "200" ]; then
    echo "FAILED - HTTP STATUS $httpStatus"
else
    echo "SAVED"
fi