BASH script: Downloading consecutive numbered files with wget

Note: this page is based on a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/1426522/



Tags: bash, scripting, wget

Asked by wonderer

I have a web server that saves numbered log files for a web application. An example file name would be:

dbsclog01s001.log
dbsclog01s002.log
dbsclog01s003.log

The last 3 digits form the counter, and it can sometimes go up to 100.

I usually open a web browser and browse to a file such as:

http://someaddress.com/logs/dbsclog01s001.log

and save the files. This of course gets a bit annoying once there are 50 logs. I tried to come up with a BASH script that uses wget, passing

http://someaddress.com/logs/dbsclog01s*.log

but I am having problems with my script. Anyway, does anyone have a sample of how to do this?

Thanks!

Answered by ephemient

#!/bin/sh

# Require a printf-style URL format plus the first and last sequence numbers.
if [ $# -lt 3 ]; then
        echo "Usage: $0 url_format seq_start seq_end [wget_args]"
        exit
fi

url_format=$1
seq_start=$2
seq_end=$3
shift 3

# printf reuses the format for each number seq produces, yielding one URL
# per line; wget reads the list from stdin via -i-.
printf "$url_format\n" `seq $seq_start $seq_end` | wget -i- "$@"

Save the above as seq_wget, give it execution permission (chmod +x seq_wget), and then run, for example:

$ ./seq_wget http://someaddress.com/logs/dbsclog01s%03d.log 1 50
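Concretely, the printf/seq pipeline expands to one URL per line; with a hypothetical range of 1 to 3 it would produce:

$ printf "http://someaddress.com/logs/dbsclog01s%03d.log\n" `seq 1 3`
http://someaddress.com/logs/dbsclog01s001.log
http://someaddress.com/logs/dbsclog01s002.log
http://someaddress.com/logs/dbsclog01s003.log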

Or, if you have Bash 4.0, you could just type

$ wget http://someaddress.com/logs/dbsclog01s{001..050}.log

Or, if you have curl instead of wget, you could follow Dennis Williamson's answer.

Answered by Paused until further notice.

curl seems to support ranges. From the manpage:

URL
       The URL syntax is protocol dependent. You'll find a detailed
       description in RFC 3986.

       You can specify multiple URLs or parts of URLs by writing part sets
       within braces as in:

        http://site.{one,two,three}.com

       or you can get sequences of alphanumeric series by using [] as in:

        ftp://ftp.numericals.com/file[1-100].txt
        ftp://ftp.numericals.com/file[001-100].txt    (with leading zeros)
        ftp://ftp.letters.com/file[a-z].txt

       No nesting of the sequences is supported at the moment, but you can use
       several ones next to each other:

        http://any.org/archive[1996-1999]/vol[1-4]/part{a,b,c}.html

       You can specify any amount of URLs on the command line. They will be
       fetched in a sequential manner in the specified order.

       Since curl 7.15.1 you can also specify step counter for the ranges, so
       that you can get every Nth number or letter:

        http://www.numericals.com/file[1-100:10].txt
        http://www.letters.com/file[a-z:2].txt

You may have noticed that it says "with leading zeros"!
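Applied to the question's filenames, that range syntax makes this a one-liner (assuming a reasonably recent curl; -O saves each fetched file under its remote name, and the quotes keep the shell from touching the brackets):

$ curl -O "http://someaddress.com/logs/dbsclog01s[001-100].log"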

Answered by Stephan

You can use echo-type sequences in the wget URL to download a string of numbers...

wget http://someaddress.com/logs/dbsclog01s00{1..3}.log

This also works with letters:

{a..z} {A..Z}
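For example (with example.com standing in for a real host), the letter form expands the same way before wget ever runs:

wget http://example.com/logs/part{a..c}.log   # fetches parta.log, partb.log, partc.log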

Answered by anschauung

Not sure precisely what problems you were experiencing, but it sounds like a simple for loop in bash would do it for you:

for i in {1..999}; do
wget -k http://someaddress.com/logs/dbsclog01s$i.log -O your_local_output_dir_$i;
done
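One caveat: the loop above requests dbsclog01s1.log rather than the zero-padded dbsclog01s001.log from the question. In Bash 4.0+, brace expansion pads for you when the endpoints carry leading zeros, so a minimal variant (range chosen arbitrarily) would be:

for i in {001..100}; do
    wget "http://someaddress.com/logs/dbsclog01s$i.log"   # $i expands as 001, 002, ...
done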

Answered by Mark Rushakoff

You can use a combination of a for loop in bash with the printf command (of course modifying echo to wget as needed):

$ for i in {1..10}; do echo "http://www.com/myurl`printf "%03d" $i`.html"; done
http://www.com/myurl001.html
http://www.com/myurl002.html
http://www.com/myurl003.html
http://www.com/myurl004.html
http://www.com/myurl005.html
http://www.com/myurl006.html
http://www.com/myurl007.html
http://www.com/myurl008.html
http://www.com/myurl009.html
http://www.com/myurl010.html
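Swapping echo for wget, as the answer suggests, turns that preview into the actual download; a minimal sketch against the question's URLs (the range of 50 is chosen to match the question):

for i in {1..50}; do
    wget "http://someaddress.com/logs/dbsclog01s$(printf "%03d" $i).log"
done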

Answered by igustin

Interesting task, so I wrote a full script for you (combining several answers and more). Here it is:

#!/bin/bash
# fixed vars
URL=http://domain.com/logs/     # URL address up to the logfile name
PREF=logprefix                  # logfile prefix (before number)
POSTF=.log                      # logfile suffix (after number)
DIGITS=3                        # how many digits the logfile number has
DLDIR=~/Downloads               # download directory
TOUT=5                          # timeout for quit
# code
for((i=1;i<10**$DIGITS;++i))
do
        file=$PREF`printf "%0${DIGITS}d" $i`$POSTF   # local file name
        dl=$URL$file                                 # full URL to download
        echo "$dl -> $DLDIR/$file"                   # monitoring, can be commented
        wget -T $TOUT -q $dl -O $DLDIR/$file         # save into the download directory
        if [ "$?" -ne 0 ]                            # stop at the first missing file
        then
                exit
        fi
done

At the beginning of the script you can set the URL, the logfile prefix and suffix, how many digits the numbering part has, and the download directory. The loop will download all the logfiles it finds, and automatically exit at the first non-existent one (using wget's timeout).

Note that this script assumes that logfile indexing starts with 1, not zero, as you mentioned in the example.

Hope this helps.

Answered by Hai Vu

Check to see if your system has seq, then it would be easy:

for i in $(seq -f "%03g" 1 10); do wget "http://.../dbsclog${i}.log"; done

If your system has the jot command instead of seq:

for i in $(jot -w "http://.../dbsclog%03d.log" 10); do wget $i; done

Answered by Doug A.K.

Oh! This is a similar problem I ran into when learning bash to automate manga downloads.

Something like this should work:

for a in `seq 1 999`; do
if [ ${#a} -eq 1 ]; then
    b="00"
elif [ ${#a} -eq 2 ]; then
    b="0"
else
    b=""    # three digits need no padding (and this avoids reusing a stale $b)
fi
echo "$a of 231"
wget -q http://site.com/path/fileprefix$b$a.jpg
done

Answered by pavium

I just had a look at the wget manpage discussion of 'globbing':

By default, globbing will be turned on if the URL contains a globbing character. This option may be used to turn globbing on or off permanently. You may have to quote the URL to protect it from being expanded by your shell. Globbing makes Wget look for a directory listing, which is system-specific. This is why it currently works only with Unix FTP servers (and the ones emulating Unix "ls" output).

So wget http://... won't work with globbing.
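In other words, wildcards only help when the same logs are also exposed over FTP, where wget can read a directory listing; something like the following would work there (ftp.example.com being a stand-in host), while over HTTP you need one of the URL-generating approaches above:

wget "ftp://ftp.example.com/logs/dbsclog01s*.log"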

Answered by Carlos Tasada

Here you can find a Perl script that looks like what you want:

http://osix.net/modules/article/?id=677

#!/usr/bin/perl
$program="wget"; #change this to proz if you have it ;-)
my $count=1; #the lesson number starts from 1
my $base_url= "http://www.und.nodak.edu/org/crypto/crypto/lanaki.crypt.class/lessons/lesson";
my $format=".zip"; #the format of the file to download
my $max=24; #the total number of files to download
my $url;

for($count=1;$count<=$max;$count++) {
    if($count<10) {
    $url=$base_url."0".$count.$format; #insert a '0' and form the URL
    }
    else {
    $url=$base_url.$count.$format; #no need to insert a zero
    }
    system("$program $url");
}