bash: Download images from website

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/10442841/


Download images from website

Tags: image, bash, wget

Asked by kev

I want to have a local copy of a gallery on a website. The gallery shows the pictures at domain.com/id/1 (id increases in increments of 1), and the image itself is stored at pics.domain.com/pics/original/image.format. The exact line for the image in the HTML is:


<div id="bigwall" class="right"> 
    <img border=0 src='http://pics.domain.com/pics/original/image.jpg' name='pic' alt='' style='top: 0px; left: 0px; margin-top: 50px; height: 85%;'> 
</div>

So I want to write a script that does something like this (in pseudo-code):


for(id = 1; id <= 151468; id++) {
     page = "http://domain.com/id/" + id.toString();
     src = returnSrc(); // Searches the html for img with name='pic' and saves the image location as a string
     getImg(); // Downloads the file named in src
}

I'm not sure exactly how to do this, though. I suppose I could do it in bash: use wget to download the html, search the html manually for http://pics.domain.com/pics/original/*.*, then use wget again to save the file, remove the html file, increment the id, and repeat. The only thing is I'm not good at handling strings, so if anyone could tell me how to search for the url and replace the *s with the file name and format, I should be able to get the rest going. Or if my method is stupid and you have a better one, please share.

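A minimal bash sketch of the loop described above might look like the following; the URL scheme and the pics.domain.com/pics/original/ pattern come from the question, while the use of curl for fetching and grep for extracting the src attribute is an assumption about the page markup:

# loop over every gallery id, pull out the image URL, and download it
for id in $(seq 1 151468); do
    page="http://domain.com/id/${id}"
    # take the first URL under pics.domain.com/pics/original/ found in the page
    src=$(curl -s "$page" | grep -o "http://pics.domain.com/pics/original/[^'\"]*" | head -n 1)
    # download the image only if a URL was actually found
    [ -n "$src" ] && wget -q "$src"
done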

Answered by kev

# get all pages; curl expands the [1-151468] range and substitutes the
# current number for '#1' in -o, producing 1.html ... 151468.html
curl 'http://domain.com/id/[1-151468]' -o '#1.html'

# extract all image URLs from the downloaded pages
grep -oh 'http://pics.domain.com/pics/original/.*jpg' *.html >urls.txt

# de-duplicate the URLs and download all images (wget -i- reads the list from stdin)
sort -u urls.txt | wget -i-
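Note that the greedy .*jpg in the grep above can match too much if a page ever has more than one URL on the same line; a tighter variant (a sketch only, not verified against the real markup) stops at the closing quote:

# match up to the closing quote instead of greedily up to the last "jpg"
grep -oh "http://pics.domain.com/pics/original/[^'\"]*\.jpg" *.html | sort -u >urls.txt
wget -i urls.txt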