Linux wget 如何仅保存从目标页面链接的页面链接到的某些文件类型?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/6643475/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How can wget save only certain file types linked to from pages linked to by the target page?
提问by Nomen
How can wget save only certain file types linked to from pages linked to by the target page, regardless of the domain in which the certain files are?
wget 如何只保存从目标页面链接的页面所链接到的某些文件类型,而不管这些文件位于哪个域?
Trying to speed up a task I have to do often.
试图加快我必须经常做的任务。
I've been rooting through the wget docs and googling, but nothing seems to work. I keep on either getting just the target page or the subpages without the files (even using -H), so I'm obviously doing badly at this.
我一直在翻阅 wget 文档并用谷歌搜索,但似乎没有任何办法有效。我总是要么只得到目标页面,要么得到不含文件的子页面(即使用了 -H 也一样),所以我显然做得很差。
So, essentially, example.com/index1/ contains links to example.com/subpage1/ and example.com/subpage2/, while the subpages contain links to example2.com/file.ext and example2.com/file2.ext, etc. However, example.com/index1.html may link to example.com/index2/ which has links to more subpages I don't want.
因此,本质上,example.com/index1/ 包含指向 example.com/subpage1/ 和 example.com/subpage2/ 的链接,而这些子页面又包含指向 example2.com/file.ext、example2.com/file2.ext 等文件的链接。但是,example.com/index1.html 可能会链接到 example.com/index2/,后者又链接到更多我不想要的子页面。
Can wget even do this, and if not then what do you suggest I use? Thanks.
wget 到底能不能做到这一点?如果不能,您建议我用什么?谢谢。
回答by ssapkota
Something like this should work:
像这样的命令应该可行:
wget --accept "*.ext" --level 2 "example.com/index1/"
回答by TheKojuEffect
The following command worked for me.
下面的命令对我有用。
wget -r --accept "*.ext" --level 2 "example.com/index1/"
This needs to be done recursively, so -r should be added.
这需要递归执行,所以应该加上 -r。
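Note that the question asks for files hosted on a different domain (example2.com), which neither answer's command handles: by default wget never follows links off the starting host. A hedged sketch (the domain names, extensions, and URL here are the question's placeholders, not a tested real site) combining the accepted flags with host spanning:

```shell
# Recurse 2 levels from the index page (-r -l 2), allow following links onto
# other hosts (-H) but restrict which hosts (--domains), keep only the wanted
# extensions (-A), and avoid climbing to parent directories like
# example.com/index2/ reached via ".." links (--no-parent).
wget -r -l 2 -H \
     --domains=example.com,example2.com \
     -A "*.ext,*.ext2" \
     --no-parent \
     "http://example.com/index1/"
```

With -A, wget still downloads the intermediate HTML pages in order to extract their links, then deletes any that do not match the accept list, so only the target file types remain on disk.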