Html 如何下载 HTTP 目录及其所有文件和子目录，就像它们在在线文件/文件夹列表中显示的那样?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/23446635/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?
提问by Omar
There is an online HTTP directory that I have access to. I have tried to download all sub-directories and files via wget. But the problem is that when wget downloads sub-directories it downloads the index.html file which contains the list of files in that directory without downloading the files themselves.
有一个我可以访问的在线 HTTP 目录。我试图通过 wget 下载其中的所有子目录和文件。但问题在于，当 wget 下载子目录时，它只会下载其中的 index.html 文件（该文件包含该目录下的文件列表），而不会下载文件本身。
Is there a way to download the sub-directories and files without depth limit (as if the directory I want to download is just a folder which I want to copy to my computer).
有没有办法在没有深度限制的情况下下载子目录和文件(好像我要下载的目录只是我想复制到我的计算机的文件夹)。
回答by Mingjiang Shi
Solution:
解决方案:
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
Explanation:
解释:
- It will download all files and subfolders in the ddd directory
- -r : recursively
- -np : not going to upper directories, like ccc/…
- -nH : not saving files to the hostname folder
- --cut-dirs=3 : but saving it to ddd by omitting the first 3 folders aaa, bbb, ccc
- -R index.html : excluding index.html files
- 它将下载 ddd 目录中的所有文件和子文件夹
- -r : 递归下载
- -np : 不进入上层目录，比如 ccc/…
- -nH : 不把文件保存到以主机名命名的文件夹中
- --cut-dirs=3 : 省略前 3 层目录 aaa, bbb, ccc，直接保存到 ddd
- -R index.html : 排除 index.html 文件
参考: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/
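A frequent reason wget stops at index.html without fetching the listed files is a restrictive robots.txt on the server, which wget honors by default during recursive retrieval. If that applies to your host, a possible variation of the command above (same placeholder URL; -e robots=off disables robots.txt handling and --no-check-certificate helps with self-signed HTTPS certificates) is:

wget -r -np -nH --cut-dirs=3 -R index.html -e robots=off --no-check-certificate http://hostname/aaa/bbb/ccc/ddd/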
回答by mateuscb
I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive flag (see image).
多亏了这篇使用 VisualWGet 的帖子，我才让它正常工作。它对我很有用。重要的部分似乎是勾选 -recursive 标志（见图）。
Also found that the -no-parent flag is important, otherwise it will try to download everything.
还发现 -no-parent 标志很重要，否则它会尝试下载所有内容。
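For reference, a rough plain-wget equivalent of those two checkboxes (an illustration, not part of the original answer; substitute your own URL for the placeholder) would be:

wget --recursive --no-parent http://hostname/aaa/bbb/ccc/ddd/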
回答by Natalie Ng
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
From man wget
从 man wget
‘-r' ‘--recursive' Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.
'-r' '--recursive' 开启递归检索。有关更多详细信息，请参阅递归下载。默认最大深度为 5。
‘-np' ‘--no-parent' Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.
'-np' '--no-parent' 递归检索时绝不上升到父目录。这是一个有用的选项，因为它保证只下载某一层次结构以下的文件。有关更多详细信息，请参阅基于目录的限制。
‘-nH' ‘--no-host-directories' Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/' will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
'-nH' '--no-host-directories' 禁用以主机名为前缀的目录的生成。默认情况下，使用 '-r http://fly.srk.fer.hr/' 调用 Wget 将创建一个以 fly.srk.fer.hr/ 开头的目录结构。此选项禁用此类行为。
‘--cut-dirs=number' Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
'--cut-dirs=number' 忽略 number 个目录层级。这对于精细控制递归下载内容保存到的目录很有用。
Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/'. If you retrieve it with ‘-r', it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH' option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs' comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how ‘--cut-dirs' option works.
以 'ftp://ftp.xemacs.org/pub/xemacs/' 中的目录为例。如果您使用 '-r' 检索它，它将在本地保存在 ftp.xemacs.org/pub/xemacs/ 下。虽然 '-nH' 选项可以去掉 ftp.xemacs.org/ 部分，但您仍然会留下 pub/xemacs。这就是 '--cut-dirs' 派上用场的地方；它使 Wget "看不到" number 个远程目录层级。以下是 '--cut-dirs' 选项如何工作的几个示例。
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .
--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd' and ‘-P'. However, unlike ‘-nd', ‘--cut-dirs' does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1', a beta/ subdirectory will be placed to xemacs/beta, as one would expect.
如果你只是想去掉目录结构，这个选项类似于 '-nd' 和 '-P' 的组合。然而，与 '-nd' 不同，'--cut-dirs' 不会丢失子目录结构。例如，使用 '-nH --cut-dirs=1'，beta/ 子目录会被放到 xemacs/beta 下，正如人们所期望的那样。
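Applied to the URL layout used in the question and the accepted answer (the hypothetical http://hostname/aaa/bbb/ccc/ddd/), the same progression would look roughly like this:

No options        -> hostname/aaa/bbb/ccc/ddd/
-nH               -> aaa/bbb/ccc/ddd/
-nH --cut-dirs=3  -> ddd/ (files saved directly into the current directory)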
回答by Moscarda
wget is an invaluable resource and something I use myself. However, sometimes there are characters in the address that wget identifies as syntax errors. I'm sure there is a fix for that, but as this question did not ask specifically about wget, I thought I would offer an alternative for those people who will undoubtedly stumble upon this page looking for a quick fix with no learning curve required.
wget 是一个宝贵的工具，我自己也在使用。但是，有时地址中的某些字符会被 wget 识别为语法错误。我相信这是有办法解决的，但由于这个问题并不是专门针对 wget 的，我想为那些偶然来到此页面、希望无需学习成本就能快速解决问题的人提供一个替代方案。
There are a few browser extensions that can do this, but most require installing download managers, which aren't always free, tend to be an eyesore, and use a lot of resources. Here's one that has none of these drawbacks:
有一些浏览器扩展可以做到这一点,但大多数都需要安装下载管理器,这些管理器并不总是免费的,往往很碍眼,并且会使用大量资源。这是一个没有这些缺点的:
"Download Master" is an extension for Google Chrome that works great for downloading from directories. You can choose to filter which file-types to download, or download the entire directory.
“下载大师”是谷歌浏览器的扩展,非常适合从目录下载。您可以选择过滤要下载的文件类型,或下载整个目录。
https://chrome.google.com/webstore/detail/download-master/dljdacfojgikogldjffnkdcielnklkce
For an up-to-date feature list and other information, visit the project page on the developer's blog:
有关最新的功能列表和其他信息,请访问开发人员博客上的项目页面:
回答by Rushikesh Tade
You can use this Firefox addon to download all files in an HTTP directory.
您可以使用这个 Firefox 插件下载 HTTP 目录中的所有文件。
https://addons.mozilla.org/en-US/firefox/addon/http-directory-downloader/
回答by T.Todua
No Software or Plugin required!
无需软件或插件!
(only usable if you don't need recursive depth)
（仅当您不需要递归深度时才可用）
Use a bookmarklet. Drag this link into your bookmarks, then edit it and paste this code:
使用小书签（bookmarklet）。将此链接拖到书签栏中，然后编辑该书签并粘贴以下代码：
javascript:(function(){ var arr=[], l=document.links; var ext=prompt("Select extension for download (all links containing it will be downloaded).", ".mp3"); for(var i=0; i<l.length; i++) { if(l[i].href.indexOf(ext) !== -1){ l[i].setAttribute("download",l[i].text); l[i].click(); } } })();
then go to the page from which you want to download the files, and click that bookmarklet.
然后打开你要从中下载文件的页面，点击该小书签即可。
回答by nwgat
You can use lftp, the Swiss army knife of downloading. If you have bigger files you can add --use-pget-n=10 to the command:
你可以使用 lftp，它是下载工具中的瑞士军刀。如果文件较大，可以在命令中加上 --use-pget-n=10：
lftp -c 'mirror --parallel=100 https://example.com/files/ ;exit'
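As a sketch of the combined invocation (the URL is a placeholder; --parallel downloads several files at once, while --use-pget-n splits each larger file into segments):

lftp -c 'mirror --parallel=10 --use-pget-n=10 https://example.com/files/ ;exit'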
回答by Byte Bitter
wget generally works in this way, but some sites may have problems and it may create too many unnecessary HTML files. In order to make this work easier and to prevent unnecessary file creation, I am sharing my getwebfolder script, which is the first Linux script I wrote for myself. This script downloads all content of a web folder entered as a parameter.
wget 通常以这种方式工作,但某些站点可能会出现问题,并且可能会创建过多不必要的 html 文件。为了使这项工作更容易并防止创建不必要的文件,我分享了我的 getwebfolder 脚本,这是我为自己编写的第一个 linux 脚本。此脚本下载作为参数输入的 Web 文件夹的所有内容。
When you try to download an open web folder with wget that contains more than one file, wget downloads a file named index.html. This file contains the file list of the web folder. My script converts the file names written in the index.html file to web addresses and downloads them cleanly with wget.
当您尝试通过 wget 下载包含多个文件的打开 Web 文件夹时,wget 会下载名为 index.html 的文件。此文件包含 Web 文件夹的文件列表。我的脚本将 index.html 文件中写入的文件名转换为网址,并使用 wget 清楚地下载它们。
Tested on Ubuntu 18.04 and Kali Linux; it may work on other distros as well.
已在 Ubuntu 18.04 和 Kali Linux 上测试，在其他发行版上应该也能工作。
Usage :
用法 :
extract the getwebfolder file from the zip file provided below
chmod +x getwebfolder (only needed the first time)
./getwebfolder webfolder_URL
从下面提供的 zip 文件中解压出 getwebfolder 文件
chmod +x getwebfolder（仅第一次需要）
./getwebfolder webfolder_URL
such as ./getwebfolder http://example.com/example_folder/
如 ./getwebfolder http://example.com/example_folder/
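As a rough, simplified sketch of the idea described above (this is not the author's getwebfolder script; it assumes a plain Apache-style index.html listing, skips sub-directories, and does no URL-decoding), the same approach can be approximated with wget and standard shell tools:

url="http://example.com/example_folder/"
# fetch the listing, extract href targets, drop sort links / absolute paths / sub-directories,
# then download each remaining file into the current directory
wget -q -O - "$url" \
  | grep -o 'href="[^"]*"' \
  | sed 's/^href="//;s/"$//' \
  | grep -v -e '^?' -e '^/' -e '/$' \
  | while read -r f; do wget "$url$f"; done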