Html 如何下载 HTTP 目录及其所有文件和子目录，就像它们在在线文件/文件夹列表中显示的那样?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/23446635/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
How to download HTTP directory with all files and sub-directories as they appear on the online files/folders list?
提问by Omar
There is an online HTTP directory that I have access to. I have tried to download all sub-directories and files via wget. But the problem is that when wget downloads sub-directories it downloads the index.html file which contains the list of files in that directory without downloading the files themselves.
有一个我可以访问的在线 HTTP 目录。我试图通过 wget 下载其中的所有子目录和文件。但问题在于，当 wget 下载子目录时，它只会下载其中的 index.html 文件（该文件包含该目录下的文件列表），而不会下载文件本身。
Is there a way to download the sub-directories and files without depth limit (as if the directory I want to download is just a folder which I want to copy to my computer).
有没有办法在没有深度限制的情况下下载子目录和文件(好像我要下载的目录只是我想复制到我的计算机的文件夹)。
回答by Mingjiang Shi
Solution:
解决方案:
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
Explanation:
解释:
- It will download all files and subfolders in the ddd directory
- -r : recursively
- -np : not going to upper directories, like ccc/…
- -nH : not saving files to the hostname folder
- --cut-dirs=3 : but saving it to ddd by omitting the first 3 folders aaa, bbb, ccc
- -R index.html : excluding index.html files
- 它将下载 ddd 目录中的所有文件和子文件夹
- -r : 递归下载
- -np : 不进入上层目录，比如 ccc/…
- -nH : 不把文件保存到以主机名命名的文件夹中
- --cut-dirs=3 : 省略前 3 层目录 aaa, bbb, ccc，直接保存到 ddd
- -R index.html : 排除 index.html 文件
参考: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/
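A frequent reason wget stops at index.html without fetching the listed files is a restrictive robots.txt on the server, which wget honors by default during recursive retrieval. If that applies to your host, a possible variation of the command above (same placeholder URL; -e robots=off disables robots.txt handling and --no-check-certificate helps with self-signed HTTPS certificates) is:

wget -r -np -nH --cut-dirs=3 -R index.html -e robots=off --no-check-certificate http://hostname/aaa/bbb/ccc/ddd/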
回答by mateuscb
I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive flag (see image).
多亏了这篇使用 VisualWGet 的帖子，我才让它正常工作。它对我很有用。重要的部分似乎是勾选 -recursive 标志（见图）。
Also found that the -no-parent flag is important, otherwise it will try to download everything.
还发现 -no-parent 标志很重要，否则它会尝试下载所有内容。
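For reference, a rough plain-wget equivalent of those two checkboxes (an illustration, not part of the original answer; substitute your own URL for the placeholder) would be:

wget --recursive --no-parent http://hostname/aaa/bbb/ccc/ddd/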
回答by Natalie Ng
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
From man wget
从 man wget
‘-r' ‘--recursive' Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.
'-r' '--recursive' 开启递归检索。有关更多详细信息，请参阅递归下载。默认最大深度为 5。
‘-np' ‘--no-parent' Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.
'-np' '--no-parent' 递归检索时绝不上升到父目录。这是一个有用的选项，因为它保证只下载某一层次结构以下的文件。有关更多详细信息，请参阅基于目录的限制。
‘-nH' ‘--no-host-directories' Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/' will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
'-nH' '--no-host-directories' 禁用以主机名为前缀的目录的生成。默认情况下，使用 '-r http://fly.srk.fer.hr/' 调用 Wget 将创建一个以 fly.srk.fer.hr/ 开头的目录结构。此选项禁用此类行为。
‘--cut-dirs=number' Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
'--cut-dirs=number' 忽略 number 个目录层级。这对于精细控制递归下载内容保存到的目录很有用。
Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/'. If you retrieve it with ‘-r', it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH' option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs' comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how ‘--cut-dirs' option works.
以 'ftp://ftp.xemacs.org/pub/xemacs/' 中的目录为例。如果您使用 '-r' 检索它，它将在本地保存在 ftp.xemacs.org/pub/xemacs/ 下。虽然 '-nH' 选项可以去掉 ftp.xemacs.org/ 部分，但您仍然会留下 pub/xemacs。这就是 '--cut-dirs' 派上用场的地方；它使 Wget "看不到" number 个远程目录层级。以下是 '--cut-dirs' 选项如何工作的几个示例。
No options        -> ftp.xemacs.org/pub/xemacs/
-nH               -> pub/xemacs/
-nH --cut-dirs=1  -> xemacs/
-nH --cut-dirs=2  -> .
--cut-dirs=1      -> ftp.xemacs.org/xemacs/
...
If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd' and ‘-P'. However, unlike ‘-nd', ‘--cut-dirs' does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1', a beta/ subdirectory will be placed to xemacs/beta, as one would expect.
如果你只是想去掉目录结构，这个选项类似于 '-nd' 和 '-P' 的组合。然而，与 '-nd' 不同，'--cut-dirs' 不会丢失子目录结构。例如，使用 '-nH --cut-dirs=1'，beta/ 子目录会被放到 xemacs/beta 下，正如人们所期望的那样。
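Applied to the URL layout used in the question and the accepted answer (the hypothetical http://hostname/aaa/bbb/ccc/ddd/), the same progression would look roughly like this:

No options        -> hostname/aaa/bbb/ccc/ddd/
-nH               -> aaa/bbb/ccc/ddd/
-nH --cut-dirs=3  -> ddd/ (files saved directly into the current directory)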
回答by Moscarda
wget is an invaluable resource and something I use myself. However, sometimes there are characters in the address that wget identifies as syntax errors. I'm sure there is a fix for that, but as this question did not ask specifically about wget, I thought I would offer an alternative for those people who will undoubtedly stumble upon this page looking for a quick fix with no learning curve required.
wget 是一个宝贵的工具，我自己也在使用。但是，有时地址中的某些字符会被 wget 识别为语法错误。我相信这是有办法解决的，但由于这个问题并不是专门针对 wget 的，我想为那些偶然来到此页面、希望无需学习成本就能快速解决问题的人提供一个替代方案。
There are a few browser extensions that can do this, but most require installing download managers, which aren't always free, tend to be an eyesore, and use a lot of resources. Here's one that has none of these drawbacks:
有一些浏览器扩展可以做到这一点,但大多数都需要安装下载管理器,这些管理器并不总是免费的,往往很碍眼,并且会使用大量资源。这是一个没有这些缺点的:
"Download Master" is an extension for Google Chrome that works great for downloading from directories. You can choose to filter which file-types to download, or download the entire directory.
“下载大师”是谷歌浏览器的扩展,非常适合从目录下载。您可以选择过滤要下载的文件类型,或下载整个目录。
https://chrome.google.com/webstore/detail/download-master/dljdacfojgikogldjffnkdcielnklkce
For an up-to-date feature list and other information, visit the project page on the developer's blog:
有关最新的功能列表和其他信息,请访问开发人员博客上的项目页面:
回答by Rushikesh Tade
You can use this Firefox addon to download all files in an HTTP directory.
您可以使用这个 Firefox 插件下载 HTTP 目录中的所有文件。
https://addons.mozilla.org/en-US/firefox/addon/http-directory-downloader/
回答by T.Todua
No Software or Plugin required!
无需软件或插件!
(only usable if you don't need recursive depth)
（仅当您不需要递归深度时才可用）
Use a bookmarklet. Drag this link into your bookmarks, then edit it and paste this code:
使用小书签（bookmarklet）。将此链接拖到书签栏中，然后编辑该书签并粘贴以下代码：
javascript:(function(){ var arr=[], l=document.links; var ext=prompt("Select extension for download (all links containing it will be downloaded).", ".mp3"); for(var i=0; i<l.length; i++) { if(l[i].href.indexOf(ext) !== -1){ l[i].setAttribute("download",l[i].text); l[i].click(); } } })();
then go to the page from which you want to download the files, and click that bookmarklet.
然后打开你要从中下载文件的页面，点击该小书签即可。
回答by nwgat
You can use lftp, the Swiss army knife of downloading. If you have bigger files you can add --use-pget-n=10 to the command:
你可以使用 lftp，它是下载工具中的瑞士军刀。如果文件较大，可以在命令中加上 --use-pget-n=10：
lftp -c 'mirror --parallel=100 https://example.com/files/ ;exit'
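As a sketch of the combined invocation (the URL is a placeholder; --parallel downloads several files at once, while --use-pget-n splits each larger file into segments):

lftp -c 'mirror --parallel=10 --use-pget-n=10 https://example.com/files/ ;exit'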
回答by Byte Bitter
wget generally works in this way, but some sites may have problems and it may create too many unnecessary HTML files. In order to make this work easier and to prevent unnecessary file creation, I am sharing my getwebfolder script, which is the first Linux script I wrote for myself. This script downloads all content of a web folder entered as a parameter.
wget 通常以这种方式工作,但某些站点可能会出现问题,并且可能会创建过多不必要的 html 文件。为了使这项工作更容易并防止创建不必要的文件,我分享了我的 getwebfolder 脚本,这是我为自己编写的第一个 linux 脚本。此脚本下载作为参数输入的 Web 文件夹的所有内容。
When you try to download an open web folder with wget that contains more than one file, wget downloads a file named index.html. This file contains the file list of the web folder. My script converts the file names written in the index.html file to web addresses and downloads them cleanly with wget.
当您尝试通过 wget 下载包含多个文件的打开 Web 文件夹时,wget 会下载名为 index.html 的文件。此文件包含 Web 文件夹的文件列表。我的脚本将 index.html 文件中写入的文件名转换为网址,并使用 wget 清楚地下载它们。
Tested on Ubuntu 18.04 and Kali Linux; it may work on other distros as well.
已在 Ubuntu 18.04 和 Kali Linux 上测试，在其他发行版上应该也能工作。
Usage :
用法 :
extract the getwebfolder file from the zip file provided below
chmod +x getwebfolder (only needed the first time)
./getwebfolder webfolder_URL
从下面提供的 zip 文件中解压出 getwebfolder 文件
chmod +x getwebfolder（仅第一次需要）
./getwebfolder webfolder_URL
such as ./getwebfolder http://example.com/example_folder/
如 ./getwebfolder http://example.com/example_folder/
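As a rough, simplified sketch of the idea described above (this is not the author's getwebfolder script; it assumes a plain Apache-style index.html listing, skips sub-directories, and does no URL-decoding), the same approach can be approximated with wget and standard shell tools:

url="http://example.com/example_folder/"
# fetch the listing, extract href targets, drop sort links / absolute paths / sub-directories,
# then download each remaining file into the current directory
wget -q -O - "$url" \
  | grep -o 'href="[^"]*"' \
  | sed 's/^href="//;s/"$//' \
  | grep -v -e '^?' -e '^/' -e '/$' \
  | while read -r f; do wget "$url$f"; done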