
Note: this content is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/538865/


How do you archive an entire website for offline viewing?

Tags: html, web-crawler, archive

Asked by jskunkle

We have actually burned static/archived copies of our ASP.NET websites for customers many times. We have used WebZip until now, but we have had endless problems with crashes, downloaded pages not being re-linked correctly, etc.


We basically need an application that crawls and downloads static copies of everything on our ASP.NET website (pages, images, documents, CSS, etc.) and then processes the downloaded pages so that they can be browsed locally without an internet connection (getting rid of absolute URLs in links, etc.). The more idiot-proof, the better. This seems like a pretty common and (relatively) simple process, but I have tried a few other applications and have been really unimpressed.


Does anyone have archive software they would recommend? Does anyone have a really simple process they would share?


Accepted answer by Jesse Dearing

On Windows, you can look at HTTrack. It's very configurable, allowing you to set the speed of the downloads. But you can also just point it at a website and run it with no configuration at all.


In my experience it's been a really good tool and works well. Some of the things I like about HTTrack are:


  • Open Source license
  • Resumes stopped downloads
  • Can update an existing archive
  • You can configure it to be non-aggressive when it downloads so it doesn't waste your bandwidth and the bandwidth of the site.
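
HTTrack also ships a command-line client, which is handy for scripting archives. A minimal sketch, assuming the httrack binary is on your PATH (the URL and output directory are placeholders, and the throttling flag spelling is from memory, so check your version's --help):

httrack "http://example.com/" -O ./example-mirror --max-rate=250000

This mirrors the site into ./example-mirror while capping the download rate at roughly 250 KB/s so the crawl stays non-aggressive.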

Answer by chuckg

You could use wget:


wget -m -k -K -E http://url/of/web/site
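
For readability, those short options should correspond to the long forms below (worth confirming against your wget's man page; on older builds -E is spelled --html-extension instead of --adjust-extension):

wget --mirror --convert-links --backup-converted --adjust-extension http://url/of/web/site

--mirror turns on recursion with timestamping, --convert-links rewrites links for local viewing, --backup-converted keeps the original files alongside the converted ones, and --adjust-extension saves pages with an .html suffix.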

Answer by 2540625

The Wayback Machine Downloader by hartator is simple and fast.


Install via Ruby, then run with the desired domain and optional timestamp from the Internet Archive.


sudo gem install wayback_machine_downloader
mkdir example
cd example
wayback_machine_downloader http://example.com --timestamp 19700101000000

Answer by Syntax

I use Blue Crab on OS X and WebCopier on Windows.


Answer by Joel Hoffman

wget -r -k


... and investigate the rest of the options. I hope you've followed these guidelines: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html, so all your resources are safe to fetch with GET requests.

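For completeness, a somewhat fuller wget invocation to start from (all of these are standard GNU wget options, but treat the combination as a starting point rather than a recipe; the URL is a placeholder):

wget --recursive --convert-links --page-requisites --adjust-extension --no-parent http://url/of/web/site

--page-requisites pulls in the CSS, images, and scripts each page needs, and --no-parent keeps the crawl from wandering above the starting directory.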

Answer by Aram Verstegen

I just use: wget -m <url>.


Answer by user1011743

For OS X users, I've found that the SiteSucker application (found here) works well without configuring anything other than how deep it follows links.


Answer by Dieghito

If your customers are archiving for compliance reasons, you want to ensure that the content can be authenticated. The options listed are fine for simple viewing, but they aren't legally admissible. In that case, you're looking for timestamps and digital signatures. It's much more complicated if you're doing it yourself. I'd suggest a service such as PageFreezer.


Answer by Steve Rowe

I've been using HTTrack for several years now. It handles all of the inter-page linking, etc. just fine. My only complaint is that I haven't found a good way to keep it limited to a sub-site. For instance, if there is a site www.foo.com/steve that I want to archive, it will likely follow links to www.foo.com/rowe and archive that too. Otherwise it's great: highly configurable and reliable. (A possible scan-rule workaround is sketched below.)

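One thing that may help with the sub-site problem: HTTrack accepts "scan rules" (URL filters prefixed with + or -) on the command line. A sketch using the hypothetical paths from the example above (my understanding is that the later, more specific rule takes precedence, but verify against HTTrack's filter documentation):

httrack "http://www.foo.com/steve/" -O ./steve-mirror "-www.foo.com/*" "+www.foo.com/steve/*"

The intent is to exclude the whole host and then re-include only the /steve sub-tree, so links into /rowe are skipped.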