Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/8701432/

Time: 2020-10-26 04:16:39  Source: igfitidea

How to download entire HTML of a webpage using javascript?

Tags: javascript, html, url, download, firefox-addon

Asked by Meysam

Is it possible to download the entire HTML of a webpage using JavaScript, given the URL? What I want to do is develop a Firefox add-on that downloads the content of all the links found in the source of the page currently open in the browser.

Update: the URLs reside in the same domain.
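
As a rough illustration of the first half of this task (finding the links in a page's source), here is a minimal sketch. The function name `extractSameOriginLinks` and the example URLs are made up for illustration. It uses a regular expression rather than a real HTML parser, so it only handles quoted `href` values on `<a>` tags, and it assumes the standard `URL` class is available. Inside an add-on, where the page's DOM is accessible, walking the DOM would likely be more robust.

```javascript
// Hypothetical helper: collect the href of every <a> tag in an HTML string,
// resolve it against the page URL, and keep only same-origin links.
function extractSameOriginLinks(html, pageUrl) {
  var base = new URL(pageUrl);
  // Matches <a ... href="..."> or <a ... href='...'> (quoted values only).
  var hrefPattern = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  var seen = {};
  var links = [];
  var match;
  while ((match = hrefPattern.exec(html)) !== null) {
    var raw = match[1] !== undefined ? match[1] : match[2];
    var resolved;
    try {
      resolved = new URL(raw, base); // resolves relative hrefs
    } catch (e) {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    resolved.hash = ""; // a fragment points into the same document
    if (resolved.origin === base.origin && !seen[resolved.href]) {
      seen[resolved.href] = true;
      links.push(resolved.href);
    }
  }
  return links;
}
```

Each returned URL could then be handed to whatever download routine the add-on uses.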

Answered by erturne

It should be possible to do this using jQuery's ajax support. JavaScript in a Firefox extension is not subject to the cross-origin restriction. Here are some tips for using jQuery in a Firefox extension:

  1. Add the jQuery library to your extension's chrome/content/ directory.

  2. Load jQuery in the window load event callback rather than including it in your browser overlay XUL. Otherwise it can cause conflicts (e.g. clobbers a user's customized toolbar).

    // Load jQuery through the subscript loader service.
    (function (loader) {
      loader.loadSubScript("chrome://ryebox/content/jquery-1.6.2.min.js");
    })(Components.classes["@mozilla.org/moz/jssubscript-loader;1"]
      .getService(Components.interfaces.mozIJSSubScriptLoader));
    
  3. Use "jQuery" instead of "$". I experienced weird behavior when using $ instead of jQuery (a conflict of some kind, I suppose).

  4. Use jQuery(content.document) instead of jQuery(document) to access a page's DOM. In a Firefox extension "document" refers to the browser's XUL whereas "content.document" refers to the page's DOM.

I wrote a Firefox extension for getting bookmarks from my friend's bookmark site. It uses jQuery to fetch my bookmarks in a JSON response from his service, then creates a menu of those bookmarks so that I can easily access them. You can browse the source at https://github.com/erturne/ryebox

Answered by Christofer Eliasson

For JavaScript in general, the short answer is no, not unless all pages are within the same domain. JavaScript is limited by the same-origin policy, so for security reasons, you cannot do cross-domain requests like that.

However, as pointed out by Max and erturne in the comments, when JavaScript is written as part of a browser extension/add-on, the regular rules about the same-origin policy and cross-domain requests do not seem to apply, at least not for Firefox and Chrome. Therefore, it should be possible to download the pages with JavaScript using an XMLHttpRequest, or using one of the wrapper methods included in your favorite JS library.
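
For reference, a plain XMLHttpRequest download might look roughly like the sketch below. The function name `downloadHtml` and its callback parameters are assumptions made for illustration, not part of any library. The optional `XhrCtor` argument is likewise hypothetical: it only exists so the function can be tried with a stand-in object where `XMLHttpRequest` is not defined.

```javascript
// Sketch: fetch a URL as text with XMLHttpRequest and pass the result to a callback.
// `XhrCtor` is a hypothetical injection point; by default the global XMLHttpRequest is used.
function downloadHtml(url, onSuccess, onError, XhrCtor) {
  var Xhr = XhrCtor || XMLHttpRequest;
  var xhr = new Xhr();
  xhr.open("GET", url, true); // true = asynchronous
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onSuccess(xhr.responseText); // the page's HTML as a string
    } else {
      onError(new Error("HTTP " + xhr.status + " for " + url));
    }
  };
  xhr.onerror = function () {
    onError(new Error("Network error for " + url));
  };
  xhr.send();
}

// Example use (the URL is a placeholder):
// downloadHtml("http://example.com/page.html",
//   function (html) { console.log(html.length + " characters downloaded"); },
//   function (err) { console.error(err.message); });
```

On an ordinary web page this is still bound by the same-origin policy described above. The callbacks are needed because the request completes asynchronously.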

If, like me, you prefer jQuery, have a look at jQuery's .load() method, which loads HTML from a given resource and injects it into an element that you specify.

Edit: Made some updates to my answer based on the comments about cross-domain requests made by add-ons.

Answered by Thomas Johan Eggum

You can make XMLHttpRequests (XHRs) if the scheme://domain:port combination of the target URL is the same as that of the page hosting the JavaScript that fetches the HTML.
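
To make the scheme://domain:port rule concrete, here is a small sketch of a same-origin check. The function name `isSameOrigin` is made up for illustration, and the code assumes the standard `URL` class is available. It illustrates the comparison only, not how a browser actually enforces the policy.

```javascript
// Sketch: two URLs share an origin when scheme, host, and port all match.
function isSameOrigin(pageUrl, targetUrl) {
  var page = new URL(pageUrl);
  var target = new URL(targetUrl, page); // relative targets resolve against the page
  // URL normalizes default ports (80 for http, 443 for https) to an empty string.
  return page.protocol === target.protocol &&
         page.hostname === target.hostname &&
         page.port === target.port;
}

// Example use (placeholder URLs):
// isSameOrigin("http://example.com/a.html", "http://example.com/b.html");  // same origin
// isSameOrigin("http://example.com/a.html", "https://example.com/b.html"); // different scheme
```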

Many JS frameworks (jQuery, Dojo, etc.) give you easy XHR support. Example using Dojo:

function getText() {
  // dojo.xhrGet is asynchronous: return its Deferred so callers can chain on the result.
  return dojo.xhrGet({
    url: "test/someHtml.html",
    handleAs: "text",
    load: function (response, ioArgs) {
      // The response is the HTML text.
      return response;
    },
    error: function (response, ioArgs) {
      return response;
    }
  });
}

If you prefer writing your own XMLHttpRequest-handler, take a look here: http://www.w3schools.com/xml/xml_http.asp

Answered by qidizi

If you only want to write a simple text web page downloader, and you only know HTML and JavaScript, you can write a downloader named "download.hta" that uses HTML and JavaScript to drive Msxml2.ServerXMLHTTP.6.0 and FSO (the FileSystemObject).
