
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/3905219/

Date: 2020-10-25 02:38:08  Source: igfitidea

Use JavaScript to get raw HTML code

Tags: javascript, html

Asked by Melina

I need to get the actual html code of an element in a web page.


For example, if the actual html code inside the element is "How to&nbsp;fix"

Running this javascript getElementById('myE').innerHTML gives me "How to fix", which is the decoded form

How can I get "How to&nbsp;fix" using javascript?

Accepted answer by Nick Craver

What you have should work:


Element test:


<div id="myE">How to&nbsp;fix</div>

JavaScript test:


alert(document.getElementById("myE").innerHTML); // alerts "How to&nbsp;fix"

You can try it out here. Make sure that wherever you're using the result isn't showing the &nbsp; as a space, which is likely the case. If you want to show it somewhere that's designed for HTML, you'll need to escape it.
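As a minimal sketch of the escaping step mentioned above: the helper `escapeHtml` below is hypothetical (it is not part of the original answer), as is the `someOutputElement` name in the comment. It turns the characters that are special in HTML into entities, so a string such as the `innerHTML` result can be displayed as literal source text inside an HTML page.

```javascript
// Hypothetical helper: escape the characters that are special in HTML
// so the string is displayed as source text rather than parsed as markup.
function escapeHtml(markup) {
  return markup
    .replace(/&/g, '&amp;') // must run first, or the ampersands added below get double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// In a browser, assuming the element from the test above exists and
// someOutputElement is some element of your own:
//   var source = document.getElementById('myE').innerHTML;
//   someOutputElement.innerHTML = escapeHtml(source);
console.log(escapeHtml('How to&nbsp;fix')); // prints "How to&amp;nbsp;fix"
```

When the escaped string is written into an HTML context, the browser renders it as the visible text `How to&nbsp;fix` instead of interpreting the entity.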


Answer by bobince

You cannot get the actual HTML source of part of your web page.

When you give a web browser an HTML page, it parses the HTML into some DOM nodes that are the definitive version of your document as far as the browser is concerned. The DOM keeps the significant information from the HTML (like that you used the Unicode character U+00A0 Non-Breaking Space before the word fix) but not the irrelevant information that you used it by means of an entity reference rather than just typing the raw character.

When you ask the browser for an element node's innerHTML, it doesn't give you the original HTML source that was parsed to produce that node, because it no longer has that information. Instead, it generates new HTML from the data stored in the DOM. The browser decides on how to format that HTML serialisation; different browsers produce different HTML, and chances are it won't be the same way you formatted it originally.


In particular,


  • element names may be upper- or lower-cased;

  • attributes may not be in the same order as you stated them in the HTML;

  • attribute quoting may not be the same as in your source. IE often generates unquoted attributes that aren't even valid HTML; all you can be sure of is that the innerHTML generated will be safe to use in the same browser by writing it to another element's innerHTML;

  • it may not use entity references for anything but characters that would otherwise be impossible to include directly in text content: ampersands, less-thans and attribute-value-quotes. Instead of returning &nbsp; it may simply give you the raw non-breaking space character.

You may not be able to see that that's a non-breaking space, but it still is one, and if you insert that HTML into another element it will act as one. You shouldn't need to rely anywhere on a non-breaking space character being entity-escaped to &nbsp;. If you do, for some reason, you can get that by doing:

var x = el.innerHTML.replace(/\xA0/g, '&nbsp;'); // re-escape each U+00A0 as &nbsp;

but that's only escaping U+00A0 and not any of the other thousands of possible Unicode characters, so it's a bit questionable.
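If escaping only U+00A0 is too narrow, one possible variation is to escape every non-ASCII character as a numeric character reference. This is a sketch under that assumption, not something the original answer proposes: the function name `escapeNonAscii` is hypothetical, and the output uses numeric references such as `&#160;` rather than named ones such as `&nbsp;`, so it still will not reproduce the original source text.

```javascript
// Hypothetical helper: replace every character outside the ASCII range
// with a decimal numeric character reference (e.g. U+00A0 -> &#160;).
// The "u" flag makes the regex match whole code points, so characters
// outside the Basic Multilingual Plane are escaped as a single reference.
function escapeNonAscii(html) {
  return html.replace(/[^\x00-\x7F]/gu, function (ch) {
    return '&#' + ch.codePointAt(0) + ';';
  });
}

// In a browser: var x = escapeNonAscii(el.innerHTML);
console.log(escapeNonAscii('How to\u00A0fix')); // prints "How to&#160;fix"
```

Numeric references are equivalent to the raw characters once parsed, which is the same point made above: the DOM does not distinguish between the two forms.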


If you really, really need to get your page's actual source HTML, you can make an XMLHttpRequest to your own URL (location.href) and get the full, unparsed HTML source in the responseText. There is almost never a good reason to do this.
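The XMLHttpRequest approach could look roughly like the sketch below. The function name `fetchPageSource`, its callback shape, and the optional `XhrImpl` parameter are assumptions made for illustration (the last one only exists so the function can be exercised outside a browser); the original answer names only `XMLHttpRequest`, `location.href` and `responseText`.

```javascript
// Hypothetical helper: request a URL and hand the unparsed response text
// to a Node-style callback (error first, then the source string).
// XhrImpl defaults to the browser's XMLHttpRequest; the default is only
// evaluated when no third argument is passed.
function fetchPageSource(url, onDone, XhrImpl = XMLHttpRequest) {
  const xhr = new XhrImpl();
  xhr.open('GET', url);
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      onDone(null, xhr.responseText); // raw HTML, entities left as written
    } else {
      onDone(new Error('HTTP ' + xhr.status));
    }
  };
  xhr.onerror = function () {
    onDone(new Error('Network error'));
  };
  xhr.send();
}

// In a browser:
//   fetchPageSource(location.href, function (err, source) {
//     if (!err) console.log(source);
//   });
```

Note that this fetches the page again, so the text returned is whatever the server sends for that request, which may differ from the document currently loaded if the page is generated dynamically.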
