JavaScript: Parse an HTML string with JS
Original URL: http://stackoverflow.com/questions/10585029/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Parse an HTML string with JS
Question by stage
I searched for a solution but nothing was relevant, so here is my problem:
I want to parse a string which contains HTML text. I want to do it in JavaScript.
I tried this library, but it seems to parse the HTML of my current page rather than a string: when I try the code below, it changes the title of my page:
var parser = new HTMLtoDOM("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>", document);
My goal is to extract links from an external HTML page that I have read in as a string.
Do you know an API to do it?
Answer by Florian Margaine
Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.
var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements
Edit: adding a jQuery answer to please the fans!
var el = $( '<div></div>' );
el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");
$('a', el) // All the anchor elements
Answer by Cilan
It's quite simple:
var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/html');
// do whatever you want with htmlDoc.getElementsByTagName('a');
According to MDN, to do this in Chrome you need to parse as XML, like so:
var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/xml');
// do whatever you want with htmlDoc.getElementsByTagName('a');
It is currently unsupported by WebKit, so there you'd have to follow Florian's answer, and it is not known whether it works in most cases on mobile browsers.
Edit: Now widely supported
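Tying this back to the question's goal of extracting links, a minimal sketch might look like the following. The names collectLinks and extractLinksFromString are hypothetical (they do not come from any answer here), and the second function assumes a browser environment where DOMParser is available.

```javascript
// Hypothetical helper: collect href/text pairs from any node that
// supports getElementsByTagName (a Document or an Element).
function collectLinks(root) {
  var anchors = root.getElementsByTagName('a');
  var links = [];
  for (var i = 0; i < anchors.length; i++) {
    links.push({
      href: anchors[i].getAttribute('href'), // raw attribute value, not resolved to a full URL
      text: anchors[i].textContent
    });
  }
  return links;
}

// Hypothetical wrapper: parse the string into a separate document first.
// Assumption: runs in a browser, where DOMParser exists.
function extractLinksFromString(html) {
  var doc = new DOMParser().parseFromString(html, 'text/html');
  return collectLinks(doc);
}
```

Because the string is parsed into its own document, the current page (including its title) is left untouched.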
Answer by Munawwar
EDIT: The solution below is only for HTML "fragments", since html, head and body are removed. I guess the solution for this question is DOMParser's parseFromString() method.
For HTML fragments, the solutions listed here work for most HTML, but in certain cases they won't.
For example, try parsing <td>Test</td>. This won't work with the div.innerHTML solution, DOMParser.prototype.parseFromString, or the range.createContextualFragment solution. The td tag goes missing and only the text remains.
Only jQuery handles that case well.
So the future solution (MS Edge 13+) is to use the template tag:
function parseHTML(html) {
    var t = document.createElement('template');
    t.innerHTML = html;
    return t.content.cloneNode(true);
}
var documentFragment = parseHTML('<td>Test</td>');
For older browsers I have extracted jQuery's parseHTML() method into an independent gist - https://gist.github.com/Munawwar/6e6362dbdf77c7865a99
Answer by Mathieu
var doc = new DOMParser().parseFromString(html, "text/html");
var links = doc.querySelectorAll("a");
Answer by John Slegers
The following function parseHTML will return either:
a Document when your file starts with a doctype.
a DocumentFragment when your file doesn't start with a doctype.
The code:
function parseHTML(markup) {
    if (markup.toLowerCase().trim().indexOf('<!doctype') === 0) {
        // Full document: build a separate HTML document
        var doc = document.implementation.createHTMLDocument("");
        doc.documentElement.innerHTML = markup;
        return doc;
    } else if ('content' in document.createElement('template')) {
        // Template tag exists!
        var el = document.createElement('template');
        el.innerHTML = markup;
        return el.content;
    } else {
        // Template tag doesn't exist!
        var docfrag = document.createDocumentFragment();
        var body = document.createElement('body');
        body.innerHTML = markup;
        // Move every parsed child node into the fragment
        while (body.firstChild) {
            docfrag.appendChild(body.firstChild);
        }
        return docfrag;
    }
}
How to use:
var links = parseHTML('<!doctype html><html><head></head><body><a>Link 1</a><a>Link 2</a></body></html>').getElementsByTagName('a');
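One caveat with the usage above: getElementsByTagName exists on Document but not on DocumentFragment, so that call only works when the markup starts with a doctype. A minimal sketch that handles both return types can use querySelectorAll, which both provide. The name findAnchors is hypothetical.

```javascript
// Hypothetical helper: accepts either a Document or a DocumentFragment,
// since both provide querySelectorAll.
function findAnchors(parsed) {
  // Copy the static NodeList into a plain array for convenience.
  return Array.prototype.slice.call(parsed.querySelectorAll('a'));
}
```

With this, findAnchors(parseHTML('<a>Link 1</a><a>Link 2</a>')) is handled the same way as the doctype case.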
Answer by Joel Richard
The fastest way to parse HTML in Chrome and Firefox is Range#createContextualFragment:
var range = document.createRange();
range.selectNode(document.body); // required in Safari
var fragment = range.createContextualFragment('<h1>html...</h1>');
var firstNode = fragment.firstChild;
I would recommend creating a helper function that uses createContextualFragment if available and falls back to innerHTML otherwise.
Benchmark: http://jsperf.com/domparser-vs-createelement-innerhtml/3
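The helper recommended above could be sketched roughly as follows. The name parseFragment and its optional doc parameter are assumptions made for illustration; the fallback simply moves the children of a detached div into a fragment.

```javascript
// Hypothetical helper: prefer Range#createContextualFragment, fall back to innerHTML.
// `doc` defaults to the page's document; accepting it as a parameter is an
// assumption made here so the helper can target other documents too.
function parseFragment(html, doc) {
  doc = doc || document;
  var range = typeof doc.createRange === 'function' ? doc.createRange() : null;
  if (range && typeof range.createContextualFragment === 'function') {
    range.selectNode(doc.body); // required in Safari, per the answer above
    return range.createContextualFragment(html);
  }
  // Fallback: parse inside a detached <div>, then move the nodes into a fragment.
  var container = doc.createElement('div');
  container.innerHTML = html;
  var fragment = doc.createDocumentFragment();
  while (container.firstChild) {
    fragment.appendChild(container.firstChild);
  }
  return fragment;
}
```

Both branches return a DocumentFragment, so callers can treat the result the same way, for example via fragment.firstChild.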
Answer by AnthumChris
const parse = Range.prototype.createContextualFragment.bind(document.createRange());
document.body.appendChild( parse('<p><strong>Today is:</strong></p>') );
document.body.appendChild( parse(`<p style="background: #eee">${new Date()}</p>`) );
Only valid child Nodes of the parent Node (the start of the Range) will be parsed. Otherwise, unexpected results may occur:
// <body> is "parent" Node, start of Range
const parseRange = document.createRange();
const parse = Range.prototype.createContextualFragment.bind(parseRange);
// Returns Text "1 2" because td, tr, tbody are not valid children of <body>
parse('<td>1</td> <td>2</td>');
parse('<tr><td>1</td> <td>2</td></tr>');
parse('<tbody><tr><td>1</td> <td>2</td></tr></tbody>');
// Returns <table>, which is a valid child of <body>
parse('<table> <td>1</td> <td>2</td> </table>');
parse('<table> <tr> <td>1</td> <td>2</td> </tr> </table>');
parse('<table> <tbody> <td>1</td> <td>2</td> </tbody> </table>');
// <tr> is parent Node, start of Range
parseRange.setStart(document.createElement('tr'), 0);
// Returns [<td>, <td>] element array
parse('<td>1</td> <td>2</td>');
parse('<tr> <td>1</td> <td>2</td> </tr>');
parse('<tbody> <td>1</td> <td>2</td> </tbody>');
parse('<table> <td>1</td> <td>2</td> </table>');
Answer by jmar777
If you're open to using jQuery, it has some nice facilities for creating detached DOM elements from strings of HTML. These can then be queried through the usual means, e.g.:
var html = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
var anchors = $('<div/>').append(html).find('a').get();
Edit - just saw @Florian's answer, which is correct. This is basically exactly what he said, but with jQuery.
Answer by NaabNuts
With this simple code you can do that:
let el = $('<div></div>');
$(document.body).append(el);
el.html(`<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>`);
console.log(el.find('a[href="test0"]'));