JavaScript: parse an HTML string with JS

Notice: this page is based on a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/10585029/

Time: 2020-08-24 01:59:09  Source: igfitidea

Parse an HTML string with JS

javascript html dom html-parsing

Asked by stage

I searched for a solution but nothing was relevant, so here is my problem:

I want to parse a string which contains HTML text. I want to do it in JavaScript.

I tried this library, but it seems to parse the HTML of my current page rather than the string I pass in. When I run the code below, it changes the title of my page:

var parser = new HTMLtoDOM("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>", document);

My goal is to extract links from an external HTML page that I have read in as a string.

Do you know of an API that can do this?

Answered by Florian Margaine

Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.

var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";

el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements

Edit: adding a jQuery answer to please the fans!

var el = $( '<div></div>' );
el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");

$('a', el) // All the anchor elements
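
As a minimal sketch of the same idea applied to the goal in the question, the helper below collects the href values from a detached element. The name `extractHrefs` and the optional `doc` parameter are illustrative assumptions, not part of the answer; `doc` defaults to the page's `document` and exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: parse `html` inside a detached <div> and
// return the raw href attribute of every anchor that has one.
function extractHrefs(html, doc = document) {
  const el = doc.createElement('div');
  el.innerHTML = html; // parsed, but never attached to the page
  return Array.from(el.querySelectorAll('a[href]'), function (a) {
    // getAttribute returns the value as written in the markup;
    // a.href would instead be resolved against the current page's URL.
    return a.getAttribute('href');
  });
}

// Browser-only usage; skipped where no DOM is available.
if (typeof document !== 'undefined') {
  console.log(extractHrefs("<a href='test0'>test01</a><a href='test1'>test02</a>"));
}
```

Reading the attribute rather than the `href` property keeps relative links exactly as they appear in the external page, which is usually what you want when the markup did not originate from the current page.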

Answered by Cilan

It's quite simple:

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/html');
// do whatever you want with htmlDoc.getElementsByTagName('a');

According to MDN, to do this in Chrome you need to parse as XML, like so:

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/xml');
// do whatever you want with htmlDoc.getElementsByTagName('a');

It is currently unsupported by WebKit, so there you'd have to follow Florian's answer, and it is not known to work on most mobile browsers.

Edit: Now widely supported
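
To connect this to the original goal, here is a small sketch that parses with 'text/html' and resolves each link against the address the page was fetched from. The names `extractLinks` and `baseUrl`, and the injectable `parser` argument, are assumptions made for illustration. A document produced by DOMParser does not know where the markup came from, so the caller has to supply the base URL.

```javascript
// Hypothetical helper: returns absolute URLs for all anchors in `html`.
// `baseUrl` is the address the markup was downloaded from (an assumption
// of this sketch; the parser itself cannot know it).
function extractLinks(html, baseUrl, parser = new DOMParser()) {
  const doc = parser.parseFromString(html, 'text/html');
  const links = [];
  for (const a of doc.querySelectorAll('a[href]')) {
    try {
      // Resolve relative hrefs such as 'test0' against the source page.
      links.push(new URL(a.getAttribute('href'), baseUrl).href);
    } catch (e) {
      // Skip hrefs that cannot be resolved to a valid URL.
    }
  }
  return links;
}

// Browser-only usage; skipped where DOMParser is not available.
if (typeof DOMParser !== 'undefined') {
  console.log(extractLinks("<a href='test0'>test01</a>", 'http://example.com/dir/page.html'));
}
```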

Answered by Munawwar

EDIT: The solution below only works for HTML "fragments", since the html, head and body tags are removed. For this question, the right solution is probably DOMParser's parseFromString() method.

For HTML fragments, the solutions listed here work for most HTML, but they fail in certain cases.

For example, try parsing <td>Test</td>. It doesn't work with the div.innerHTML solution, with DOMParser.prototype.parseFromString, or with the range.createContextualFragment solution: the td tag goes missing and only the text remains.

Only jQuery handles that case well.

So the future solution (MS Edge 13+) is to use the template tag:

function parseHTML(html) {
    var t = document.createElement('template');
    t.innerHTML = html;
    return t.content.cloneNode(true);
}

var documentFragment = parseHTML('<td>Test</td>');

For older browsers I have extracted jQuery's parseHTML() method into an independent gist - https://gist.github.com/Munawwar/6e6362dbdf77c7865a99

Answered by Mathieu

var doc = new DOMParser().parseFromString(html, "text/html");
var links = doc.querySelectorAll("a");

Answered by John Slegers

The following function parseHTML will return either a Document (when the markup starts with a doctype) or a DocumentFragment (when it does not).

The code:

function parseHTML(markup) {
    if (markup.toLowerCase().trim().indexOf('<!doctype') === 0) {
        var doc = document.implementation.createHTMLDocument("");
        doc.documentElement.innerHTML = markup;
        return doc;
    } else if ('content' in document.createElement('template')) {
       // Template tag exists!
       var el = document.createElement('template');
       el.innerHTML = markup;
       return el.content;
    } else {
       // Template tag doesn't exist!
       var docfrag = document.createDocumentFragment();
       var el = document.createElement('body');
       el.innerHTML = markup;
       // Move every parsed node into the fragment (no implicit global counter)
       while (el.firstChild) {
           docfrag.appendChild(el.firstChild);
       }
       return docfrag;
    }
}


How to use:

var links = parseHTML('<!doctype html><html><head></head><body><a>Link 1</a><a>Link 2</a></body></html>').getElementsByTagName('a');
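
One caveat when using the result: the call above works because a doctype is present, so a Document is returned. A DocumentFragment has no getElementsByTagName method, but both Document and DocumentFragment provide querySelectorAll. The hypothetical wrapper below (the name `findLinks` is an assumption, not part of the answer) therefore works with either return value.

```javascript
// Hypothetical wrapper: accepts whatever parseHTML returned
// (a Document or a DocumentFragment) and returns an array of anchors.
function findLinks(root) {
  // querySelectorAll exists on both Document and DocumentFragment,
  // whereas getElementsByTagName is not available on a DocumentFragment.
  return Array.from(root.querySelectorAll('a'));
}

// Browser-only usage, assuming parseHTML from the answer above is in scope.
if (typeof document !== 'undefined' && typeof parseHTML === 'function') {
  console.log(findLinks(parseHTML('<a>Link 1</a><a>Link 2</a>')).length);
}
```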

Answered by Joel Richard

The fastest way to parse HTML in Chrome and Firefox is Range#createContextualFragment:

var range = document.createRange();
range.selectNode(document.body); // required in Safari
var fragment = range.createContextualFragment('<h1>html...</h1>');
var firstNode = fragment.firstChild;

I would recommend creating a helper function that uses createContextualFragment if available and falls back to innerHTML otherwise.

Benchmark: http://jsperf.com/domparser-vs-createelement-innerhtml/3
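
A minimal sketch of such a helper might look as follows. The function name `parseFragment` and the optional `doc` argument are assumptions for illustration. The fallback moves the parsed children of a temporary div into a DocumentFragment so that both paths return the same kind of object.

```javascript
// Hypothetical helper: parse an HTML fragment string into a DocumentFragment.
function parseFragment(html, doc = document) {
  const range = doc.createRange ? doc.createRange() : null;
  if (range && typeof range.createContextualFragment === 'function') {
    range.selectNode(doc.body); // required in Safari, per the answer above
    return range.createContextualFragment(html);
  }
  // Fallback: let innerHTML do the parsing, then move the nodes over.
  const container = doc.createElement('div');
  container.innerHTML = html;
  const fragment = doc.createDocumentFragment();
  while (container.firstChild) {
    fragment.appendChild(container.firstChild);
  }
  return fragment;
}

// Browser-only usage; skipped where no DOM is available.
if (typeof document !== 'undefined') {
  console.log(parseFragment('<h1>html...</h1>').firstChild);
}
```

One behavioral difference between the two paths is worth knowing: script elements created through createContextualFragment run when the fragment is inserted into the page, while script elements created through innerHTML do not.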

Answered by AnthumChris

const parse = Range.prototype.createContextualFragment.bind(document.createRange());

document.body.appendChild( parse('<p><strong>Today is:</strong></p>') );
document.body.appendChild( parse(`<p style="background: #eee">${new Date()}</p>`) );



Only valid child Nodes of the parent Node (the start of the Range) will be parsed. Otherwise, unexpected results may occur:

// <body> is "parent" Node, start of Range
const parseRange = document.createRange();
const parse = Range.prototype.createContextualFragment.bind(parseRange);

// Returns Text "1 2" because td, tr, tbody are not valid children of <body>
parse('<td>1</td> <td>2</td>');
parse('<tr><td>1</td> <td>2</td></tr>');
parse('<tbody><tr><td>1</td> <td>2</td></tr></tbody>');

// Returns <table>, which is a valid child of <body>
parse('<table> <td>1</td> <td>2</td> </table>');
parse('<table> <tr> <td>1</td> <td>2</td> </tr> </table>');
parse('<table> <tbody> <td>1</td> <td>2</td> </tbody> </table>');

// <tr> is parent Node, start of Range
parseRange.setStart(document.createElement('tr'), 0);

// Returns [<td>, <td>] element array
parse('<td>1</td> <td>2</td>');
parse('<tr> <td>1</td> <td>2</td> </tr>');
parse('<tbody> <td>1</td> <td>2</td> </tbody>');
parse('<table> <td>1</td> <td>2</td> </table>');
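
Building on the examples above, one way to avoid the unexpected results is to pick the context element from the first tag in the markup. The sketch below is an illustration under stated assumptions, not part of the answer: `contextTagFor` and `parseWithContext` are hypothetical names, and the tag-to-parent table covers only the table-related cases shown above.

```javascript
// Hypothetical lookup: which parent element each table-related tag needs.
const CONTEXT_PARENT = new Map([
  ['td', 'tr'], ['th', 'tr'],
  ['tr', 'tbody'],
  ['tbody', 'table'], ['thead', 'table'], ['tfoot', 'table']
]);

// Returns the tag name of a suitable context element for `markup`,
// falling back to 'body' for anything not in the table above.
function contextTagFor(markup) {
  const match = /^\s*<([a-zA-Z][a-zA-Z0-9-]*)/.exec(markup);
  const tag = match ? match[1].toLowerCase() : '';
  return CONTEXT_PARENT.get(tag) || 'body';
}

// Parses `markup` with the Range starting in a matching context element.
// `doc` defaults to the page's document (the parameter is an assumption
// of this sketch, added so the function can be exercised with a stand-in).
function parseWithContext(markup, doc = document) {
  const range = doc.createRange();
  range.setStart(doc.createElement(contextTagFor(markup)), 0);
  return range.createContextualFragment(markup);
}

// Browser-only usage; skipped where no DOM is available.
if (typeof document !== 'undefined') {
  console.log(parseWithContext('<td>1</td> <td>2</td>').childNodes.length);
}
```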

Answered by jmar777

If you're open to using jQuery, it has some nice facilities for creating detached DOM elements from strings of HTML. These can then be queried through the usual means, e.g.:

var html = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
var anchors = $('<div/>').append(html).find('a').get();


Edit - just saw @Florian's answer which is correct. This is basically exactly what he said, but with jQuery.

Answered by NaabNuts

With this simple code you can do that:

let el = $('<div></div>');
$(document.body).append(el);
el.html(`<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>`);
console.log(el.find('a[href="test0"]'));