JavaScript: Parse an HTML string with JS

Question

Asked by stage

I searched for a solution but nothing was relevant, so here is my problem:

I want to parse a string which contains HTML text. I want to do it in JavaScript.

I tried this library, but it seems to parse the HTML of my current page rather than a string: when I run the code below, it changes the title of my page:

var parser = new HTMLtoDOM("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>", document);

My goal is to extract links from an external HTML page that I have read in as a string.

Do you know of an API that does this?

Answer 1

Answered by Florian Margaine

Create a dummy DOM element and add the string to it. Then, you can manipulate it like any DOM element.

var el = document.createElement( 'html' );
el.innerHTML = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";

el.getElementsByTagName( 'a' ); // Live HTMLCollection of your anchor elements
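
Since the goal in the question is to extract links, the anchors collected above can be reduced to their href values. The following is only a minimal sketch: collectHrefs is a hypothetical helper name, not part of the original answer. It reads the raw attribute with getAttribute, because the href property would resolve relative URLs against the current page.

```javascript
// Hypothetical helper (not part of the original answer): returns the raw
// href attribute of every anchor in an array-like collection.
function collectHrefs(anchors) {
  var hrefs = [];
  for (var i = 0; i < anchors.length; i++) {
    var href = anchors[i].getAttribute('href');
    if (href !== null) { // skip <a> elements that have no href attribute
      hrefs.push(href);
    }
  }
  return hrefs;
}

// Browser-only usage, following the dummy-element approach above.
if (typeof document !== 'undefined') {
  var holder = document.createElement('html');
  holder.innerHTML = "<a href='test0'>test01</a><a href='test1'>test02</a>";
  console.log(collectHrefs(holder.getElementsByTagName('a')));
}
```

The helper only relies on length, index access and getAttribute, so it should work equally with the collection returned by getElementsByTagName or with the result of querySelectorAll.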

Edit: adding a jQuery answer to please the fans!

var el = $( '<div></div>' );
el.html("<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>");

$('a', el) // All the anchor elements

Answer 2

Answered by Cilan

It's quite simple:

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/html');
// do whatever you want with htmlDoc.getElementsByTagName('a');

According to MDN, to do this in Chrome you need to parse as XML, like so:

var parser = new DOMParser();
var htmlDoc = parser.parseFromString(txt, 'text/xml');
// do whatever you want with htmlDoc.getElementsByTagName('a');

~~It is currently unsupported by webkit and you'd have to follow Florian's answer, and it is unknown to work in most cases on mobile browsers.~~

Edit: Now widely supported

Answer 3

Answered by Munawwar

EDIT: The solution below is only for HTML "fragments", since html, head and body are removed. I guess the solution for this question is DOMParser's parseFromString() method.

For HTML fragments, the solutions listed here work for most HTML, but they fail in certain cases.

For example, try parsing <td>Test</td>. It won't work with the div.innerHTML solution, with DOMParser.prototype.parseFromString, or with the range.createContextualFragment solution: the td tag goes missing and only the text remains.

Only jQuery handles that case well.

So the future solution (MS Edge 13+) is to use the template tag:

function parseHTML(html) {
    var t = document.createElement('template');
    t.innerHTML = html;
    return t.content.cloneNode(true);
}

var documentFragment = parseHTML('<td>Test</td>');

For older browsers I have extracted jQuery's parseHTML() method into an independent gist - https://gist.github.com/Munawwar/6e6362dbdf77c7865a99

Answer 4

Answered by Mathieu

var doc = new DOMParser().parseFromString(html, "text/html");
var links = doc.querySelectorAll("a");
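
When the HTML string comes from an external page, relative href values usually need to be resolved against that page's address rather than the current one. As far as I know, a document created by DOMParser takes its URL from the current page, so the sketch below resolves the raw attribute values explicitly with the standard URL constructor. resolveLinks and the example.com base URL are assumptions for illustration, not part of the original answer.

```javascript
// Hypothetical helper: resolves raw href values against the URL the HTML
// string was originally fetched from (baseUrl is supplied by the caller).
function resolveLinks(hrefs, baseUrl) {
  var resolved = [];
  hrefs.forEach(function (href) {
    try {
      resolved.push(new URL(href, baseUrl).href);
    } catch (e) {
      // new URL throws a TypeError for values it cannot parse; skip those
    }
  });
  return resolved;
}

// Browser-only usage with the DOMParser approach above; the markup and the
// base URL are placeholders.
if (typeof DOMParser !== 'undefined') {
  var parsed = new DOMParser().parseFromString("<a href='test0'>test01</a>", 'text/html');
  var raw = Array.prototype.map.call(parsed.querySelectorAll('a'), function (a) {
    return a.getAttribute('href');
  });
  console.log(resolveLinks(raw, 'https://example.com/dir/page.html'));
}
```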

Answer 5

Answered by John Slegers

The following function parseHTML will return either:

a Document when your file starts with a doctype.
a DocumentFragment when your file doesn't start with a doctype.

The code:

function parseHTML(markup) {
    if (markup.toLowerCase().trim().indexOf('<!doctype') === 0) {
        var doc = document.implementation.createHTMLDocument("");
        doc.documentElement.innerHTML = markup;
        return doc;
    } else if ('content' in document.createElement('template')) {
       // Template tag exists!
       var el = document.createElement('template');
       el.innerHTML = markup;
       return el.content;
    } else {
       // Template tag doesn't exist!
       var docfrag = document.createDocumentFragment();
       var body = document.createElement('body');
       body.innerHTML = markup;
       // Appending a node moves it out of body, so loop until body is empty
       while (body.firstChild) {
           docfrag.appendChild(body.firstChild);
       }
       return docfrag;
    }
}

How to use:

var links = parseHTML('<!doctype html><html><head></head><body><a>Link 1</a><a>Link 2</a></body></html>').getElementsByTagName('a');

Answer 6

Answered by Joel Richard

The fastest way to parse HTML in Chrome and Firefox is Range#createContextualFragment:

var range = document.createRange();
range.selectNode(document.body); // required in Safari
var fragment = range.createContextualFragment('<h1>html...</h1>');
var firstNode = fragment.firstChild;

I would recommend creating a helper function that uses createContextualFragment if available and falls back to innerHTML otherwise.

Benchmark: http://jsperf.com/domparser-vs-createelement-innerhtml/3
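
The recommended helper could look roughly like the sketch below. It is not taken from the original answer: the name toFragment and the explicit doc parameter are assumptions (pass document in a browser), and the fallback simply moves the nodes produced by innerHTML into a DocumentFragment.

```javascript
// Hypothetical helper sketching the recommendation above. `doc` is the
// Document to create nodes in (pass `document` in a browser).
function toFragment(html, doc) {
  var range = typeof doc.createRange === 'function' ? doc.createRange() : null;
  if (range && typeof range.createContextualFragment === 'function') {
    range.selectNode(doc.body); // required in Safari, as noted above
    return range.createContextualFragment(html);
  }
  // Fallback: parse through innerHTML, then move the nodes into a fragment.
  var container = doc.createElement('div');
  container.innerHTML = html;
  var fragment = doc.createDocumentFragment();
  while (container.firstChild) {
    fragment.appendChild(container.firstChild); // appending moves the node
  }
  return fragment;
}

// Browser-only usage.
if (typeof document !== 'undefined') {
  console.log(toFragment('<h1>html...</h1>', document).firstChild);
}
```

Both branches return a DocumentFragment, so callers do not need to know which path was taken.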

Answer 7

Answered by AnthumChris

const parse = Range.prototype.createContextualFragment.bind(document.createRange());

document.body.appendChild( parse('<p><strong>Today is:</strong></p>') );
document.body.appendChild( parse(`<p style="background: #eee">${new Date()}</p>`) );

Only valid child Nodes within the parent Node (the start of the Range) will be parsed. Otherwise, unexpected results may occur:

// <body> is "parent" Node, start of Range
const parseRange = document.createRange();
const parse = Range.prototype.createContextualFragment.bind(parseRange);

// Returns Text "1 2" because td, tr, tbody are not valid children of <body>
parse('<td>1</td> <td>2</td>');
parse('<tr><td>1</td> <td>2</td></tr>');
parse('<tbody><tr><td>1</td> <td>2</td></tr></tbody>');

// Returns <table>, which is a valid child of <body>
parse('<table> <td>1</td> <td>2</td> </table>');
parse('<table> <tr> <td>1</td> <td>2</td> </tr> </table>');
parse('<table> <tbody> <td>1</td> <td>2</td> </tbody> </table>');

// <tr> is parent Node, start of Range
parseRange.setStart(document.createElement('tr'), 0);

// Returns [<td>, <td>] element array
parse('<td>1</td> <td>2</td>');
parse('<tr> <td>1</td> <td>2</td> </tr>');
parse('<tbody> <td>1</td> <td>2</td> </tbody>');
parse('<table> <td>1</td> <td>2</td> </table>');
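
Because the result depends on the Range's start node, one option is to choose the context element from the first tag in the string. The sketch below is an illustration only: CONTEXT_TAGS, contextTagFor and parseInContext are hypothetical names, and the lookup covers just the table-related tags used in the examples above.

```javascript
// Hypothetical lookup: the parent element a table-related tag needs.
var CONTEXT_TAGS = {
  td: 'tr', th: 'tr',
  tr: 'tbody',
  tbody: 'table', thead: 'table', tfoot: 'table'
};

// Returns the tag name of a suitable context element for the first tag in
// `html`, defaulting to 'body' for everything not listed above.
function contextTagFor(html) {
  var match = /^\s*<([a-zA-Z][a-zA-Z0-9]*)/.exec(html);
  var tag = match ? match[1].toLowerCase() : '';
  return Object.prototype.hasOwnProperty.call(CONTEXT_TAGS, tag) ? CONTEXT_TAGS[tag] : 'body';
}

// Parses `html` with the Range started inside the chosen context element.
// `doc` is the Document to use (pass `document` in a browser).
function parseInContext(html, doc) {
  var range = doc.createRange();
  range.setStart(doc.createElement(contextTagFor(html)), 0);
  return range.createContextualFragment(html);
}
```

In a browser, parseInContext('<td>1</td> <td>2</td>', document) would start the Range in a detached tr element, the same setup as the setStart call in the answer.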

Answer 8

Answered by jmar777

If you're open to using jQuery, it has some nice facilities for creating detached DOM elements from strings of HTML. These can then be queried through the usual means, e.g.:

var html = "<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>";
var anchors = $('<div/>').append(html).find('a').get();

Edit - just saw @Florian's answer which is correct. This is basically exactly what he said, but with jQuery.

Answer 9

Answered by NaabNuts

With this simple code you can do that:

let el = $('<div></div>');
$(document.body).append(el);
el.html(`<html><head><title>titleTest</title></head><body><a href='test0'>test01</a><a href='test1'>test02</a><a href='test2'>test03</a></body></html>`);
console.log(el.find('a[href="test0"]'));
