Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/232004/

Time: 2020-08-03 19:08:56  Source: igfitidea

How can I manipulate the DOM from a string of HTML in C#?

Tags: c#, .net, html, dom, .net-2.0

Asked by Patrick Desjardins

For the moment, the best way I have found to manipulate the DOM from a string that contains HTML is:

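// WebBrowser and HtmlDocument live in System.Windows.Forms.
WebBrowser webControl = new WebBrowser();
webControl.DocumentText = html; // html is the string holding the markup
HtmlDocument doc = webControl.Document;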

There are two problems:

  1. It requires the WebBrowser object!
  2. It can't be used with multiple threads; I need something that works on a different thread (other than the main thread).

Any ideas?

Accepted answer by Patrick Desjardins

I did a search on GooglePlex for HTML and found the Html Agility Pack. I do not know whether it fits this purpose or not; I am downloading it right now to give it a try.
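
For readers who want to see what that looks like, here is a minimal sketch of loading an HTML string with the Html Agility Pack and editing it without a WebBrowser control. It assumes the third-party HtmlAgilityPack package is referenced; the class name AgilityPackSketch, the method MarkLinks, and the choice of adding a rel attribute are hypothetical and only serve as an illustration.

```csharp
using System;

public static class AgilityPackSketch
{
    // Hypothetical helper: adds rel="nofollow" to every <a> that has an href.
    public static string MarkLinks(string html)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html); // tolerant parser, the input need not be well-formed

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlAgilityPack.HtmlNodeCollection links =
            doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlAgilityPack.HtmlNode link in links)
            {
                link.SetAttributeValue("rel", "nofollow");
            }
        }

        // Serialize the modified tree back to a string.
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(MarkLinks("<p>see <a href=\"http://example.com\">this</a>"));
    }
}
```

Because nothing here touches a UI control, a method like this can be called from any thread, as long as each thread works on its own document instance.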

Answer by Jason Bunting

Depending on what you are trying to do (maybe you can give us more details?) and depending on whether or not the HTML is well-formed, you could convert this to an XmlDocument:

System.Xml.XmlDocument x = new System.Xml.XmlDocument();
x.LoadXml(html); // as long as html is well-formed, i.e. XHTML

Then you could manipulate it easily, without the WebBrowser instance. As for threads, I don't know enough about the implementation of XmlDocument to know the answer to that part.
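
As a rough illustration of that manipulation, the sketch below parses a well-formed XHTML fragment, sets an attribute on every p element, and serializes the result. The class name XhtmlDomSketch and the helper AddClassToParagraphs are hypothetical. The sketch runs the work on a worker thread to show that no UI thread is needed; since the instance members of XmlDocument are not documented as thread-safe, it assumes each thread works on its own document.

```csharp
using System;
using System.Threading;
using System.Xml;

public static class XhtmlDomSketch
{
    // Hypothetical helper: sets a class attribute on every <p> element.
    public static string AddClassToParagraphs(string xhtml, string cssClass)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        foreach (XmlElement p in doc.SelectNodes("//p"))
        {
            p.SetAttribute("class", cssClass);
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        string result = null;

        // The document is created and used entirely on the worker thread.
        Thread worker = new Thread(delegate ()
        {
            result = AddClassToParagraphs("<div><p>one</p><p>two</p></div>", "note");
        });
        worker.Start();
        worker.Join();

        Console.WriteLine(result);
    }
}
```

Note that real-world XHTML with a DOCTYPE or a default namespace needs extra handling (for example an XmlNamespaceManager for the XPath query), which this sketch leaves out.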

If the document isn't in proper form, you could use NTidy (a .NET wrapper for HTML Tidy) to get it in shape first; I had to do this very thing for a project once and it really wasn't too bad.

Answer by Martin Kool

Jason Bunting already posted this, but it really does work to use a .NET wrapper around HTML Tidy and load the result into an XmlDocument.

I have used this .NET wrapper before:

http://www.codeproject.com/KB/cs/ZetaHtmlTidy.aspx

And implemented it somewhat like this:

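// HtmlTidy comes from the ZetaHtmlTidy wrapper linked above; XmlDocument is in System.Xml.
string input = "<p>crappy html<br <img src=foo></div>";
HtmlTidy tidy = new HtmlTidy();
// Repair the markup and convert it to XHTML so that LoadXml can parse it.
string output = tidy.CleanHtml(input, HtmlTidyOptions.ConvertToXhtml);
XmlDocument doc = new XmlDocument();
doc.LoadXml(output);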

Sorry if this is considered a repost :)

Answer by Ashraf Sabry

This is an old question. Now there are:
