Original question: http://stackoverflow.com/questions/232004/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How can I manipulate the DOM from a string of HTML in C#?
Asked by Patrick Desjardins
For the moment, the best way I have found to manipulate the DOM from a string that contains HTML is:
WebBrowser webControl = new WebBrowser();
webControl.DocumentText = html;
HtmlDocument doc = webControl.Document;
There are two problems:
- Requires the WebBrowser object!
- This can't be used with multiple threads; I need something that works on a different thread (other than the main thread).
Any ideas?
Accepted answer by Patrick Desjardins
I did a search on GooglePlex for HTML and found the Html Agility Pack. I do not know whether it is suited to this or not; I am downloading it right now to give it a try.
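For readers who want to see what that looks like, here is a minimal sketch of parsing and editing an HTML string with the Html Agility Pack. It assumes the HtmlAgilityPack NuGet package is referenced; the class name AgilityPackSketch, the method RewriteLinks, and the sample markup are made up for illustration and are not from the original answer.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class AgilityPackSketch
{
    // Hypothetical helper: adds target="_blank" to every link that has an href.
    public static string RewriteLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser; the input does not have to be well-formed

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                link.SetAttributeValue("target", "_blank");
            }
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(RewriteLinks("<p>unclosed paragraph <a href=\"foo.html\">link</a>"));
    }
}
```

No WebBrowser control is involved, so nothing here depends on a UI thread. If the project also references Windows Forms, note that HtmlDocument exists in both namespaces and may need to be qualified.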
Answer by Jason Bunting
Depending on what you are trying to do (maybe you can give us more details?) and depending on whether or not the HTML is well-formed, you could convert this to an XmlDocument:
System.Xml.XmlDocument x = new System.Xml.XmlDocument();
x.LoadXml(html); // as long as html is well-formed, i.e. XHTML
Then you could manipulate it easily, without the WebBrowser instance. As for threads, I don't know enough about the implementation of XmlDocument to know the answer to that part.
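As a rough illustration of manipulating the loaded document, the sketch below adds an attribute to every p element and runs the work on a worker thread. The class name XhtmlDomSketch, the method MarkParagraphs, and the sample markup are hypothetical. It assumes well-formed input without an XML namespace; real XHTML that declares xmlns would need an XmlNamespaceManager for the XPath query.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Xml;

public static class XhtmlDomSketch
{
    // Hypothetical helper: sets class="note" on every <p> element and returns the markup.
    public static string MarkParagraphs(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the input is not well-formed

        // Copy the matches to a list first so the document is not edited while the query is enumerated.
        var paragraphs = doc.SelectNodes("//p").Cast<XmlElement>().ToList();
        foreach (XmlElement p in paragraphs)
        {
            p.SetAttribute("class", "note");
        }

        return doc.OuterXml;
    }

    public static void Main()
    {
        string xhtml = "<div><p>first</p><p>second</p></div>";

        // Each call creates its own XmlDocument, so it can run off the main thread.
        string result = Task.Run(() => MarkParagraphs(xhtml)).Result;
        Console.WriteLine(result);
    }
}
```

Keeping one XmlDocument per call avoids sharing an instance between threads, which matters because XmlDocument instance members are not documented as thread-safe.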
If the document isn't in proper form, you could use NTidy (a .NET wrapper for HTML Tidy) to get it in shape first; I had to do this very thing for a project once and it really wasn't too bad.
Answer by Martin Kool
Jason Bunting already posted this, but it really works to use a .NET wrapper around HTML Tidy and load the result into an XmlDocument.
I have used this .NET wrapper before:
http://www.codeproject.com/KB/cs/ZetaHtmlTidy.aspx
And implemented it somewhat like this:
string input = "<p>crappy html<br <img src=foo></div>";
HtmlTidy tidy = new HtmlTidy();
string output = tidy.CleanHtml(input, HtmlTidyOptions.ConvertToXhtml);
XmlDocument doc = new XmlDocument();
doc.LoadXml(output);
Sorry if this is considered a repost :)
Answer by Ashraf Sabry
This is an old question. Now there are:
- The HTML Agility Pack (you have already found this)
- CsQuery, a .NET jQuery port, which will be great for jQuery developers
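To give a feel for the jQuery-style option, here is a small sketch using CsQuery. It assumes the CsQuery NuGet package is referenced; the class name CsQuerySketch, the method HighlightParagraphs, and the sample markup are invented for illustration.

```csharp
using System;
using CsQuery; // assumption: the CsQuery NuGet package is installed

public static class CsQuerySketch
{
    // Hypothetical helper: adds a CSS class to every <p> element, jQuery style.
    public static string HighlightParagraphs(string html)
    {
        CQ dom = CQ.Create(html);         // parse the HTML string into a DOM
        dom["p"].AddClass("highlight");   // CSS selector lookup, then a chained jQuery-like call
        return dom.Render();              // serialize the modified DOM back to a string
    }

    public static void Main()
    {
        Console.WriteLine(HighlightParagraphs("<div><p>one</p><p>two</p></div>"));
    }
}
```

Like the other approaches above, this works on a plain string and does not need a WebBrowser control.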