How to parse HTML or convert HTML to XML so I can extract information from a website (in C#)
Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page).
Original question: http://stackoverflow.com/questions/11304400/
Asked by Jerry
Possible Duplicate:
What is the best way to parse html in C#?
Is there a way to parse HTML, or convert HTML to XML, so that I can easily extract information from a website?
I'm working in C#.
Thank you,
Accepted answer by Habib
HTMLAgilityPack is what you are looking for. Check out this tutorial: Parsing HTML Document with HTMLAgilityPack
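For readers who want a quick idea of what this looks like in practice, here is a minimal sketch, not taken from the linked tutorial. It assumes the HtmlAgilityPack NuGet package is referenced, and the sample markup and link paths are made up for illustration. It loads a small HTML string, pulls out the links with an XPath query, and then saves the same document as XML, which covers the "convert HTML to XML" part of the question.

```csharp
// Assumption: the HtmlAgilityPack NuGet package is referenced by the project.
using System;
using System.IO;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup; note the unclosed <p> and <br> tags.
        string html = "<html><body><p>Hello<br><a href='/one'>One</a> <a href='/two'>Two</a></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                string href = link.GetAttributeValue("href", string.Empty);
                Console.WriteLine($"{link.InnerText.Trim()} -> {href}");
            }
        }

        // Ask the library to emit XML when the document is saved.
        doc.OptionOutputAsXml = true;
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            Console.WriteLine(writer.ToString());
        }
    }
}
```

To work on a live page instead of a string, the downloaded page source can be passed to LoadHtml in the same way. The null check on SelectNodes matters because a query with no matches does not return an empty list.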
Answered by Michael
You can use the COM objects in the Microsoft HTML Object Library to load HTML, and then use its object model to navigate the document. An example is shown below:
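// Requires a COM reference to the Microsoft HTML Object Library (mshtml).
using System;
using System.IO;
using System.Net;
using mshtml;

// Download the page source as a string.
string html;
using (WebClient webClient = new WebClient())
using (Stream stream = webClient.OpenRead(new Uri("http://www.google.com")))
using (StreamReader reader = new StreamReader(stream))
{
    html = reader.ReadToEnd();
}

// Load the markup into an MSHTML document and print every element's tag name.
IHTMLDocument2 doc = (IHTMLDocument2)new HTMLDocument();
doc.write(html);
doc.close();
foreach (IHTMLElement el in doc.all)
    Console.WriteLine(el.tagName);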
