C# Html Agility Pack: load and scrape a webpage
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/10558149/
Asked by thatsIT
Is this the best way to get a webpage when scraping?
// Fetch the page manually and hand the response stream to Html Agility Pack.
HttpWebRequest oReq = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse resp = (HttpWebResponse)oReq.GetResponse();
var doc = new HtmlAgilityPack.HtmlDocument();
doc.Load(resp.GetResponseStream());
// GetElementbyId expects a bare id, not an XPath expression.
var element = doc.GetElementbyId("start-left");
var element2 = doc.DocumentNode.SelectSingleNode("//body");
string html = doc.DocumentNode.OuterHtml;
I've seen HtmlWeb().Load used to get a webpage. Is that a better way to load and then scrape the webpage?
OK, I'll try that instead.
HtmlDocument doc = web.Load(url);
Now that I have my doc, I don't get many members on it. There is nothing like SelectSingleNode. The only one I can use is GetElementbyId, and that works, but I want to select an element by its class.
Do I need to do it like this?
var htmlBody = doc.DocumentNode.SelectSingleNode("//body");
htmlBody.SelectSingleNode("//paging");
Answered by Jacob Proffitt
It is much easier to use HtmlWeb.
string Url = "http://something";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
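Once the document is loaded, selecting by id or by class might look like the sketch below. This is only an illustration, not part of the original answer: the inline markup, the id "start-left", and the class name "paging" are assumptions taken from the question, and LoadHtml is used in place of web.Load(Url) so that the example runs without network access. It also assumes a Html Agility Pack build that exposes SelectSingleNode.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical markup standing in for a downloaded page.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><body>" +
            "<div id='start-left'>Left</div>" +
            "<div class='paging nav'>1 2 3</div>" +
            "</body></html>");

        // GetElementbyId takes a bare id, not an XPath expression.
        HtmlNode byId = doc.GetElementbyId("start-left");

        // XPath 1.0 has no class selector, so match "paging" as a whole
        // token inside the space-separated class attribute.
        HtmlNode byClass = doc.DocumentNode.SelectSingleNode(
            "//*[contains(concat(' ', normalize-space(@class), ' '), ' paging ')]");

        // SelectSingleNode returns null when nothing matches.
        Console.WriteLine(byId != null ? byId.InnerText : "(no element with that id)");
        Console.WriteLine(byClass != null ? byClass.InnerText : "(no element with that class)");
    }
}
```

There is no need to select the body first. Note also that an XPath starting with // searches the whole document even when called on a child node such as htmlBody; use .// to search only beneath that node.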

