VB.NET: How to extract data from an HTML table?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/14322216/
Asked by gromit1
I recently downloaded HtmlAgilityPack, but I haven't found any real instructions on how to use it. I have tried to piece together some code based on various discussion board posts and other sources. Here is what I have so far:
    Private Sub Button3_Click(ByVal sender As System.Object, ByVal e As System.EventArgs)
        Dim document As New HtmlAgilityPack.HtmlDocument
        document.LoadHtml("www.reuters.com/finance/stocks/overview?symbol=GOOG")
        Dim tabletag = document.DocumentNode.SelectSingleNode("//table[@class='data']/tr[1]/td[2]")
    End Sub
As you can see, I am working with the HTML from www.reuters.com/finance/stocks/overview?symbol=GOOG.
I am trying to extract the Beta value from this page. This value is currently 1.04.
When I run the code above, my immediate window shows this repeated 100 times:
1.04
$243,156.41
328.59
--
--
Trading Report for (GOOG). A detailed report, including free correlated market analysis, and updates.
ValuEngine Detailed Valuation Report for GOOG
GOOGLE INC CL A (GOOG) 12-months forecast
GOOGLE INC CL A (GOOG) 2-weeks forecast
Google Inc: Business description, financial summary, 3yr and interim financials, key statistics/ratios and historical ratio analysis.
I only want the first number (1.04) returned. What am I doing wrong? Any suggestions?
Accepted answer by chrixbittinx
You need to use cookies and a proxy. The code below works well for me. Let me know your thoughts:
    Imports System.Net

    Public Class Form1
        ' Shared cookie container so cookies set by the site persist across requests.
        Public cookies As New CookieContainer

        Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
            ' Download the page explicitly; HtmlDocument.LoadHtml expects markup, not a URL.
            Dim wreq As HttpWebRequest = DirectCast(WebRequest.Create("http://www.reuters.com/finance/stocks/overview?symbol=GOOG"), HttpWebRequest)
            wreq.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5"
            wreq.Method = "GET"
            wreq.CookieContainer = cookies

            ' Pass the current user's credentials to the configured proxy, if any.
            Dim prox As IWebProxy = wreq.Proxy
            If prox IsNot Nothing Then prox.Credentials = CredentialCache.DefaultCredentials

            ' Only needed if you load pages through HtmlWeb instead of the request above.
            Dim web As New HtmlAgilityPack.HtmlWeb
            web.UseCookies = True
            web.PreRequest = New HtmlAgilityPack.HtmlWeb.PreRequestHandler(AddressOf onPreReq)

            Dim document As New HtmlAgilityPack.HtmlDocument
            Using res As HttpWebResponse = DirectCast(wreq.GetResponse(), HttpWebResponse)
                ' True = detect the encoding from byte order marks.
                document.Load(res.GetResponseStream(), True)
            End Using

            ' Just for testing:
            ' Dim tables = document.DocumentNode.SelectNodes("//table")
            ' MsgBox(tables.Count.ToString())

            ' Returns your field: the first <td class="data"> in the document.
            Dim tabletag2 = document.DocumentNode.SelectSingleNode("//td[@class='data']")
            If tabletag2 IsNot Nothing Then MsgBox(tabletag2.InnerText)
        End Sub

        ' Attaches the shared cookie container to every request made by HtmlWeb.
        Private Function onPreReq(req As HttpWebRequest) As Boolean
            req.CookieContainer = cookies
            Return True
        End Function
    End Class
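The XPath above returns whichever td with class 'data' happens to come first, so it depends on Beta being the first such cell on the page. A minimal sketch of a label-based lookup follows, assuming the value sits in the same table row as a cell whose text contains "Beta". The sample markup, the module name, and the ExtractValueByLabel helper are all hypothetical, and the real page structure may differ:

```vbnet
Imports System
Imports HtmlAgilityPack

Module BetaExtractionSketch
    ' Hypothetical helper: returns the text of the td with class 'data' in the
    ' first row that has a cell containing labelText, or Nothing if no row matches.
    ' Assumes labelText contains no single quote (it is embedded in the XPath).
    Function ExtractValueByLabel(html As String, labelText As String) As String
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html) ' LoadHtml parses markup; it does not download a URL.

        ' Select the row by its label cell, then take that row's data cell.
        Dim xpath As String = "//tr[td[contains(normalize-space(.), '" & labelText & "')]]/td[@class='data']"
        Dim cell As HtmlNode = doc.DocumentNode.SelectSingleNode(xpath)
        If cell Is Nothing Then Return Nothing
        Return cell.InnerText.Trim()
    End Function

    Sub Main()
        ' Hypothetical markup standing in for the downloaded page.
        Dim sample As String =
            "<table>" &
            "<tr><td>Beta:</td><td class='data'>1.04</td></tr>" &
            "<tr><td>Market Cap (Mil.):</td><td class='data'>$243,156.41</td></tr>" &
            "</table>"
        Console.WriteLine(ExtractValueByLabel(sample, "Beta"))
    End Sub
End Module
```

In the answer's code, the same XPath could be passed to document.DocumentNode.SelectSingleNode once the response stream has been loaded. Selecting by label rather than position should keep working if the site reorders its rows, though it still breaks if the label text or the class name changes.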

