VB.net Getting the InnerText of href using HtmlAgilityPack
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/18374412/
Asked by Marc Intes
I have now updated my code (thanks, Tim, for helping me learn). It already works, but it doesn't give me the links I want.
Here is my working code:
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim webClient As New System.Net.WebClient
    Dim WebSource As String = webClient.DownloadString("http://www.google.com.ph/search?hl=en&as_q=test&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=ctr%3AcountryCA&as_filetype=&as_rights=#as_qdr=all&cr=countryCA&fp=1&hl=en&lr=&q=test&start=20&tbs=ctr:countryCA")
    Dim doc = New HtmlAgilityPack.HtmlDocument()
    doc.LoadHtml(WebSource)
    Dim links = GetLinks(doc, "test")
    For Each Link In links
        ListBox1.Items.Add(Link.ToString())
    Next
End Sub

Public Class Link
    Public Sub New(Uri As Uri, Text As String)
        Me.Uri = Uri
        Me.Text = Text
    End Sub
    Public Property Text As String
    Public Property Uri As Uri
    ' Return the URL directly: passing it to String.Format as the format string
    ' would throw a FormatException if the URL contained { or }.
    Public Overrides Function ToString() As String
        Return If(Uri Is Nothing, "", Uri.ToString())
    End Function
End Class

Public Function GetLinks(doc As HtmlAgilityPack.HtmlDocument, linkContains As String) As List(Of Link)
    Dim uri As Uri = Nothing
    Dim linksOnPage = From link In doc.DocumentNode.Descendants()
                      Where link.Name = "a" _
                      AndAlso link.Attributes("href") IsNot Nothing _
                      Let text = link.InnerText.Trim()
                      Let url = link.Attributes("href").Value
                      Where url.IndexOf(linkContains, StringComparison.OrdinalIgnoreCase) >= 0 _
                      AndAlso uri.TryCreate(url, UriKind.Absolute, uri)
    Dim Uris As New List(Of Link)()
    For Each link In linksOnPage
        Uris.Add(New Link(New Uri(link.url, UriKind.Absolute), link.text))
    Next
    Return Uris
End Function
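The URL passed to DownloadString may explain part of the mismatch. Everything after the # is a fragment, and HTTP clients do not send fragments to the server. Parameters that appear only after the #, such as start=20, therefore never reach Google, and the server sees only the part before the #. If the expected links were copied from that page in a browser, they may simply not be on the page WebClient downloads. This minimal sketch uses only System.Uri. The module and helper names are made up for illustration, and the URL is shortened from the one in the question:

```vbnet
Imports System

Module UrlFragmentDemo
    ' Hypothetical helper: the part of the URL that goes into the HTTP request.
    Function RequestedPart(url As String) As String
        Dim u As New Uri(url)
        Return u.PathAndQuery
    End Function

    ' Hypothetical helper: the fragment (from "#" on), which stays on the client.
    Function FragmentPart(url As String) As String
        Dim u As New Uri(url)
        Return u.Fragment
    End Function

    Sub Main()
        ' Shortened from the question's URL.
        Dim url = "http://www.google.com.ph/search?hl=en&as_q=test&cr=countryCA#q=test&start=20"
        Console.WriteLine(RequestedPart(url)) ' /search?hl=en&as_q=test&cr=countryCA
        Console.WriteLine(FragmentPart(url))  ' #q=test&start=20
    End Sub
End Module
```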
I am new to HtmlAgilityPack and still learning, so please bear with me.
My Main Goal:
Sample link: http://www.google.com.ph/search?hl=en&as_q=test&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=ctr%3AcountryCA&as_filetype=&as_rights=#as_qdr=all&cr=countryCA&fp=1&hl=en&lr=&q=test&start=20&tbs=ctr:countryCA
My expected output, i.e. the links that contain the word "test":
www.copetest.com/?
www.testofhumanity.com/
www3.algonquincollege.com/testcentre/?
www.lpitest.ca/?
testtube.nfb.ca/?
www.ieltscanada.ca/testdates.jsp?
https://www.awinfosys.com/eassessment/fsa_fieldtest.htm?
Accepted answer by Tim Schmelter
You should use the href attribute instead. Also note that string comparisons in .NET are case-sensitive by default, so compare with StringComparison.OrdinalIgnoreCase:
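Dim anchors = htmlDoc.DocumentNode.SelectNodes("//a[@href]")
' SelectNodes returns Nothing (not an empty collection) when no node matches.
If anchors IsNot Nothing Then
    For Each link As HtmlAgilityPack.HtmlNode In anchors
        Dim href = link.Attributes("href").Value
        If href.IndexOf("test", StringComparison.OrdinalIgnoreCase) >= 0 Then
            ListBox1.Items.Add(href)
            ' or, to list the visible link text instead:
            ' ListBox1.Items.Add(link.InnerText)
        End If
    Next
End If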
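The case-sensitivity point can be shown with plain strings. String.Contains(String) performs an ordinal, case-sensitive search. IndexOf with StringComparison.OrdinalIgnoreCase ignores case. This is a small sketch: the helper ContainsIgnoreCase is hypothetical, and the capitalized URL is a made-up variant of one of the expected links:

```vbnet
Imports System

Module CaseSensitivityDemo
    ' Hypothetical helper: True when href contains term, ignoring case.
    Function ContainsIgnoreCase(href As String, term As String) As Boolean
        Return href.IndexOf(term, StringComparison.OrdinalIgnoreCase) >= 0
    End Function

    Sub Main()
        Dim href = "http://www3.algonquincollege.com/TestCentre/"
        ' String.Contains(String) is an ordinal, case-sensitive search.
        Console.WriteLine(href.Contains("test"))            ' False
        Console.WriteLine(ContainsIgnoreCase(href, "test")) ' True
    End Sub
End Module
```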
Here is a method that returns the links in a document as List(Of Link). It keeps only hrefs that parse as absolute URIs (Uri.TryCreate with UriKind.Absolute), so relative hrefs are skipped. Link is a custom class with two properties, one for the text and the other for the Uri:
Public Class Link
    Public Sub New(Uri As Uri, Text As String)
        Me.Uri = Uri
        Me.Text = Text
    End Sub
    Public Property Text As String
    Public Property Uri As Uri
    Public Overrides Function ToString() As String
        Return String.Format("{0} [{1}]", Text, If(Uri Is Nothing, "", Uri.ToString()))
    End Function
End Class

Public Function GetLinks(doc As HtmlAgilityPack.HtmlDocument) As List(Of Link)
    Dim uri As Uri = Nothing
    Dim linksOnPage = From link In doc.DocumentNode.Descendants()
                      Where link.Name = "a" _
                      AndAlso link.Attributes("href") IsNot Nothing _
                      Let text = link.InnerText.Trim()
                      Let url = link.Attributes("href").Value
                      Where uri.TryCreate(url, UriKind.Absolute, uri)
    Dim Uris As New List(Of Link)()
    For Each link In linksOnPage
        Uris.Add(New Link(New Uri(link.url, UriKind.Absolute), link.text))
    Next
    Return Uris
End Function
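Because GetLinks keeps only hrefs that Uri.TryCreate accepts as absolute, relative hrefs such as /url?q=... never reach the list. If those are needed, one option is to resolve them against the address of the page they came from. The sketch below works under that assumption; ResolveHref and the sample hrefs are hypothetical. It tries the relative form first because, on non-Windows platforms, .NET (Core) can parse a string such as /url as an absolute file:// URI when UriKind.Absolute is used:

```vbnet
Imports System

Module RelativeHrefDemo
    ' Hypothetical helper: turns an href into an absolute Uri.
    ' Relative hrefs are resolved against pageUri (the address the HTML came from);
    ' absolute hrefs are returned as they are. Returns Nothing if neither works.
    Function ResolveHref(pageUri As Uri, href As String) As Uri
        Dim relative As Uri = Nothing
        Dim result As Uri = Nothing
        ' Relative form first: with UriKind.Absolute, .NET on non-Windows
        ' platforms can read "/url?..." as a file:// path.
        If Uri.TryCreate(href, UriKind.Relative, relative) AndAlso
           Uri.TryCreate(pageUri, relative, result) Then
            Return result
        End If
        If Uri.TryCreate(href, UriKind.Absolute, result) Then
            Return result
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim pageUri As New Uri("http://www.google.com.ph/search?q=test")
        For Each href In {"/url?q=http://www.lpitest.ca/&sa=U", "http://www.testofhumanity.com/"}
            Console.WriteLine(ResolveHref(pageUri, href))
        Next
    End Sub
End Module
```

In GetLinks, the page address would be the URL passed to DownloadString.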
Here is the requested overload, which additionally checks whether the URL contains a given text:
Public Function GetLinks(doc As HtmlAgilityPack.HtmlDocument, linkContains As String) As List(Of Link)
    Dim uri As Uri = Nothing
    Dim linksOnPage = From link In doc.DocumentNode.Descendants()
                      Where link.Name = "a" _
                      AndAlso link.Attributes("href") IsNot Nothing _
                      Let text = link.InnerText.Trim()
                      Let url = link.Attributes("href").Value
                      Where url.IndexOf(linkContains, StringComparison.OrdinalIgnoreCase) >= 0 _
                      AndAlso uri.TryCreate(url, UriKind.Absolute, uri)
    Dim Uris As New List(Of Link)()
    For Each link In linksOnPage
        Uris.Add(New Link(New Uri(link.url, UriKind.Absolute), link.text))
    Next
    Return Uris
End Function
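There may be another reason the filter returns the wrong links. Suppose the page WebClient receives uses redirect-style result links of the form /url?q=<target>&sa=... (an assumption about Google's markup, which the question does not show and which changes over time). Those hrefs are relative, so the absolute-URI check drops them. Google's own absolute navigation links with "test" in their query string are kept instead. Under that assumption, the target can be pulled out of the q parameter before filtering. UnwrapRedirect is a hypothetical helper. WebUtility.HtmlDecode is applied in case the raw attribute value still contains entities such as &amp;:

```vbnet
Imports System
Imports System.Net

Module RedirectLinkDemo
    ' Hypothetical helper: returns the target of a redirect-style href such as
    ' "/url?q=http://www.lpitest.ca/&amp;sa=U", or Nothing for any other href.
    Function UnwrapRedirect(rawHref As String) As String
        ' The raw attribute value may still contain HTML entities such as &amp;.
        Dim href = WebUtility.HtmlDecode(rawHref)
        If Not href.StartsWith("/url?", StringComparison.Ordinal) Then Return Nothing
        For Each pair In href.Substring("/url?".Length).Split("&"c)
            If pair.StartsWith("q=", StringComparison.Ordinal) Then
                ' Percent-decode the value ("+" is left as it is).
                Return Uri.UnescapeDataString(pair.Substring(2))
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim hrefs = {"/url?q=http://www.lpitest.ca/&amp;sa=U",
                     "/search?q=test&amp;start=10",
                     "http://www.google.com.ph/preferences"}
        For Each raw In hrefs
            Dim target = UnwrapRedirect(raw)
            If target IsNot Nothing AndAlso
               target.IndexOf("test", StringComparison.OrdinalIgnoreCase) >= 0 Then
                Console.WriteLine(target) ' prints http://www.lpitest.ca/
            End If
        Next
    End Sub
End Module
```

Filtering on the unwrapped target rather than on the raw href keeps Google's own /search?q=test links out of the result.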
Edit: now tested and working. Use it in the following way:
Dim site = System.IO.File.ReadAllText("C:\Temp\website_test.htm")
Dim doc = New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(site)
Dim links = GetLinks(doc)
For Each Link In links
    ListBox1.Items.Add(Link.ToString())
Next

