Original question: http://stackoverflow.com/questions/20582241/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Scraping specific text from website to Application on VB
Asked by Vija02
I'm trying to create a simple app that compares content across several websites. I've seen ways to extract all of a page's text into the app, but is there a way to extract only, say, the title and description?
Take a book site as an example. Is there any way to search for a book title and then show the different reviews, synopses, and prices without any of the unneeded text?
Accepted answer by Bjørn-Roger Kringsjå
A quick and simple solution is to use a WebBrowser control, which exposes an HtmlDocument through its .Document property.
Imports System.Windows.Forms

Public Class Form1

    ' Start loading the page; parsing happens once the document has loaded.
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Me.WebBrowser1.ScriptErrorsSuppressed = True
        Me.WebBrowser1.Navigate(New Uri("http://stackoverflow.com/"))
    End Sub

    ' Raised when a document finishes loading. Pages with frames can raise it more than once.
    Private Sub WebBrowser1_DocumentCompleted(sender As Object, e As WebBrowserDocumentCompletedEventArgs) Handles WebBrowser1.DocumentCompleted
        Dim document As HtmlDocument = Me.WebBrowser1.Document
        Dim title As String = Me.GetTitle(document)
        Dim description As String = Me.GetMeta(document, "description")
        Dim keywords As String = Me.GetMeta(document, "keywords")
        Dim author As String = Me.GetMeta(document, "author")
        ' The extracted values are now available to display or compare.
    End Sub

    ' Returns the text of the first <title> element in <head>, or "" if there is none.
    Private Function GetTitle(document As HtmlDocument) As String
        Dim head As HtmlElement = Me.GetHead(document)
        If head IsNot Nothing Then
            For Each el As HtmlElement In head.GetElementsByTagName("title")
                Return el.InnerText
            Next
        End If
        Return String.Empty
    End Function

    ' Returns the content of the first <meta> tag whose name matches (ignoring case), or "".
    Private Function GetMeta(document As HtmlDocument, name As String) As String
        Dim head As HtmlElement = Me.GetHead(document)
        If head IsNot Nothing Then
            For Each el As HtmlElement In head.GetElementsByTagName("meta")
                If String.Equals(el.GetAttribute("name"), name, StringComparison.OrdinalIgnoreCase) Then
                    Return el.GetAttribute("content")
                End If
            Next
        End If
        Return String.Empty
    End Function

    ' Returns the first <head> element, or Nothing if the document has none.
    Private Function GetHead(document As HtmlDocument) As HtmlElement
        For Each el As HtmlElement In document.GetElementsByTagName("head")
            Return el
        Next
        Return Nothing
    End Function

End Class
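
If the app has no need to render the page, the same title and meta values could also be read from the raw HTML without a WebBrowser control. The following is only a rough sketch of that alternative and is not part of the original answer: it swaps the control's DOM for regular expressions, which are fragile on real-world markup (unquoted attributes, for example), so a dedicated HTML parser would likely be more robust. The module name MetaScraperSketch, the helpers ExtractTitle, ExtractMeta and ReadAttribute, and the sample page are all hypothetical. A real app would first download the HTML, for instance with System.Net.WebClient.DownloadString.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

' Hypothetical sketch: pulls the title and named meta values out of an HTML string.
Module MetaScraperSketch

    ' Returns the decoded text of the first <title> element, or "" if none is found.
    Function ExtractTitle(html As String) As String
        Dim m As Match = Regex.Match(html, "<title[^>]*>(.*?)</title>",
                                     RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return WebUtility.HtmlDecode(m.Groups(1).Value.Trim())
        End If
        Return String.Empty
    End Function

    ' Returns the content attribute of the first <meta> tag whose name matches
    ' (ignoring case), or "" if none is found.
    Function ExtractMeta(html As String, name As String) As String
        For Each tagMatch As Match In Regex.Matches(html, "<meta\b[^>]*>", RegexOptions.IgnoreCase)
            If String.Equals(ReadAttribute(tagMatch.Value, "name"), name, StringComparison.OrdinalIgnoreCase) Then
                Return WebUtility.HtmlDecode(ReadAttribute(tagMatch.Value, "content"))
            End If
        Next
        Return String.Empty
    End Function

    ' Reads one quoted attribute value from the text of a single tag.
    ' Assumes the value is wrapped in single or double quotes.
    Private Function ReadAttribute(tagText As String, attrName As String) As String
        Dim pattern As String = "\s" & Regex.Escape(attrName) & "\s*=\s*([""'])(.*?)\1"
        Dim m As Match = Regex.Match(tagText, pattern, RegexOptions.IgnoreCase)
        Return If(m.Success, m.Groups(2).Value, String.Empty)
    End Function

    Sub Main()
        ' Made-up sample page standing in for downloaded HTML.
        Dim html As String =
            "<html><head><title>Example Book &amp; Co</title>" &
            "<meta name=""Description"" content=""A short synopsis."">" &
            "</head><body></body></html>"

        Console.WriteLine(ExtractTitle(html))                ' Example Book & Co
        Console.WriteLine(ExtractMeta(html, "description"))  ' A short synopsis.
    End Sub

End Module
```

As in the WebBrowser version, a missing tag yields an empty string rather than an error, so the caller can treat absent metadata the same way in both approaches.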

