VB.NET: How to Post & Retrieve Data from a Website
Original URL: http://stackoverflow.com/questions/14918108/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Question by Alex
I am working on a Windows Forms application. I have a TextBox named "tbPhoneNumber" that contains a phone number.
I want to go to the website http://canada411.com, enter the number from my textbox into the website's textbox (ID: "c411PeopleReverseWhat"), and then somehow send a click to "Find" (an input with the class "c411ButtonImg").
After that, I want to retrieve the values between the asterisks in the following HTML section:
<div id="contact" class="vcard">
<span><h1 class="fn c411ListedName">**Full Name**</h1></span>
<span class="c411Phone">**(###)-###-####**</span>
<span class="c411Address">**Address**</span>
<span class="adr">
<span class="locality">**City**</span>
<span class="region">**Province**</span>
<span class="postal-code">**L#L#L#**</span>
</span>
So basically I am trying to send data into an input box, click the input button, and store the retrieved values in variables. I want to do this seamlessly, so would I need something like an HttpWebRequest? Or should I use a WebBrowser control? I just don't want the user to see that the application is visiting a website.
Answer by MonkeyDoug
I do a good amount of website scraping, so I will show you how I do it. Feel free to skip ahead if I am being too detailed, but this is a commonly requested topic and is worth spelling out.
URL Simplification
The library I use for this is HtmlAgilityPack (it is a DLL; create a new project and add a reference to it). The first thing to check is whether we need to take any special steps to reach a page using a phone number. I searched for John Smith and found quite a few results. I opened two of them and noticed that the URL format is very simple. Those results were:
http://www.canada411.ca/res/7056736767/John-Smith/138223109.html
http://www.canada411.ca/res/7052355273/John-Smith/172439951.html
I then tested whether I could remove the parts of the URL that I don't know and keep only the phone number. It turns out I can:
http://www.canada411.ca/search/re/1/7056736767/-
http://www.canada411.ca/search/re/1/7052355273/-
As you can see, the URL consists of some static parts plus our phone number. From this, let's construct a string for the URL.
Dim phoneNumber As String = "7056736767" 'this could be tbPhoneNumber.Text or whatever
Dim URL As String = "http://www.canada411.ca/search/re/1/" & phoneNumber & "/-"
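Since the number will come from tbPhoneNumber, the user may type it with spaces, dashes or parentheses. A minimal sketch, assuming the site wants exactly the ten digits seen in the example URLs above: BuildCanada411Url is a hypothetical helper (not part of the original answer) that keeps only the digits, drops a leading North American "1", and rejects anything that is not ten digits.

```vbnet
Imports System
Imports System.Text

Module Canada411Url
    ' Hypothetical helper, not part of the original answer.
    ' Assumption: the site expects exactly 10 digits, as in the example URLs.
    Function BuildCanada411Url(rawInput As String) As String
        ' Keep only ASCII digits, so "(705) 673-6767" becomes "7056736767".
        Dim digits As New StringBuilder()
        For Each c As Char In rawInput
            If c >= "0"c AndAlso c <= "9"c Then digits.Append(c)
        Next

        Dim phone As String = digits.ToString()
        ' Drop a leading North American country code, e.g. "1 705 673 6767".
        If phone.Length = 11 AndAlso phone(0) = "1"c Then phone = phone.Substring(1)

        If phone.Length <> 10 Then
            Throw New ArgumentException("Expected a 10-digit phone number.", "rawInput")
        End If

        Return "http://www.canada411.ca/search/re/1/" & phone & "/-"
    End Function
End Module
```

In the form you would then write Dim URL As String = BuildCanada411Url(tbPhoneNumber.Text), catching ArgumentException if you want to show a friendly message for bad input.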
Value Extraction with XPath
Now that we have the right page, let's examine the HTML you provided above. You need six values from the page, so let's declare them now:
Dim FullName As String
Dim Phone As String
Dim Address As String
Dim Locality As String
Dim Region As String
Dim PostalCode As String
As mentioned above, we will use HtmlAgilityPack, which supports XPath. The nice thing about this is that once we find a unique identifier in the HTML, we can use XPath to locate our values. It may be confusing at first, but it will become clearer.
All of the values you need are inside tags that have a class name, so let's use the class names in our XPath expressions to find them.
Dim FullNameXPath As String = "//*[@class='fn c411ListedName']"
Dim PhoneXPath As String = "//*[@class='c411Phone']"
Dim AddressXPath As String = "//*[@class='c411Address']"
Dim LocalityXPath As String = "//*[@class='locality']"
Dim RegionXPath As String = "//*[@class='region']"
Dim PostalCodeXPath As String = "//*[@class='postal-code']"
Each of these strings tells HtmlAgilityPack what to look for: in our case, the elements that carry the class names above. There is a lot to XPath, and explaining all of it would take a while. On a side note, though: if you use Google Chrome, you can right-click a value on a page and choose Inspect Element. In the code panel that appears, right-click the highlighted element and copy its XPath. Very useful.
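One caveat with these expressions: @class='...' compares the entire attribute value, so FullNameXPath only matches because it spells out both class names, "fn c411ListedName", in the same order as the page. A common XPath 1.0 idiom matches one class name regardless of which other classes sit beside it. The sketch below compares the two. To stay self-contained it uses System.Xml on a well-formed copy of part of the question's snippet instead of HtmlAgilityPack (HtmlAgilityPack runs the same XPath 1.0 expressions). ExactClassXPath, TokenClassXPath and CountMatches are hypothetical helpers.

```vbnet
Imports System
Imports System.Xml

Module ClassXPathDemo
    ' Part of the question's snippet, closed so that System.Xml can parse it.
    Private Const Snippet As String = "<div id='contact' class='vcard'>" &
        "<span><h1 class='fn c411ListedName'>Full Name</h1></span>" &
        "<span class='c411Phone'>(###)-###-####</span>" &
        "</div>"

    ' Matches only when the whole class attribute equals className.
    Function ExactClassXPath(className As String) As String
        Return "//*[@class='" & className & "']"
    End Function

    ' Matches when className is one of the space-separated class names.
    Function TokenClassXPath(className As String) As String
        Return "//*[contains(concat(' ', normalize-space(@class), ' '), ' " & className & " ')]"
    End Function

    ' Counts how many elements of the snippet an XPath expression selects.
    Function CountMatches(xpath As String) As Integer
        Dim doc As New XmlDocument()
        doc.LoadXml(Snippet)
        Return doc.SelectNodes(xpath).Count
    End Function
End Module
```

With HtmlAgilityPack you could then write FullNameXPath = TokenClassXPath("c411ListedName"), which keeps working if the site adds or reorders classes on that element.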
Basic HTMLAgilityPack Template
Now, all that is left is to connect to the page and get those variables populated.
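Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As HtmlAgilityPack.HtmlDocument
Doc = Web.Load(URL)
' SelectNodes returns Nothing when no element matches, so check before looping
Dim nameNodes As HtmlAgilityPack.HtmlNodeCollection = Doc.DocumentNode.SelectNodes(FullNameXPath)
If nameNodes IsNot Nothing Then
    For Each nameResult As HtmlAgilityPack.HtmlNode In nameNodes
        MsgBox(nameResult.InnerText)
    Next
End If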
In the example above, we create an HtmlWeb object named Web; this is the part that actually downloads the page. We then declare an HtmlDocument named Doc, which will hold the parsed, searchable page source (the parsing happens behind the scenes). Calling Web.Load fetches the page source, and we assign the result to Doc. Because Doc can be queried as many times as we like, we only need to connect to the page once.
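Web.Load downloads and parses the page synchronously, so calling it directly from a button's Click handler freezes the form until the site responds, which works against doing this "seamlessly". A minimal sketch, assuming .NET 4.5 or later (for Task.Run and Async/Await): LoadDocumentAsync is a hypothetical helper that moves the blocking call onto a thread-pool thread.

```vbnet
Imports System
Imports System.Threading.Tasks
Imports HtmlAgilityPack

Module PageLoader
    ' Hypothetical helper: runs the blocking HtmlWeb.Load call on a
    ' thread-pool thread and returns the parsed document as a Task.
    Function LoadDocumentAsync(url As String) As Task(Of HtmlDocument)
        Return Task.Run(Function()
                            Dim web As New HtmlWeb()
                            Return web.Load(url)
                        End Function)
    End Function
End Module
```

From an Async Sub event handler you would then write Dim doc = Await LoadDocumentAsync(URL); the code after Await resumes on the UI thread, so it can safely fill the form's controls with the extracted values.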
The For Each loop goes through every node in Doc that matches FullNameXPath, the XPath expression we defined earlier for the name. Each matching node is assigned to the nameResult variable, and inside the loop we display the node's inner text in a message box.
So, putting it all together:
Complete Working Code (As of 2/17/2013)
Dim phoneNumber As String = "7056736767" 'this could be tbPhoneNumber.Text or whatever
Dim URL As String = "http://www.canada411.ca/search/re/1/" & phoneNumber & "/-"
Dim FullName As String
Dim Phone As String
Dim Address As String
Dim Locality As String
Dim Region As String
Dim PostalCode As String
Dim FullNameXPath As String = "//*[@class='fn c411ListedName']"
Dim PhoneXPath As String = "//*[@class='c411Phone']"
Dim AddressXPath As String = "//*[@class='c411Address']"
Dim LocalityXPath As String = "//*[@class='locality']"
Dim RegionXPath As String = "//*[@class='region']"
Dim PostalCodeXPath As String = "//*[@class='postal-code']"
Dim Web As New HtmlAgilityPack.HtmlWeb
Dim Doc As HtmlAgilityPack.HtmlDocument
Doc = Web.Load(URL)
' SelectNodes returns Nothing (not an empty collection) when nothing matches,
' so each result is checked before looping over it
Dim nodes As HtmlAgilityPack.HtmlNodeCollection
nodes = Doc.DocumentNode.SelectNodes(FullNameXPath)
If nodes IsNot Nothing Then
    For Each nameResult As HtmlAgilityPack.HtmlNode In nodes
        FullName = nameResult.InnerText
        MsgBox(FullName)
    Next
End If
nodes = Doc.DocumentNode.SelectNodes(PhoneXPath)
If nodes IsNot Nothing Then
    For Each PhoneResult As HtmlAgilityPack.HtmlNode In nodes
        Phone = PhoneResult.InnerText
        MsgBox(Phone)
    Next
End If
nodes = Doc.DocumentNode.SelectNodes(AddressXPath)
If nodes IsNot Nothing Then
    For Each ADDRResult As HtmlAgilityPack.HtmlNode In nodes
        Address = ADDRResult.InnerText
        MsgBox(Address)
    Next
End If
nodes = Doc.DocumentNode.SelectNodes(LocalityXPath)
If nodes IsNot Nothing Then
    For Each LocalResult As HtmlAgilityPack.HtmlNode In nodes
        Locality = LocalResult.InnerText
        MsgBox(Locality)
    Next
End If
nodes = Doc.DocumentNode.SelectNodes(RegionXPath)
If nodes IsNot Nothing Then
    For Each RegionResult As HtmlAgilityPack.HtmlNode In nodes
        Region = RegionResult.InnerText
        MsgBox(Region)
    Next
End If
nodes = Doc.DocumentNode.SelectNodes(PostalCodeXPath)
If nodes IsNot Nothing Then
    For Each postalCodeResult As HtmlAgilityPack.HtmlNode In nodes
        PostalCode = postalCodeResult.InnerText
        MsgBox(PostalCode)
    Next
End If
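The six loops above differ only in the XPath and the variable they fill. A minimal sketch of a table-driven alternative, assuming you only want the first match for each field: it reuses the answer's XPath strings, uses SelectSingleNode (which returns Nothing when nothing matches instead of failing), and decodes HTML entities such as &amp; with HtmlEntity.DeEntitize. ParseListing and the field-name keys are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module ListingParser
    ' Field name -> XPath; the expressions are the ones from the answer above.
    Private ReadOnly FieldXPaths As New Dictionary(Of String, String) From {
        {"FullName", "//*[@class='fn c411ListedName']"},
        {"Phone", "//*[@class='c411Phone']"},
        {"Address", "//*[@class='c411Address']"},
        {"Locality", "//*[@class='locality']"},
        {"Region", "//*[@class='region']"},
        {"PostalCode", "//*[@class='postal-code']"}
    }

    ' Returns the trimmed text of the first match for each field ("" if missing).
    Function ParseListing(doc As HtmlDocument) As Dictionary(Of String, String)
        Dim listing As New Dictionary(Of String, String)()
        For Each field As KeyValuePair(Of String, String) In FieldXPaths
            ' SelectSingleNode returns Nothing when no element matches.
            Dim node As HtmlNode = doc.DocumentNode.SelectSingleNode(field.Value)
            listing(field.Key) = If(node Is Nothing, "", HtmlEntity.DeEntitize(node.InnerText).Trim())
        Next
        Return listing
    End Function
End Module
```

Usage: Dim listing = ParseListing(Web.Load(URL)), then FullName = listing("FullName"), Phone = listing("Phone"), and so on. Adding a field later means adding one dictionary entry rather than another loop.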
Answer by Ulises
Yes, it is possible. I've done this using the Selenium framework, which is aimed at test automation; however, it also gives you the tools to do exactly this.
Download it for .NET here: http://docs.seleniumhq.org/download/
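A rough sketch of that approach, assuming the Selenium WebDriver .NET bindings and ChromeDriver are installed. It follows the steps from the question: open canada411.com, type into the box with ID c411PeopleReverseWhat, submit, and read the element with class c411ListedName. Two assumptions to flag: the number is submitted by pressing Enter rather than clicking c411ButtonImg, because FindElement returns the first element with that class and the page may have more than one "Find" button; and Chrome runs with --headless so the user never sees a browser window. The element names come from the question and may no longer match the live site; LookupFullName is a hypothetical name.

```vbnet
Imports System
Imports OpenQA.Selenium
Imports OpenQA.Selenium.Chrome

Module SeleniumLookup
    ' Hypothetical helper: drives a headless Chrome through the question's steps.
    Function LookupFullName(phoneNumber As String) As String
        Dim options As New ChromeOptions()
        options.AddArgument("--headless") ' no visible browser window

        Using driver As IWebDriver = New ChromeDriver(options)
            ' Wait up to 10 seconds for elements to appear before FindElement fails.
            driver.Manage().Timeouts().ImplicitWait = TimeSpan.FromSeconds(10)
            driver.Navigate().GoToUrl("http://canada411.com")

            ' Type the number into the reverse-lookup box and press Enter to submit.
            driver.FindElement(By.Id("c411PeopleReverseWhat")).SendKeys(phoneNumber & Keys.Enter)

            ' Read the listed name from the results page.
            Return driver.FindElement(By.ClassName("c411ListedName")).Text
        End Using ' disposing the driver shuts the browser down
    End Function
End Module
```

Compared with the HtmlAgilityPack answer, this runs a full browser, so it is much slower, but it behaves exactly like a user filling in the form, which helps if the site relies on JavaScript.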

