VBA: Scraping information from HTML Table
Source: http://stackoverflow.com/questions/44653360/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the source URL, and attribute it to the original authors on Stack Overflow.
Asked by Quint
I'm trying to pull information from an HTML table. I want to add each element within the table to a collection. This is what I have so far.
Dim htmlTable As Object
Dim coll2 As Collection
Set coll2 = New Collection
Set IE = New InternetExplorerMedium
With IE
    '.AddressBar = False
    '.MenuBar = False
    .Navigate ("PASSWORDED SITE")
    .Visible = True
End With
Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
Set tableItem = IE.Document.getElementsByTagName("td")
With coll2
    For Each tableItem In htmlTable.innerHTML
        .Add tableItem
    Next
End With
I have a problem with this line: For Each tableItem In htmlTable.innerText
I tried different variations of htmlTable.innerText, each throwing different errors.
This is the HTML extract for the table.
<table class="Grid" id="ctl00_ContentPlaceHolder1_gvExtract" style="border-collapse: collapse;" border="1" rules="all" cellspacing="0">
<tbody><tr class="GridHeader" style="font-weight: bold;">
<th scope="col">Delete</th><th scope="col">Download</th><th scope="col">Extract Date</th><th scope="col">User Id Owner</th>
</tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
<td><a href='javascript:DoPostBack("DeleteExtract", 2942854)'>Delete</a></td>
<td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2942854")'>Work Order Inquiry - Work Order</a></td>
<td>06/20/2017 07:50:37</td>
<td>MBMAYO</td>
</tr><tr class="GridItemEven" style="background-color: rgb(204, 204, 204);">
<td><a href='javascript:DoPostBack("DeleteExtract", 2942836)'>Delete</a></td>
<td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2942836")'>Work Order Inquiry - Work Order</a></td>
<td>06/20/2017 07:39:29</td>
<td>MBMAYO</td>
</tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
<td><a href='javascript:DoPostBack("DeleteExtract", 2941835)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941835")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:23:54</td><td>MBMAYO</td>
</tr><tr class="GridItemEven" style="background-color: rgb(204, 204, 204);">
<td><a href='javascript:DoPostBack("DeleteExtract", 2941827)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941827")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:16:16</td><td>MBMAYO</td>
</tr><tr class="GridItemOdd" style="background-color: rgb(255, 255, 255);">
<td><a href='javascript:DoPostBack("DeleteExtract", 2941822)'>Delete</a></td><td><a href='javascript:OpenDownloadWindow("../Common/FileDownloader.aspx?fileKey=2941822")'>Work Order Inquiry - Work Order</a></td><td>06/20/2017 07:14:06</td><td>MBMAYO</td>
</tr>
</tbody></table>
The goal is to store each <td> as an item in a collection and then retrieve a value from it, for example the date in <td>06/20/2017 07:50:37</td>. This table grows, so I think an array is out of the question?
Edit from comment:
I have been trying to call this function, but I'm getting an "object does not support this method" error:
Public Function htmlCell(id As String) As String
    htmlCell = IE.getElementById("ctl00_ContentPlaceHolder1_gvExtract").getElementsByTagName("td")(id).innerHTML
End Function
Answered by dee
What you probably need is something like this. HTH
' The MSHTML types below need a reference to the Microsoft HTML Object Library.
Dim htmlTable As MSHTML.htmlTable
Dim htmlTableCells As MSHTML.IHTMLElementCollection
Dim htmlTableCell As MSHTML.htmlTableCell
Dim htmlAnchor As MSHTML.HTMLAnchorElement
Set htmlTable = ie.document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
' Only the <td> cells inside the table, not every <td> on the page.
Set htmlTableCells = htmlTable.getElementsByTagName("td")
With coll2
    For Each htmlTableCell In htmlTableCells
        ' For a cell that wraps a link, store the link's content instead of the <a> markup.
        If VBA.TypeName(htmlTableCell.FirstChild) = "HTMLAnchorElement" Then
            Set htmlAnchor = htmlTableCell.FirstChild
            .Add htmlAnchor.innerHTML
        Else
            .Add htmlTableCell.innerHTML
        End If
    Next
End With
Result
Dim el
For Each el In coll2
Debug.Print el
Next el
Output:
Delete
Work Order Inquiry - Work Order
06/20/2017 07:50:37
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:39:29
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:23:54
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:16:16
MBMAYO
Delete
Work Order Inquiry - Work Order
06/20/2017 07:14:06
MBMAYO
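Since the collection is filled row by row and every data row in the posted table has four cells, the dates sit at positions 3, 7, 11 and so on. The following is only a minimal sketch of how they could be pulled out, not part of the original answer: the helper name ColumnItems and its parameters are hypothetical, and it assumes every row has the same number of cells.

```vba
Option Explicit

' Hypothetical helper: returns the items of one 1-based column from a flat
' collection of cell texts that was filled row by row.
' Assumes every row contributed exactly columnCount items.
Public Function ColumnItems(ByVal cellTexts As Collection, _
                            ByVal columnIndex As Long, _
                            ByVal columnCount As Long) As Collection
    Dim result As Collection
    Dim i As Long

    Set result = New Collection
    ' Start at the wanted column of the first row, then jump one row at a time.
    For i = columnIndex To cellTexts.Count Step columnCount
        result.Add cellTexts(i)
    Next i
    Set ColumnItems = result
End Function

Public Sub DemoColumnItems()
    Dim coll2 As Collection
    Dim extractDate As Variant

    ' Stand-in data copied from the first two rows of the output above.
    Set coll2 = New Collection
    coll2.Add "Delete"
    coll2.Add "Work Order Inquiry - Work Order"
    coll2.Add "06/20/2017 07:50:37"
    coll2.Add "MBMAYO"
    coll2.Add "Delete"
    coll2.Add "Work Order Inquiry - Work Order"
    coll2.Add "06/20/2017 07:39:29"
    coll2.Add "MBMAYO"

    ' Column 3 of 4 holds the extract date.
    For Each extractDate In ColumnItems(coll2, 3, 4)
        Debug.Print extractDate
    Next extractDate
End Sub
```

The values are kept as strings here; converting them with CDate depends on the regional settings of the machine, so that step is left out.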
Answered by Andre
I would try something like this:
Dim htmlTable As Object
' getElementsByTagName returns an HTML element collection, not a VBA Collection,
' so the variable is declared As Object to avoid a type mismatch.
Dim collTD As Object
Dim oNode As Object
' Set IE ...
Set htmlTable = IE.Document.getElementById("ctl00_ContentPlaceHolder1_gvExtract")
' You only want the td's inside htmlTable !
Set collTD = htmlTable.getElementsByTagName("td")
For Each oNode In collTD
    Debug.Print oNode.InnerHTML
    ' Stop -> use Watch window to drill down into oNode subitems
Next oNode
and go from there.
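One way to experiment with this loop without logging in to the site each time is to parse a saved copy of the markup offline. The sketch below is an assumption-laden illustration rather than part of the original answer: the helper name CellTextsFromHtml is hypothetical, the markup is a shortened copy of the table in the question, and it assumes Windows, where the late-bound "htmlfile" object is available.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): parses an HTML fragment
' offline and returns the text of every <td> inside the table with the given id.
' Assumes Windows, where CreateObject("htmlfile") is available.
Public Function CellTextsFromHtml(ByVal html As String, _
                                  ByVal tableId As String) As Collection
    Dim doc As Object
    Dim htmlTable As Object
    Dim oNode As Object
    Dim result As Collection

    Set result = New Collection
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = html

    Set htmlTable = doc.getElementById(tableId)
    ' Only loop when a table was found; otherwise return an empty collection.
    If Not htmlTable Is Nothing Then
        ' Only the td's inside the table, as in the answer above.
        For Each oNode In htmlTable.getElementsByTagName("td")
            result.Add oNode.innerText
        Next oNode
    End If
    Set CellTextsFromHtml = result
End Function

Public Sub DemoCellTextsFromHtml()
    Dim html As String
    Dim cellText As Variant

    ' Shortened stand-in for the table shown in the question.
    html = "<table id=""ctl00_ContentPlaceHolder1_gvExtract"">" & _
           "<tr><th>Delete</th><th>Download</th>" & _
           "<th>Extract Date</th><th>User Id Owner</th></tr>" & _
           "<tr><td><a href=""#"">Delete</a></td>" & _
           "<td><a href=""#"">Work Order Inquiry - Work Order</a></td>" & _
           "<td>06/20/2017 07:50:37</td><td>MBMAYO</td></tr></table>"

    For Each cellText In CellTextsFromHtml(html, "ctl00_ContentPlaceHolder1_gvExtract")
        Debug.Print cellText
    Next cellText
End Sub
```

Using innerText instead of innerHTML returns the visible text of a cell, so cells that wrap a link need no special handling in this sketch.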
Answered by ASH
I think it should be something like this.
Sub Scrape_HTML()
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        .Visible = True
        .navigate "your_URL_here"
        ' Wait for the page to fully load; you can't do anything if the page is not fully loaded
        Do While .Busy Or .readyState <> 4
            DoEvents
        Loop
    End With
    Set Links = ie.document.getElementsByTagName("tr")
    RowCount = 1
    ' Scrape out the innerText of each 'tr' element, one table row per worksheet row.
    With Sheets("DataSheet")
        For Each lnk In Links
            .Range("A" & RowCount) = lnk.innerText
            RowCount = RowCount + 1
        Next
    End With
End Sub
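The routine above puts the whole text of each row into a single cell. If each table cell should land in its own column instead, one possible approach is to copy the table into a 2D array sized at run time, which also shows that a growing table does not rule out an array. This is a minimal sketch under assumptions, not part of the original answer: the helper name TableToArray is hypothetical, it expects the table element itself (for example the result of getElementById), and it takes the column count from the first row.

```vba
Option Explicit

' Hypothetical helper: copies the text of every cell of an HTML table element
' into a 1-based 2D Variant array sized from the table itself.
' Returns Empty when the table has no rows or the first row has no cells.
Public Function TableToArray(ByVal htmlTable As Object) As Variant
    Dim rowTotal As Long
    Dim colTotal As Long
    Dim r As Long
    Dim c As Long
    Dim tr As Object
    Dim result() As Variant

    rowTotal = htmlTable.Rows.Length
    If rowTotal = 0 Then Exit Function

    ' Assumption: the first (header) row has as many cells as any other row.
    colTotal = htmlTable.Rows.Item(0).Cells.Length
    If colTotal = 0 Then Exit Function

    ReDim result(1 To rowTotal, 1 To colTotal)
    For r = 1 To rowTotal
        Set tr = htmlTable.Rows.Item(r - 1)    ' the DOM collections are 0-based
        For c = 1 To colTotal
            ' Shorter rows leave the remaining array slots Empty.
            If c <= tr.Cells.Length Then
                result(r, c) = tr.Cells.Item(c - 1).innerText
            End If
        Next c
    Next r
    TableToArray = result
End Function
```

In Excel, the returned array could then be written in one step, for example with Sheets("DataSheet").Range("A1").Resize(UBound(arr, 1), UBound(arr, 2)).Value = arr, where arr is a Variant holding the result.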