Use getElementById on HTMLElement instead of HTMLDocument (VBA)
Source: http://stackoverflow.com/questions/15191847/
This page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Asked by NickSlash
I've been experimenting with scraping data from web pages using VBS/VBA.
If it were JavaScript this would be easy, but it doesn't seem to be quite as straightforward in VBS/VBA.
This is an example I made for an answer. It works, but I had planned on accessing the child nodes using getElementsByTagName and could not figure out how to use it! The HTMLElement object does not have those methods.
Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As HTMLDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement
    Set Browser = New InternetExplorer
    Browser.navigate "http://www.hsbc.com/about-hsbc/leadership"
    ' Keep waiting while the browser is busy or the page has not finished loading
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set Document = Browser.Document
    Set Elements = Document.getElementsByClassName("profile-col1")
    For Each Element In Elements
        Debug.Print "[ name] " & Trim(Element.Children(1).Children(0).innerText)
        Debug.Print "[ title] " & Trim(Element.Children(1).Children(1).innerText)
    Next Element
    Set Document = Nothing
    Set Browser = Nothing
End Sub
I have been looking at the HTMLElement.document property to see whether it behaves like a fragment of the document, but it is either difficult to work with or just isn't what I think it is.
Dim Fragment As HTMLDocument
Set Element = Document.getElementById("example") ' This works
Set Fragment = Element.document ' This doesn't
This also seems a long-winded way to do it (although that's usually the way with VBA, in my opinion). Does anyone know if there is a simpler way to chain functions?
Document.getElementById("target").getElementsByTagName("tr")
would be awesome...
Accepted answer by mkingston
I don't like it either.
So use JavaScript:
Public Function GetJavaScriptResult(doc As HTMLDocument, jsString As String) As String
    Dim el As IHTMLElement
    Dim nd As IHTMLDOMNode   ' appendChild returns a DOM node, not a text node
    ' Create a hidden INPUT whose value carries the script result back to VBA
    Set el = doc.createElement("INPUT")
    ' Pick a random ID that is not already used in the document
    Do
        el.ID = GenerateRandomAlphaString(100)
    Loop Until doc.getElementById(el.ID) Is Nothing
    el.Style.display = "none"
    Set nd = doc.appendChild(el)
    doc.parentWindow.ExecScript "document.getElementById('" & el.ID & "').value = " & jsString
    GetJavaScriptResult = doc.getElementById(el.ID).Value
    doc.removeChild nd
End Function
Function GenerateRandomAlphaString(Length As Long) As String
    Dim i As Long
    Dim Result As String
    Randomize Timer
    ' Each character is a random letter, upper or lower case
    For i = 1 To Length
        Result = Result & Chr(Int(Rnd(Timer) * 26 + 65 + Round(Rnd(Timer)) * 32))
    Next i
    GenerateRandomAlphaString = Result
End Function
Let me know if you have any problems with this; I've changed the context from a method to a function.
By the way, what version of IE are you using? I suspect you're on a version older than IE8. If you upgrade to IE8, I presume it will update shdocvw.dll to ieframe.dll, and you will be able to use document.querySelector and document.querySelectorAll.
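As a rough illustration of the querySelectorAll route mentioned above, the sketch below selects the rows inside one element with a single descendant selector instead of chaining two calls. It assumes IE8 or later is installed and that the project references Microsoft HTML Object Library. The sub name, the sample markup, and the id "target" are made up for the example.

```vba
Option Explicit

' Hypothetical example: requires Tools > References > Microsoft HTML Object Library
Public Sub PrintTargetRows()
    Dim doc As HTMLDocument
    Dim rows As Object   ' node list returned by querySelectorAll, used late bound
    Dim i As Long

    ' Build a small in-memory document so the sketch does not need a browser window
    Set doc = New HTMLDocument
    doc.body.innerHTML = "<table id=""target""><tr><td>first</td></tr><tr><td>second</td></tr></table>"

    ' "#target tr" matches every tr that is a descendant of the element with id "target"
    Set rows = doc.querySelectorAll("#target tr")
    For i = 0 To rows.Length - 1
        Debug.Print rows.Item(i).innerText
    Next i
End Sub
```

The same selector string should work against a document taken from a browser instance, provided the page renders in a document mode that supports querySelectorAll.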
Edit
Response to a comment (it isn't really a comment itself): basically, the way to do this in VBA is to traverse the child nodes. The problem is that you don't get the correct return types. You could fix this by writing your own classes that (separately) implement IHTMLElement and IHTMLElementCollection, but that's far too much of a pain for me to do without getting paid :). If you're determined, go and read up on the Implements keyword for VB6/VBA.
Public Function getSubElementsByTagName(el As IHTMLElement, tagname As String) As Collection
    Dim descendants As New Collection
    Dim results As New Collection
    Dim i As Long
    ' Flatten el and everything beneath it into one collection
    getDescendants el, descendants
    For i = 1 To descendants.Count
        ' HTML elements report tagName in upper case, so compare case-insensitively
        If StrComp(descendants(i).tagname, tagname, vbTextCompare) = 0 Then
            results.Add descendants(i)
        End If
    Next i
    Set getSubElementsByTagName = results   ' returning an object requires Set
End Function
Public Sub getDescendants(ByVal nd As IHTMLElement, ByRef descendants As Collection)
    Dim i As Long
    descendants.Add nd
    ' The children collection is zero-based
    For i = 0 To nd.Children.Length - 1
        getDescendants nd.Children.Item(i), descendants
    Next i
End Sub
Answer by dee
Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As HTMLDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement
    Set Browser = New InternetExplorer
    Browser.Visible = True
    Browser.navigate "http://www.stackoverflow.com"
    ' Keep waiting while the browser is busy or the page has not finished loading
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set Document = Browser.Document
    Set Elements = Document.getElementById("hmenus").getElementsByTagName("li")
    For Each Element In Elements
        Debug.Print Element.innerText
        'Questions
        'Tags
        'Users
        'Badges
        'Unanswered
        'Ask Question
    Next Element
    Set Document = Nothing
    Set Browser = Nothing
End Sub
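If the early-bound IHTMLElement type does not offer getElementsByTagName in your setup (the problem described in the question), one possible workaround is to hold the element in a variable declared As Object, so that the member is looked up at run time. This is only a minimal sketch under that assumption: the sub name and the sample markup are invented, and it needs a reference to Microsoft HTML Object Library.

```vba
Option Explicit

' Hypothetical example: requires Tools > References > Microsoft HTML Object Library
Public Sub PrintMenuItems()
    Dim doc As HTMLDocument
    Dim menu As Object    ' declared As Object on purpose so members resolve at run time
    Dim items As Object
    Dim i As Long

    ' Build a small in-memory document so the sketch does not need a browser window
    Set doc = New HTMLDocument
    doc.body.innerHTML = "<ul id=""hmenus""><li>Questions</li><li>Tags</li></ul>"

    Set menu = doc.getElementById("hmenus")
    If menu Is Nothing Then Exit Sub   ' nothing to list if the id is missing

    Set items = menu.getElementsByTagName("li")
    For i = 0 To items.Length - 1
        Debug.Print items.Item(i).innerText
    Next i
End Sub
```

The trade-off is that late binding gives up IntelliSense and compile-time checking for those two variables.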
Answer by WizzleWuzzle
Thanks to dee for the answer above with the Scrape() subroutine. The code worked perfectly as written, and I was able to then convert the code to work with the specific website I am trying to scrape.
I do not have enough reputation to upvote or to comment, but I do actually have some minor improvements to add to dee's answer:
You will need to add a reference to "Microsoft HTML Object Library" via Tools > References in order for the code to compile.
I commented out the Browser.Visible line and added a comment as follows:
'if you need to debug the browser page, uncomment this line:
'Browser.Visible = True
And I added a line to close the browser before Set Browser = Nothing:
Browser.Quit
Thanks again dee!
ETA: this works on machines with IE9, but not machines with IE8. Anyone have a fix?
Found the fix myself, so I came back here to post it. The getElementsByClassName function is available in IE9. For this to work in IE8, use querySelectorAll with a dot preceding the class name of the object you are looking for:
'Set repList = doc.getElementsByClassName("reportList") 'only works in IE9, not in IE8
Set repList = doc.querySelectorAll(".reportList") 'this works in IE8+
Answer by QHarr
I would use an XMLHTTP request to retrieve the page content, as it is much faster. It is then easy enough to use querySelectorAll with a CSS class selector to grab elements by class name, and to access the child elements by tag name and index.
Option Explicit
Public Sub GetInfo()
    Dim sResponse As String, html As HTMLDocument, elements As Object, i As Long
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.hsbc.com/about-hsbc/leadership", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument
    With html
        .body.innerHTML = sResponse
        Set elements = .querySelectorAll(".profile-col1")
        For i = 0 To elements.Length - 1
            Debug.Print String(20, Chr$(61))
            Debug.Print elements.item(i).getElementsByTagName("a")(0).innerText
            Debug.Print elements.item(i).getElementsByTagName("p")(0).innerText
            Debug.Print elements.item(i).getElementsByTagName("p")(1).innerText
        Next
    End With
End Sub
References:
VBE > Tools > References > Microsoft HTML Object Library