
Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/15191847/

Date: 2020-09-08 15:04:30  Source: igfitidea

Use getElementById on HTMLElement instead of HTMLDocument

vba web-scraping

Asked by NickSlash

I've been playing around with scraping data from web pages using VBS/VBA.

If it were JavaScript I'd be away, as it's easy, but it doesn't seem to be quite as straightforward in VBS/VBA.

This is an example I made for an answer. It works, but I had planned on accessing the child nodes using getElementsByTagName and could not figure out how to use it! The HTMLElement object does not seem to have those methods.

Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As HTMLDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement

    Set Browser = New InternetExplorer

    Browser.navigate "http://www.hsbc.com/about-hsbc/leadership"

    ' Keep waiting while the browser is busy or the page is not fully loaded
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Document = Browser.Document

    Set Elements = Document.getElementsByClassName("profile-col1")

    For Each Element In Elements
        Debug.Print "[  name] " & Trim(Element.Children(1).Children(0).innerText)
        Debug.Print "[ title] " & Trim(Element.Children(1).Children(1).innerText)
    Next Element

    Set Document = Nothing
    Set Browser = Nothing
End Sub

I have been looking at the HTMLElement.document property to see whether it is like a fragment of the document, but it's either difficult to work with or just isn't what I think it is.

Dim Fragment As HTMLDocument
Set Element = Document.getElementById("example") ' This works
Set Fragment = Element.document ' This doesn't

This also seems a long-winded way to do it (although that's usually the way with VBA, IMO). Does anyone know if there is a simpler way to chain functions?

Document.getElementById("target").getElementsByTagName("tr") would be awesome...

Accepted answer by mkingston

I don't like it either.

So use JavaScript:

Public Function GetJavaScriptResult(doc As HTMLDocument, jsString As String) As String

    Dim el As IHTMLElement
    Dim nd As IHTMLDOMNode

    ' Create a hidden INPUT element whose ID is not already used in the document
    Set el = doc.createElement("INPUT")
    Do
        el.ID = GenerateRandomAlphaString(100)
    Loop Until doc.getElementById(el.ID) Is Nothing
    el.Style.display = "none"
    Set nd = doc.body.appendChild(el)

    ' Let the page's script engine store the result of jsString in the hidden element
    doc.parentWindow.execScript "document.getElementById('" & el.ID & "').value = " & jsString

    GetJavaScriptResult = doc.getElementById(el.ID).Value

    ' Clean up the temporary element
    doc.body.removeChild nd

End Function


Function GenerateRandomAlphaString(Length As Long) As String

    Dim i As Long
    Dim Result As String

    Randomize Timer

    For i = 1 To Length
        Result = Result & Chr(Int(Rnd(Timer) * 26 + 65 + Round(Rnd(Timer)) * 32))
    Next i

    GenerateRandomAlphaString = Result

End Function

Let me know if you have any problems with this; I've changed the context from a method to a function.

By the way, what version of IE are you using? I suspect you're on a version older than IE8. If you upgrade to IE8, I presume it'll update shdocvw.dll to ieframe.dll and you will be able to use document.querySelector and document.querySelectorAll.
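A minimal sketch of what that could look like, assuming a reference to Microsoft HTML Object Library, a page rendered in IE8 standards mode or later, and an element with the id "target" as in the question. The name PrintRowsInTarget is made up for illustration.

```vba
' Sketch only: prints the text of every <tr> inside the element with id "target".
' Assumes doc is an already loaded HTMLDocument that supports querySelectorAll (IE8+).
Public Sub PrintRowsInTarget(doc As HTMLDocument)
    Dim rows As Object
    Dim i As Long

    ' A descendant selector does in one call what the chained
    ' getElementById(...).getElementsByTagName(...) was meant to do
    Set rows = doc.querySelectorAll("#target tr")

    ' The returned list is zero-based
    For i = 0 To rows.Length - 1
        Debug.Print rows.Item(i).innerText
    Next i
End Sub
```

It would be called with the document from the question's code, for example PrintRowsInTarget Browser.Document once the page has finished loading.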

Edit

A response to a comment, which isn't really a comment: basically, the way to do this in VBA is to traverse the child nodes. The problem is that you don't get the correct return types. You could fix this by writing your own classes that (separately) implement IHTMLElement and IHTMLElementCollection, but that's WAY too much of a pain for me to do without getting paid :). If you're determined, go and read up on the Implements keyword for VB6/VBA.

Public Function getSubElementsByTagName(el As IHTMLElement, tagname As String) As Collection

    Dim descendants As New Collection
    Dim results As New Collection
    Dim i As Long

    getDescendants el, descendants

    For i = 1 To descendants.Count
        ' HTML elements report tagName in upper case, so compare case-insensitively
        If StrComp(descendants(i).tagName, tagname, vbTextCompare) = 0 Then
            results.Add descendants(i)
        End If
    Next i

    ' Object references must be assigned with Set
    Set getSubElementsByTagName = results

End Function

Public Function getDescendants(nd As IHTMLElement, ByRef descendants As Collection)
    Dim i As Long
    ' Adds the element itself, then recurses into each of its children
    descendants.Add nd
    ' The Children collection is zero-based
    For i = 0 To nd.Children.Length - 1
        getDescendants nd.Children.Item(i), descendants
    Next i
End Function

Answer by dee

Sub Scrape()
    Dim Browser As InternetExplorer
    Dim Document As htmlDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As IHTMLElement

    Set Browser = New InternetExplorer
    Browser.Visible = True
    Browser.navigate "http://www.stackoverflow.com"

    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set Document = Browser.Document

    Set Elements = Document.getElementById("hmenus").getElementsByTagName("li")
    For Each Element In Elements
        Debug.Print Element.innerText
        'Questions
        'Tags
        'Users
        'Badges
        'Unanswered
        'Ask Question
    Next Element

    Set Document = Nothing
    Set Browser = Nothing
End Sub

Answer by WizzleWuzzle

Thanks to dee for the answer above with the Scrape() subroutine. The code worked perfectly as written, and I was then able to adapt it to the specific website I am trying to scrape.

I do not have enough reputation to upvote or comment, but I do have some minor improvements to add to dee's answer:

  1. You will need to add the VBA reference to "Microsoft HTML Object Library" via "Tools\References" in order for the code to compile.

  2. I commented out the Browser.Visible line and added the comment as follows

    'if you need to debug the browser page, uncomment this line:
    'Browser.Visible = True
    
  3. And I added a line to close the browser before Set Browser = Nothing:

    Browser.Quit
    

Thanks again dee!

ETA: this works on machines with IE9, but not on machines with IE8. Does anyone have a fix?

Found the fix myself, so I came back here to post it. The getElementsByClassName function is available in IE9. For this to work in IE8, use querySelectorAll with a dot preceding the class name of the object you are looking for:

'Set repList = doc.getElementsByClassName("reportList") 'only works in IE9, not in IE8
Set repList = doc.querySelectorAll(".reportList")       'this works in IE8+

Answer by QHarr

I would use an XMLHTTP request to retrieve the page content, as it is much faster. It is then easy enough to use querySelectorAll with a CSS class selector to grab elements by class name. After that you access the child elements by tag name and index.

Option Explicit
Public Sub GetInfo()
    Dim sResponse As String, html As HTMLDocument, elements As Object, i As Long

    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "https://www.hsbc.com/about-hsbc/leadership", False
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    Set html = New HTMLDocument
    With html
        .body.innerHTML = sResponse
        Set elements = .querySelectorAll(".profile-col1")
        For i = 0 To elements.Length - 1
            Debug.Print String(20, Chr$(61))
            Debug.Print elements.item(i).getElementsByTagName("a")(0).innerText
            Debug.Print elements.item(i).getElementsByTagName("p")(0).innerText
            Debug.Print elements.item(i).getElementsByTagName("p")(1).innerText
        Next
    End With
End Sub


References:

VBE > Tools > References > Microsoft HTML Object Library