VBA 网页抓取(getelementsbyclassname)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/41851152/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-12 11:57:17  来源:igfitidea点击:

VBA Web Scrape (getelementsbyclassname)

excelvbaexcel-vbaweb-scraping

提问by Dushyant

I am trying to scrape a list of VBA course items given at the right pane of the following link "www.tutorialspoint.com/vba/index.htm"

我正在尝试抓取以下链接“www.tutorialspoint.com/vba/index.htm”右侧窗格中给出的 VBA 课程项目列表

But I am unable to scrape the list due to some error:

但由于某些错误,我无法抓取列表:

Sub tutorailpointsscrap()
      Dim ie As InternetExplorer

      Set ie = New InternetExplorer

      With ie
      .navigate "https://www.tutorialspoint.com//vba/index.htm"
      .Visible = True
      Do While ie.readyState <> READYSTATE_COMPLETE
      DoEvents
      Loop
      End With

      Dim html As HTMLDocument
      Set html = ie.document


      Dim ele As IHTMLElement

      Dim lists As IHTMLElementCollection
      Dim row As Long

      Set ele = html.getElementsByClassName("nav nav-list primary left-menu")

      Set lists = ele.getElementsByTagName("a")
      row = 1


      For Each li In lists
      Cells(row, 1) = li.innerText
      row = row + 1
      Next

      ie.Quit

  End Sub

The HTML that contains the data is:

包含数据的 HTML 是:

<ul class="nav nav-list primary left-menu">
<li class="heading">VBA Tutorial</li>
<li><a href="/vba/index.htm" style="background-color: rgb(214, 214, 214);">VBA - Home</a></li>
<li><a href="/vba/vba_overview.htm">VBA - Overview</a></li>
<li><a href="/vba/vba_excel_macros.htm">VBA - Excel Macros</a></li>
<li><a href="/vba/vba_excel_terms.htm">VBA - Excel Terms</a></li>
<li><a href="/vba/vba_macro_comments.htm">VBA - Macro Comments</a></li>
<li><a href="/vba/vba_message_box.htm">VBA - Message Box</a></li>
<li><a href="/vba/vba_input_box.htm">VBA - Input Box</a></li>
<li><a href="/vba/vba_variables.htm">VBA - Variables</a></li>
<li><a href="/vba/vba_constants.htm">VBA - Constants</a></li>
<li><a href="/vba/vba_operators.htm">VBA - Operators</a></li>
<li><a href="/vba/vba_decisions.htm">VBA - Decisions</a></li>
<li><a href="/vba/vba_loops.htm">VBA - Loops</a></li>
<li><a href="/vba/vba_strings.htm">VBA - Strings</a></li>
<li><a href="/vba/vba_date_time.htm">VBA - Date and Time</a></li>
<li><a href="/vba/vba_arrays.htm">VBA - Arrays</a></li>
<li><a href="/vba/vba_functions.htm">VBA - Functions</a></li>
<li><a href="/vba/vba_sub_procedure.htm">VBA - SubProcedure</a></li>
<li><a href="/vba/vba_events.htm">VBA - Events</a></li>
<li><a href="/vba/vba_error_handling.htm">VBA - Error Handling</a></li>
<li><a href="/vba/vba_excel_objects.htm">VBA - Excel Objects</a></li>
<li><a href="/vba/vba_text_files.htm">VBA - Text Files</a></li>
<li><a href="/vba/vba_programming_charts.htm">VBA - Programming Charts</a></li>
<li><a href="/vba/vba_userforms.htm">VBA - Userforms</a></li>
</ul>

回答by dee

If I understood your problem correctly you want the following:

如果我正确理解您的问题,您需要以下内容:

Dim lists As IHTMLElementCollection
Dim anchorElements As IHTMLElementCollection
Dim ulElement As HTMLUListElement
Dim liElement As HTMLLIElement
Dim row As Long

Set lists = html.getElementsByClassName("nav nav-list primary left-menu")
row = 1

For Each ulElement In lists
    For Each liElement In ulElement.getElementsByTagName("li")
        Set anchorElements = liElement.getElementsByTagName("a")
        If anchorElements.Length > 0 Then
            Cells(row, 1) = anchorElements.Item(0).innerText
            row = row + 1
        End If
    Next liElement
Next ulElement

Resulting in this (for all lists):

导致这个(对于所有列表):

VBA - Home
VBA - Overview
VBA - Excel Macros
VBA - Excel Terms
VBA - Macro Comments
VBA - Message Box
VBA - Input Box
VBA - Variables
VBA - Constants
VBA - Operators
VBA - Decisions
VBA - Loops
VBA - Strings
VBA - Date and Time
VBA - Arrays
VBA - Functions
VBA - SubProcedure
VBA - Events
VBA - Error Handling
VBA - Excel Objects
VBA - Text Files
VBA - Programming Charts
VBA - Userforms
VBA - Quick Guide
VBA - Useful Resources
VBA - Discussion
Developer's Best Practices
Questions and Answers
Effective Resume Writing
HR Interview Questions
Computer Glossary
Who is Who

If you want content of anchors of just the first list then just like this.

如果您只想要第一个列表的锚点内容,那么就像这样。

For Each liElement In lists.Item(0).getElementsByTagName("li")
    Set anchorElements = liElement.getElementsByTagName("a")
    If anchorElements.Length > 0 Then
        Cells(row, 1) = anchorElements.Item(0).innerText
        row = row + 1
    End If
Next liElement

回答by Sajan Kaundal

Sub Button1_Click()

Dim internet As Object
Dim URL As String

Set internet = CreateObject("InternetExplorer.Application")
internet.Visible = True

For i = 2 To 3

URL = Sheets("Sheet2").Range("A" & i).Value
            internet.Navigate URL

 Application.Wait Now + TimeSerial(0, 0, 15)

 Do Until internet.ReadyState >= 4
    DoEvents
Loop


 Set a = internet.document
         Set lists = a.GetElementsByClassName("mg-results-td is-sv uk-flex uk-flex-middle")(0)
         'Range("B" & i).Value = e.NextSibling.innerText
         'Range("B" & i).Value = "Sajan"
         'For Each ulElement In lists
             Range("B" & i).Value = lists.innerText
         'Next ulElement

'internet.GoBack
Application.Wait Now + TimeSerial(0, 0, 50)
Next i
End Sub

回答by Sajan Kaundal

How about this one:

这个怎么样:

Sub TutorailsPoint()
Const URL = "https://www.tutorialspoint.com//vba/index.htm"
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, posts As Object, topic As Object
Dim x As Long

x = 2

http.Open "GET", URL, False
http.send
html.body.innerHTML = http.responseText

Set topics = html.getElementsByClassName("nav nav-list primary left-menu")
For Each posts In topics
    For Each topic In posts.getElementsByTagName("a")
        Cells(x, 1) = topic.innerText
        x = x + 1
    Next topic
Next posts
End Sub

回答by MITHU

Here is another way you may like. It will give you just the tutorials and nothing else:

这是您可能喜欢的另一种方式。它只会给你教程而不是别的:

Sub TpData()
    Const URL = "https://www.tutorialspoint.com//vba/index.htm"
    Dim http As New XMLHTTP60, html As New HTMLDocument, post As Object

    With http
        .Open "GET", URL, False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each post In html.getElementsByClassName("left-menu")(1).getElementsByTagName("li")
        With post.getElementsByTagName("a")
            If .Length Then i = i + 1: Cells(i, 1) = .item(0).innerText
        End With
    Next post
End Sub