Disclaimer: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/20912246/

Date: 2020-09-12 01:11:56  Source: igfitidea

VBA - Find preceding html tag

Tags: html, excel, vba, excel-vba

Asked by LoxBagel

Say I have HTML source that looks like this:

<div id="book-info"> 
  <span class="title">Weather</span>
  <span class="title">Title Of Book</span>
  <p><a href="http://test.com?MMC_ID=34343">Buy Now</a></p>
</div>

What I need returned is "Title Of Book".

There are numerous instances of span class="title", but the one I need immediately precedes the only link on the page whose URL contains MMC_ID, so I can use MMC_ID as a marker to get close to the span tag I need.

Question: How can I say "Grab the contents of the very first span tag to the left of MMC_ID"?

The code below works sometimes, but the number of span tags on the page varies, so the hard-coded index fails whenever the count changes.

With CreateObject("msxml2.xmlhttp")
    .Open "GET", ActiveCell.Offset(0, -1).Value, False
    .Send
    htm.body.innerhtml = .ResponseText
End With

ExtractedText = htm.getElementById("book-info").getElementsByTagName("span")(1).innerText

Answered by ron

This should do it:

' Assumes htm already holds the parsed page, as in the question
Dim bookInfo As Object, spans As Object
Set bookInfo = htm.getElementById("book-info")
' Search for "MMC_ID" without a trailing space; the URL has "MMC_ID=34343"
If InStr(1, bookInfo.innerHTML, "MMC_ID", vbTextCompare) > 0 Then
    ' The wanted span is the last one inside book-info
    Set spans = bookInfo.getElementsByTagName("span")
    ExtractedText = spans(spans.Length - 1).innerText
End If
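
The answer above uses InStr on the raw markup to confirm the marker is present. As an illustration that is not part of the original answer, the same kind of string search can also locate the span nearest to the marker by searching backward with InStrRev. The function name SpanTextBefore and its parameters are hypothetical, and the sketch assumes the target span contains plain text with no nested spans.

```vba
' Hypothetical helper: returns the text of the last <span> that opens
' before the first occurrence of marker, or "" if nothing is found.
Function SpanTextBefore(ByVal pageHtml As String, ByVal marker As String) As String
    Dim markerPos As Long, openPos As Long, tagEnd As Long, closePos As Long

    ' Locate the marker (case-insensitive)
    markerPos = InStr(1, pageHtml, marker, vbTextCompare)
    If markerPos = 0 Then Exit Function

    ' Search backward from the marker for the nearest opening span tag
    openPos = InStrRev(pageHtml, "<span", markerPos, vbTextCompare)
    If openPos = 0 Then Exit Function

    ' Find the end of that opening tag, then the matching closing tag
    tagEnd = InStr(openPos, pageHtml, ">")
    If tagEnd = 0 Then Exit Function
    closePos = InStr(tagEnd, pageHtml, "</span", vbTextCompare)
    If closePos = 0 Then Exit Function

    SpanTextBefore = Mid$(pageHtml, tagEnd + 1, closePos - tagEnd - 1)
End Function
```

It could be called as SpanTextBefore(.ResponseText, "MMC_ID") inside the With block from the question. String searching is brittle against markup changes, so treat it as a fallback rather than a replacement for walking the DOM.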

Answered by Dick Kusleika

You could loop through all the spans and stop when the first child of the next sibling's next sibling is an anchor element whose href contains the marker text.

Sub test()

    ' Early binding: requires a reference to the Microsoft HTML Object Library
    Dim htm As HTMLDocument
    Dim ExtractedText As String
    Dim hSpan As HTMLSpanElement
    Dim hAnchor As HTMLAnchorElement

    Set htm = New HTMLDocument

    With CreateObject("msxml2.xmlhttp")
        .Open "GET", "file://///99991-dc01/99991/dkusleika/My%20Documents/test.html", False
        .Send
        htm.body.innerHTML = .ResponseText
    End With

    For Each hSpan In htm.getElementById("book-info").getElementsByTagName("span")
        ' Reset so a failed lookup cannot reuse the previous iteration's anchor
        Set hAnchor = Nothing
        On Error Resume Next
            Set hAnchor = hSpan.NextSibling.NextSibling.FirstChild
        On Error GoTo 0

        If Not hAnchor Is Nothing Then
            If InStr(1, hAnchor.href, "MMC_ID", vbTextCompare) > 0 Then
                ExtractedText = hSpan.innerText
                Exit For
            End If
        End If
    Next hSpan

    Debug.Print ExtractedText

End Sub
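
The loop above walks forward from each span and depends on exactly two sibling hops. As a sketch of the opposite direction, not taken from the original answer, one could start at the anchor whose href contains the marker and walk backward through previous siblings, climbing to the parent when a level is exhausted. The function name TitleBeforeMarker is hypothetical, and the sketch uses late binding via CreateObject("htmlfile") on the assumption that no library reference is set. It only finds spans that are preceding siblings of the anchor or of one of its ancestors.

```vba
' Hypothetical helper: returns the text of the nearest span that precedes
' the first anchor whose href contains marker, or "" if none is found.
Function TitleBeforeMarker(ByVal pageHtml As String, ByVal marker As String) As String
    Dim doc As Object, anchor As Object, node As Object, sib As Object

    Set doc = CreateObject("htmlfile")   ' late-bound MSHTML document
    doc.body.innerHTML = pageHtml

    For Each anchor In doc.getElementsByTagName("a")
        ' & "" turns a missing (Null) href into an empty string
        If InStr(1, anchor.getAttribute("href") & "", marker, vbTextCompare) > 0 Then
            Set node = anchor
            Do While Not node Is Nothing
                ' Scan the siblings to the left of the current node
                Set sib = node.previousSibling
                Do While Not sib Is Nothing
                    If sib.nodeType = 1 Then          ' 1 = element node
                        If UCase$(sib.tagName) = "SPAN" Then
                            TitleBeforeMarker = sib.innerText
                            Exit Function
                        End If
                    End If
                    Set sib = sib.previousSibling
                Loop
                ' Nothing at this level: climb one level and try again
                Set node = node.parentNode
            Loop
        End If
    Next anchor
End Function
```

Checking nodeType means the result does not depend on how many whitespace text nodes the parser places between the span and the paragraph, which is the detail the fixed NextSibling.NextSibling chain relies on.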

Answered by Jean-François Corbett

Is it always the last span element? If so, just count how many elements

htm.getElementById("book-info").getElementsByTagName("span")

returns and grab the last one.
