Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/22311579/

Date: 2020-09-12 02:16:33, source: igfitidea

VBA: How to get a hidden href within <td> tag

Tags: vba, excel-vba, web-scraping, excel

Asked by Cessna

I'm trying to get the scores from the soccer World Cups since 1930. The matchups are listed at the link below, and clicking a team opens a new pop-up window with the summary I'm looking for. My problem is that the HTML source code contains no <a> tag with an href inside it, as there usually is, so I'm trying to simulate a click event with the following VBA code:

Link: http://www.flashscore.com/soccer/world/world-cup-1990/results/

VBA code:

Sub Test()
    ' Early binding: requires references to "Microsoft Internet Controls"
    ' and "Microsoft HTML Object Library" (Tools > References).
    Dim URL As String
    Dim IE As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim TDelements As IHTMLElementCollection
    Dim TDelement As HTMLTableCell

    URL = "http://www.flashscore.com/soccer/world/world-cup-1990/results/"

    Set IE = New InternetExplorer

    With IE
        .Navigate URL
        .Visible = True

        ' Wait until the page has finished loading.
        While .Busy Or .ReadyState <> READYSTATE_COMPLETE: DoEvents: Wend

        Set HTMLdoc = .Document
    End With

    Set TDelements = HTMLdoc.getElementsByTagName("td")

    ' Click every cell whose tooltip marks it as a match link.
    For Each TDelement In TDelements
        If TDelement.Title = "Click for match detail!" Then
            TDelement.Click
        End If
    Next

    IE.Quit
End Sub

Answered by Jerome Montino

I'd do this one differently, because what you're trying to achieve carries a massive overhead and might take you on a wild goose chase of links, actions, loops, and whatnot. Instead of emulating the actions, it's much better to emulate the results of the actions.

Take, for example, the Germany - Argentina match. When you click on it, it opens this pop-up:

[Screenshot: the match pop-up window]

Of particular note here is the URL of the pop-up. It contains an odd-looking string of characters that is quite possibly a unique identifier. The first thing to check is the tag for the row we just clicked in the table.

[Screenshot: the HTML of the clicked table row, with part of it highlighted]

Check the highlighted part: it closely resembles the address in our pop-up, except for four (4) extra characters at the beginning. Collapsing the row and cross-checking against other rows, we conclude that all the rows use a similar format:

[Screenshot: the collapsed table rows, each with a similarly formatted id]

Removing the g_1_ prefix from the id attribute of the tr tag gives us the unique identifier for every match. I'll take the sanitized id from the Italy - England match and put it into the generic URL. We'll do this in another tab, as the pop-up's address bar is locked:

[Screenshot: the match page opened in another tab]

Voila. The page opens in another tab, and from there you can get what you want from the match summary. You can even check the head-to-head by changing #match-summary to #h2h;overall.

[Screenshot: the head-to-head view of the match page]
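
To make the id-to-URL step concrete, here is a minimal sketch rather than the site's documented format. The names CleanMatchId and BuildMatchUrl and the {id} marker are my own, and the template address in the demo is hypothetical: substitute the address you actually see in the pop-up, with {id} where the identifier sits.

```vba
Option Explicit

' Strips the four-character "g_1_" prefix from a table row id.
' Ids without the prefix are returned unchanged.
Function CleanMatchId(ByVal rowId As String) As String
    If Left$(rowId, 4) = "g_1_" Then
        CleanMatchId = Mid$(rowId, 5)
    Else
        CleanMatchId = rowId
    End If
End Function

' Fills a URL template with the cleaned id.
' urlTemplate is an ASSUMPTION: it must contain the marker "{id}".
Function BuildMatchUrl(ByVal rowId As String, ByVal urlTemplate As String) As String
    BuildMatchUrl = Replace(urlTemplate, "{id}", CleanMatchId(rowId))
End Function

Sub DemoBuildMatchUrl()
    ' Hypothetical template; replace it with the real pop-up address.
    Const TEMPLATE As String = "http://www.example.com/match/{id}/#match-summary"
    Debug.Print BuildMatchUrl("g_1_04LfR6G7", TEMPLATE)
End Sub
```

Swapping #match-summary for #h2h;overall in the template would point the same id at the head-to-head view instead.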

Why does this matter?

Because it's safer. Since you're using Excel, you have the sheets at your disposal. The safest way is to record the id attribute of each row into, say, a column. Then loop over that column, concatenating each value properly to produce the target URLs.
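
As a rough sketch of that loop, assume the sanitized id values already sit in column A of the active sheet, one per row starting at row 1. The sub name ListMatchUrls, the column layout, and the URL_TEMPLATE constant are all my assumptions; the template in particular is a hypothetical stand-in for the real pop-up address.

```vba
Option Explicit

Sub ListMatchUrls()
    ' Hypothetical template; "{id}" marks where the match id goes.
    Const URL_TEMPLATE As String = "http://www.example.com/match/{id}/#match-summary"

    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long
    Dim matchId As String

    Set ws = ActiveSheet
    ' Last used row in column A.
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 1 To lastRow
        matchId = Trim$(CStr(ws.Cells(r, "A").Value))
        ' Skip blank cells; write the URL next to each id.
        If Len(matchId) > 0 Then
            ws.Cells(r, "B").Value = Replace(URL_TEMPLATE, "{id}", matchId)
        End If
    Next r
End Sub
```

Keeping the ids and the URLs side by side makes it easy to spot a malformed address before anything is requested from the site.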

Then you loop over the URLs and get what you want from the EXACT pages you're looking for. At this point there's no longer any need for the original link/table, since almost all the details are in the "summary" pages anyway.
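
Visiting the pages could look something like the sketch below. It assumes column B of the active sheet holds one complete URL per row (my layout, not something the site dictates) and uses late binding so no extra references are needed. Storing the page title in column C is only a placeholder for whatever you actually want to read from each summary page.

```vba
Option Explicit

Sub VisitMatchPages()
    Const READY_COMPLETE As Long = 4   ' Value of READYSTATE_COMPLETE.

    Dim browser As Object
    Dim ws As Worksheet
    Dim lastRow As Long
    Dim r As Long
    Dim target As String

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "B").End(xlUp).Row
    Set browser = CreateObject("InternetExplorer.Application")

    For r = 1 To lastRow
        target = Trim$(CStr(ws.Cells(r, "B").Value))
        If Len(target) > 0 Then
            browser.Navigate target
            ' Wait until the page has finished loading.
            Do While browser.Busy Or browser.ReadyState <> READY_COMPLETE
                DoEvents
            Loop
            ' Placeholder extraction: record the page title.
            ws.Cells(r, "C").Value = browser.Document.Title
        End If
    Next r

    browser.Quit
    Set browser = Nothing
End Sub
```

Pages that fill in their content with scripts after loading may need an extra wait before the details you want are present.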

Hope this helps and good luck.

Follow-up Edit

As I feel a bit guilty about not posting any code, here's my attempt at producing the id values for the World Cup section. Note that I took some liberties with the code, so modify it to suit your approach better. I will only be providing the way to get the id values. All else is up to you.

Sub Test()

    ' Early binding: requires references to "Microsoft Internet Controls"
    ' and "Microsoft HTML Object Library" (Tools > References).
    Dim URL As String
    Dim IE As New InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim dictObj As Object: Set dictObj = CreateObject("Scripting.Dictionary")
    Dim tblSet As Object, mTbl As Object
    Dim tRows As Object, tRow As Object
    Dim tRowID As String
    Dim Key As Variant

    URL = "http://www.flashscore.com/soccer/world/world-cup-1990/results/"

    With IE
        .Navigate URL
        .Visible = True
        Do Until .ReadyState = READYSTATE_COMPLETE: DoEvents: Loop
        Set HTMLdoc = .Document
    End With

    With HTMLdoc
        'Target the set of tables.
        Set tblSet = .getElementById("fs-results")
        'Index starts at 0. World Cup Playoffs is (0), World Cup is (1),
        'Qualifiers is (2), etc... Targeting tbody right away, as there is
        'only one (1) tbody tag per table anyway.
        Set mTbl = tblSet.getElementsByTagName("tbody")(1)
        Set tRows = mTbl.getElementsByTagName("tr")
        With dictObj
            'If the value is not yet in the dictionary, store it.
            For Each tRow In tRows
                'Remove the first four (4) characters.
                tRowID = Mid(tRow.getAttribute("id"), 5)
                If Not .Exists(tRowID) Then
                    .Add tRowID, Empty
                End If
            Next tRow
        End With
    End With

    'Print out each of the ID values.
    For Each Key In dictObj
        Debug.Print Key
    Next Key

    Set IE = Nothing

End Sub

This returns the following id values for the World Cup table (not the Playoffs!):

04LfR6G7
6RNojjg8
nww8zBgL
t0xCZU8R
dKY0Pgu3
jsY4ODf9
K4YcDdti
2qM7lD3j
bXvdGk2k
tdU0F9He
65hR8YfM
lUsW7hAS
jBqaxXO8
YZY3yivF
n1HjSn11
hvInTSoe
rHUdQZPd
xfS51jmd
ncyhHVnq
4b34kglp
O0kz7LJH
lpQE9EuG
UHlQLBvS
fNQAAfQ9
CKpewD92
EmJrU8Wl
KnT4ETW1
nRnnVlHr
b3T12WXk
jwUc3CIq
n9RDMiPL
pAjW714B
GznS8sk5
GWR6BzB3
UXW8NX9F
beX1CGec
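
If you would rather have these values on a sheet than in the Immediate window, a small helper along these lines should do. WriteKeysToColumn and its parameters are names I made up for illustration; it accepts any Scripting.Dictionary and a starting cell.

```vba
Option Explicit

' Writes each dictionary key into its own cell, going down from firstCell.
Sub WriteKeysToColumn(ByVal dict As Object, ByVal firstCell As Range)
    Dim k As Variant
    Dim rowOffset As Long

    For Each k In dict.Keys
        With firstCell.Offset(rowOffset, 0)
            ' Force text format so Excel never reinterprets an id as a number.
            .NumberFormat = "@"
            .Value = CStr(k)
        End With
        rowOffset = rowOffset + 1
    Next k
End Sub
```

Calling WriteKeysToColumn dictObj, ActiveSheet.Range("A1") in place of the Debug.Print loop would put the first id in A1 and the rest below it.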

Hope this helps you some more. Kindly make sure you read the comments. Good luck!
