Notice: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/2520949/

Date: 2020-09-08 10:20:57  Source: igfitidea

Getting HTML Source with Excel-VBA

Tags: string, url, vba, excel-vba, excel

Asked by l--''''''---------''''''''''''

I would like to direct an excel VBA form to certain URLs, get the HTML source and store that resource in a string. Is this possible, and if so, how do I do it?

Answered by Gary McGill

Yes. One way to do it is to use the MSXML DLL. To do that, you need to add a reference to the Microsoft XML library via Tools->References.

Here's some code that displays the content of a given URL:

Public Sub ShowHTML(ByVal strURL As String)
    On Error GoTo ErrorHandler
    Dim strError As String
    strError = ""
    Dim oXMLHTTP As MSXML2.XMLHTTP
    Set oXMLHTTP = New MSXML2.XMLHTTP
    Dim strResponse As String
    strResponse = ""

    With oXMLHTTP
        .Open "GET", strURL, False
        .send ""
        If .Status <> 200 Then
            strError = .statusText
            GoTo CleanUpAndExit
        Else
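            ' Content-Type may carry a charset suffix (e.g. "text/html; charset=utf-8")
            If InStr(1, .getResponseHeader("Content-Type"), "text/html", vbTextCompare) = 0 Then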
                strError = "Not an HTML file"
                GoTo CleanUpAndExit
            Else
                strResponse = .responseText
            End If
        End If
    End With

CleanUpAndExit:
    On Error Resume Next ' Avoid recursive call to error handler
    ' Clean up code goes here
    Set oXMLHTTP = Nothing
    If Len(strError) > 0 Then ' Report any error
        MsgBox strError
    Else
        MsgBox strResponse
    End If
    Exit Sub
ErrorHandler:
    strError = Err.Description
    Resume CleanUpAndExit
End Sub
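
Since the question asks for the source in a string rather than in a message box, the same request could be wrapped in a function. The following is only a minimal sketch and not part of the original answer: the name GetHTMLSource is hypothetical, it assumes the same Microsoft XML reference is set, and it treats anything other than HTTP 200 as a failure.

Public Function GetHTMLSource(ByVal strURL As String) As String
    ' Hypothetical helper: returns the response text as a string,
    ' or an empty string if the request fails or the status is not 200.
    On Error GoTo Failed
    Dim oXMLHTTP As MSXML2.XMLHTTP
    Set oXMLHTTP = New MSXML2.XMLHTTP
    With oXMLHTTP
        .Open "GET", strURL, False   ' False = synchronous request
        .send
        If .Status = 200 Then GetHTMLSource = .responseText
    End With
Failed:
    ' Reached on success and on error; the return value stays "" on error
    Set oXMLHTTP = Nothing
End Function

A caller could then keep the page in a variable, for example: html = GetHTMLSource("http://stackoverflow.com/questions/2520949/"), where html is declared As String.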

Answered by OneOfTheUnemployed

Just an addition to the above response. The question was how to get the HTML source, which the stated answer does not actually provide.

Compare the contents of oXMLHTTP.responseText with the source code in a browser for URL "http://finance.yahoo.com/q/op?s=T+Options". They do not match and even the returned values are different. (This should be executed after hours to avoid changes during the trading day.)

If I find a way to perform this task the basic code will be posted.

Answered by ashleedawg

Compact getHTTP function

Below is a compact & generic function that will return HTTP response from a specified URL to, for example:

  • return the HTML source of a web page,
  • get a JSON response from an API URL,
  • parse a text file at a URL, etc.

This does not require any VBA References since MSXML2 is used as a late-bound object.

Public Function getHTTP(ByVal url As String) As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False: .Send
        getHTTP = StrConv(.responseBody, vbUnicode)
    End With
End Function

Note that this basic function has no validation or error handling, as those are the parts that can vary considerably depending on which URL you're hitting.

If desired, check the value of .Status (after the .Send) for success codes like 0 or 200. You can also set up an error trap with On Error GoTo... (never Resume Next!)
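
As an illustration of that advice, a status check and an error trap could be added along these lines. This is a sketch under assumptions, not the answer's own code: the name getHTTPChecked and the error number are hypothetical, and it accepts only status 200, so loosen the check if 0 should also count as success for your URLs.

Public Function getHTTPChecked(ByVal url As String) As String
    ' Hypothetical variant of getHTTP: returns "" and logs the reason on failure.
    Dim req As Object
    On Error GoTo ErrHandler
    Set req = CreateObject("MSXML2.XMLHTTP")   ' late-bound, no reference needed
    req.Open "GET", url, False                 ' False = synchronous request
    req.Send
    If req.Status <> 200 Then
        ' Turn an unexpected status into an error so one handler covers both cases
        Err.Raise vbObjectError + 513, "getHTTPChecked", _
                  "HTTP " & req.Status & " " & req.statusText
    End If
    getHTTPChecked = StrConv(req.responseBody, vbUnicode)
    Exit Function
ErrHandler:
    ' Log to the Immediate window and return an empty string
    Debug.Print "getHTTPChecked failed for " & url & ": " & Err.Description
    getHTTPChecked = ""
End Function

Whether to log, show a message, or re-raise in the handler depends on how the caller wants to deal with failures.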

Example Usage:

This procedure scrapes this Stack Overflow page for the current score of this question.

Sub demo_getVoteCount()
    Const answerID$ = 2522760
    Const url_SO = "https://stackoverflow.com/a/" & answerID
    Dim html As String, startPos As Long, voteCount As Variant

    html = getHTTP(url_SO)                                  'get html from url

    startPos = InStr(html, "answerid=""" & answerID)        'locate this answer
    startPos = InStr(startPos, html, "vote-count-post")     'locate vote count
    startPos = InStr(startPos, html, ">") + 1               'locate value

    voteCount = Mid(html, startPos, InStr(startPos, html, "<") - startPos) 'extract score
    MsgBox "Answer #" & answerID & " has a score of " & voteCount & "."
End Sub

(Of course, in reality there are far better ways to get the score of an answer than the example above, such as this way.)
