VBA - XMLHTTP 和 WinHttp 请求速度

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/41523223/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-12 11:50:03  来源:igfitidea点击:

VBA - XMLHTTP and WinHttp request speed

excelvbaweb-scrapingxmlhttprequest

提问by Ryszard J?draszyk

Below are declared variables for 3 requests which I implement in my macros. I listed libraries they use and their late bindings in comments:

下面是我在宏中实现的 3 个请求的声明变量。我在评论中列出了他们使用的库及其后期绑定:

Dim XMLHTTP As New MSXML2.XMLHTTP 'Microsoft XML, v6.0 'Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
Dim ServerXMLHTTP As New MSXML2.ServerXMLHTTP 'Microsoft XML, v6.0 'Set ServerXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim http As New WinHttpRequest 'Microsoft WinHttp Services, version 5.1 'Set http = CreateObject("WinHttp.WinHttpRequest.5.1")

I have a few old web scraping macros which used Internet Explorer automation. I wanted to clean coding and speed them up with these requests.

我有一些使用 Internet Explorer 自动化的旧网页抓取宏。我想清理编码并通过这些请求加快速度。

Unfortunately what I have noticed, MSXML2.ServerXMLHTTPand WinHttpRequestare slower on online store's 20 products test (34 and 35 sec) than IE automation with pictures and active scripting off (24 sec)! MSXML2.XMLHTTPexecutes in 18 secs. I used to see situations when some out of these 3 requests are 2-3 times faster / slower than the other ones, so I always test which one performs best, but never before had any request lost to IE automation.

不幸的是我已经注意到,MSXML2.ServerXMLHTTPWinHttpRequest较慢的在线商店的20种产品测试(34和35秒)比IE自动化图片和活动脚本关闭(24秒)!MSXML2.XMLHTTP在 18 秒内执行。我曾经看到过这 3 个请求中的一些请求比其他请求快 2-3 倍/慢的情况,所以我总是测试哪个表现最好,但以前从未有任何请求丢失到 IE 自动化。

The main page with results is below, it's all results on one page, 1500+ of them, so request takes some time (6500 pages if pasted to MS Word):

带有结果的主页在下面,所有结果都在一页上,其中 1500 多个,因此请求需要一些时间(如果粘贴到 MS Word,则为 6500 页):

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

Then I open individual links from main result page:

然后我从主结果页面打开各个链接:

http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/

http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/

I would like to know if these 3 requests are all options I have to get data from websites without browser automation. Also - how possibly browser automation can beat some of these requests?

我想知道这 3 个请求是否都是我必须在没有浏览器自动化的情况下从网站获取数据的选项。另外 - 浏览器自动化如何击败其中一些请求?

UPDATE

更新

I have tested the main result page with procedure provided in answer by Robin Mackenzie, clearing IE cache before running it. At least on this particular page, caching seemed to have no explicit gain, as subsequent requests yielded a similar result. IE had active scripting disabled and no images loading.

我已经使用 Robin Mackenzie 在回答中提供的过程测试了主要结果页面,在运行之前清除了 IE 缓存。至少在这个特定的页面上,缓存似乎没有明显的好处,因为随后的请求产生了类似的结果。IE 禁用了活动脚本并且没有图像加载。

IE automation method, Document length: 7593346 chars, Processed in: 8 seconds

WinHTTP method,  Document length: 7824059 chars, Processed in: 29 seconds

XML HTTP method, Document length: 7830217 chars, Processed in: 4 seconds

Server XML HTTP method, Document length: 7823958 chars, Processed in: 26 seconds

URL download file method, Document length: 7830346 chars, Processed in: 7 seconds

Very surprising for me is the difference in amount of characters returned by these methods.

对我来说非常令人惊讶的是这些方法返回的字符数量的差异。

采纳答案by Robin Mackenzie

In addition to the methods you've mentioned:

除了你提到的方法:

  • IE automation
  • WinHTTPRequest
  • XMLHTTP
  • ServerXMLHTTP
  • IE自动化
  • WinHTTP请求
  • XMLHTTP
  • 服务器XMLHTTP

There are 2 other methods you can think about:

您可以考虑另外两种方法:

  • Using the CreateDocumentFromUrlmethod of the MSHTML.HTMLDocumentobject
  • Using the Windows API function URLDownloadToFileA
  • 使用对象的CreateDocumentFromUrl方法MSHTML.HTMLDocument
  • 使用 Windows API 函数 URLDownloadToFileA

There are some other Windows APIs that I am ignoring such as InternetOpen, InternetOpenUrletc as potential performance will be outweighed by complexity of guess the response length, buffering the response, and so forth.

还有一些其他的Windows API我忽略了,如InternetOpenInternetOpenUrl等潜在的性能将通过猜测响应长度,缓冲响应,等等的复杂性所抵消。

CreateDocumentFromUrl

从 URL 创建文档

With the CreateDocumentFromUrlmethod it is a problem with your sample website because it attempts to create a HTMLDocumentin an frame which is not allowed with errors such as:

使用该CreateDocumentFromUrl方法,您的示例网站存在问题,因为它尝试HTMLDocument在一个框架中创建一个不允许的错误,例如:

Framing Forbidden

禁止装帧

and

To help protect the security of information you enter into this website, the publisher of this content does not allow it to be displayed in a frame.

为帮助保护您输入到本网站的信息的安全性,此内容的发布者不允许将其显示在框架中。

So we should not use this method.

所以我们不应该使用这种方法。

URLDownloadToFileA

网址下载到文件A

I thought you need the phpequivalent of file_get_contentsand found this method. It is easily used (check this link) and out-performs the other methods when used on a large request (e.g. try it when you go for >2000 baseball bats). The XMLHTTPalso method uses the URLMonlibrary so I guess this way is just cutting out a bit of middle-man logic and obviously there's a downside because you have to do some file system handling.

我认为您需要等效的phpfile_get_contents并找到了此方法。它很容易使用(检查此链接),并且在处理大型请求时性能优于其他方法(例如,当您购买超过 2000 个棒球棒时尝试使用它)。该XMLHTTP还的方法使用URLMon图书馆,所以我想这种方式只是切出位中间人逻辑,因为你必须做一些文件系统处理显然有不利的一面。

Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

Sub TestUrlDownloadFile(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strTempFileName As String
    Dim strResponse As String
    Dim objFso As FileSystemObject

    On Error GoTo ExitFunction

    dteStart = Now
    strTempFileName = "D:\foo.txt"
    DownloadFile strUrl, strTempFileName
    Set objFso = New FileSystemObject
    With objFso.OpenTextFile(strTempFileName, ForReading)
        strResponse = .ReadAll
        .Close
    End With
    objFso.DeleteFile strTempFileName
    dteFinish = Now

    Debug.Print "URL download file method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If

End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
  Dim lngRetVal As Long
  lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
  If lngRetVal = 0 Then DownloadFile = True
End Function

With the URLDownloadToFileAit is taking me about 1-2 seconds to download you sample URL versus 4-5 seconds with the XMLHTTPmethod (full code below).

使用该方法URLDownloadToFileA下载示例 URL 需要大约 1-2 秒,而使用该XMLHTTP方法需要4-5 秒(完整代码如下)。

The URL:

网址:

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400

This is the output:

这是输出:

Testing...


XML HTTP method
Document length: 7869753 chars
Processed in: 4 seconds


URL download file method
Document length: 7869753 chars
Processed in: 1 seconds

Code

代码

This includes all methods discussed e.g. IE automation, WinHTTPRequest, XMLHTTP, ServerXMLHTTP, CreateDocumentFromURL and URLDownloadFile.

这包括讨论的所有方法,例如 IE 自动化、WinHTTPRequest、XMLHTTP、ServerXMLHTTP、CreateDocumentFromURL 和 URLDownloadFile。

You need all these references in project:

您需要项目中的所有这些参考:

enter image description here

在此处输入图片说明

Here it is:

这里是:

Option Explicit

Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal pCaller As Long, _
    ByVal szURL As String, ByVal szFileName As String, ByVal dwReserved As Long, _
    ByVal lpfnCB As Long) As Long

Sub Test()

    Dim strUrl As String

    strUrl = "http://www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400"

    Debug.Print "Testing..."
    Debug.Print VBA.vbNewLine

    'TestIE strUrl
    'TestWinHHTP strUrl
    TestXMLHTTP strUrl
    'TestServerXMLHTTP strUrl
    'TestCreateDocumentFromUrl strUrl
    TestUrlDownloadFile strUrl

End Sub

Sub TestIE(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objIe As InternetExplorer
    Dim objHtml As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objIe = New SHDocVw.InternetExplorer
    With objIe
        .navigate strUrl
        .Visible = False
        While .Busy Or .readyState <> READYSTATE_COMPLETE
           DoEvents
        Wend
        Set objHtml = .document
        strResponse = objHtml.DocumentElement.outerHTML
        .Quit
    End With
    dteFinish = Now

    Debug.Print "IE automation method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    If Not objIe Is Nothing Then
        objIe.Quit
    End If
    Set objIe = Nothing

End Sub

Sub TestWinHHTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objHttp As WinHttp.WinHttpRequest
    Dim objDoc As HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objHttp = New WinHttp.WinHttpRequest
    With objHttp
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        .WaitForResponse
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "WinHTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objHttp = Nothing

End Sub

Sub TestXMLHTTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objXhr As MSXML2.XMLHTTP60
    Dim objDoc As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objXhr = New MSXML2.XMLHTTP60
    With objXhr
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        While .readyState <> 4
            DoEvents
        Wend
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "XML HTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objXhr = Nothing

End Sub

Sub TestServerXMLHTTP(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim objXhr As MSXML2.ServerXMLHTTP60
    Dim objDoc As MSHTML.HTMLDocument
    Dim strResponse As String

    On Error GoTo ExitFunction

    dteStart = Now
    Set objXhr = New MSXML2.ServerXMLHTTP60
    With objXhr
        .Open "get", strUrl, False
        .setRequestHeader "Cache-Control", "no-cache"
        .setRequestHeader "Pragma", "no-cache"
        .setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
        .send
        While .readyState <> 4
            DoEvents
        Wend
        strResponse = .responseText
    End With
    dteFinish = Now

    Debug.Print "Server XML HTTP method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc = Nothing
    Set objXhr = Nothing

End Sub

Sub TestUrlDownloadFile(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strTempFileName As String
    Dim strResponse As String
    Dim objFso As FileSystemObject

    On Error GoTo ExitFunction

    dteStart = Now
    strTempFileName = "D:\foo.txt"
    If DownloadFile(strUrl, strTempFileName) Then
        Set objFso = New FileSystemObject
        With objFso.OpenTextFile(strTempFileName, ForReading)
            strResponse = .ReadAll
            .Close
        End With
        objFso.DeleteFile strTempFileName
    Else
        Debug.Print "Error downloading file from URL: " & strUrl
        GoTo ExitFunction
    End If
    dteFinish = Now

    Debug.Print "URL download file method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If

End Sub

'http://www.vbaexpress.com/forum/archive/index.php/t-27050.html
Private Function DownloadFile(URL As String, LocalFilename As String) As Boolean
    Dim lngRetVal As Long
    lngRetVal = URLDownloadToFileA(0, URL, LocalFilename, 0, 0)
    If lngRetVal = 0 Then
        DownloadFile = True
    Else
        DownloadFile = False
    End If
End Function

Sub TestCreateDocumentFromUrl(strUrl As String)

    Dim dteStart As Date
    Dim dteFinish As Date
    Dim strResponse As String
    Dim objDoc1 As HTMLDocument
    Dim objDoc2 As HTMLDocument

    On Error GoTo ExitFunction

    dteStart = Now
    Set objDoc1 = New HTMLDocument
    Set objDoc2 = objDoc1.createDocumentFromUrl(strUrl, "null")
    While objDoc2.readyState <> "complete"
        DoEvents
    Wend
    strResponse = objDoc2.DocumentElement.outerHTML
    Debug.Print strResponse
    dteFinish = Now

    Debug.Print "HTML Document Create from URL method"
    Debug.Print "Document length: " & Len(strResponse) & " chars"
    Debug.Print "Processed in: " & Format(dteFinish - dteStart, "s") & " seconds"
    Debug.Print VBA.vbNewLine

ExitFunction:
    If Err.Number <> 0 Then
        Debug.Print Err.Description
    End If
    Set objDoc2 = Nothing
    Set objDoc1 = Nothing

End Sub

回答by Florent B.

Most of the time is spent waiting for a response from the server. So if you want improve the execution time, then send the requests in parallel.

大部分时间都花在等待服务器的响应上。因此,如果您想缩短执行时间,请并行发送请求。

I would also use the "Msxml2.ServerXMLHTTP.6.0" object/interface since it doesn't implement any caching.

我还会使用“Msxml2.ServerXMLHTTP.6.0”对象/接口,因为它没有实现任何缓存。

Here's a working example:

这是一个工作示例:

Sub TestRequests()
  GetUrls _
    "http://stackoverflow.com/questions/34880012", _
    "http://stackoverflow.com/questions/34880013", _
    "http://stackoverflow.com/questions/34880014", _
    "http://stackoverflow.com/questions/34880015", _
    "http://stackoverflow.com/questions/34880016", _
    "http://stackoverflow.com/questions/34880017"

End Sub

Private Sub OnRequest(url, xhr)
  xhr.Open "GET", url, True
  xhr.setRequestHeader "Content-Type", "text/html; charset=UTF-8"
  xhr.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
  xhr.Send
End Sub

Private Sub OnResponse(url, xhr)
  Debug.Print url, Len(xhr.ResponseText)
End Sub

Public Function GetUrls(ParamArray urls())
    Const WORKERS = 10

    ' create http workers
    Dim wkrs(0 To WORKERS * 2 - 1), i As Integer
    For i = 0 To UBound(wkrs) Step 2
      Set wkrs(i) = CreateObject("Msxml2.ServerXMLHTTP.6.0")
    Next

    ' send the requests in parallele
    Dim index As Integer, count As Integer, xhr As Object
    While count <= UBound(urls)
      For i = 0 To UBound(wkrs) Step 2
        Set xhr = wkrs(i)

        If xhr.readyState And 3 Then  ' if busy
          xhr.waitForResponse 0.01    ' wait 10ms
        ElseIf Not VBA.IsEmpty(wkrs(i + 1)) And xhr.readyState = 4 Then
          OnResponse urls(wkrs(i + 1)), xhr
          count = count + 1
          wkrs(i + 1) = Empty
        End If

        If VBA.IsEmpty(wkrs(i + 1)) And index <= UBound(urls) Then
          wkrs(i + 1) = index
          OnRequest urls(index), xhr
          index = index + 1
        End If
      Next
    Wend
End Function