在 Excel VBA 中使用 MSXML2.XMLHTTP 从网站中提取大量文本数据

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/22125321/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 17:53:49  来源:igfitidea点击:

Using MSXML2.XMLHTTP in Excel VBA to extract large amounts of text data from website

excelvbamsxmlhtml-content-extraction

提问by sinhars82

I am trying to download historical stock price data from finance.yahoo.com for 1000s of stocks. The website only displays 60 days of data on a single page so I have to loop through the time period that I am downloading for along with the loop for different securities. I have used the following code to access the relevant page.

我正在尝试从 Finance.yahoo.com 下载 1000 只股票的历史股价数据。该网站仅在单个页面上显示 60 天的数据,因此我必须遍历我正在下载的时间段以及不同证券的循环。我已使用以下代码访问相关页面。

Set httpObject = CreateObject("MSXML2.XMLHTTP")
httpObject.Open "GET", url, False
httpObject.send
webObject.body.innerHTML = httpObject.responseText

This works like a charm for 99% of the data that I access. But every now and then, the object returned by the website turns out to be empty, even though the exact same URL will show me the correct data in Internet Explorer. If I rerun the code for just that URL, it may or may not fail the next time.

对于我访问的 99% 的数据来说,这就像一个魅力。但有时,网站返回的对象结果是空的,即使完全相同的 URL 会在 Internet Explorer 中向我显示正确的数据。如果我只为那个 URL 重新运行代码,它下次可能会也可能不会失败。

I would like to know if there is a more stable/consistent way of using the above method. I had previously implemented the InternetExplorer.Application method to download data but found that to be much slower and cumbersome.

我想知道是否有更稳定/一致的使用上述方法的方法。我之前已经实现了 InternetExplorer.Application 方法来下载数据,但发现它要慢得多而且麻烦。

采纳答案by Dick Kusleika

Check the readystate.

检查就绪状态。

httpObject.ReadyState = 4

that means it's done loading. Then you can check the status

这意味着它已完成加载。然后你可以检查状态

httpObject.Status = 200

that means you didn't mess up the URL. If readystate is 4 and status is 200, then you can read the responseText.

这意味着您没有弄乱 URL。如果 readystate 为 4 且 status 为 200,则您可以读取 responseText。

回答by Paul

I had the same issue, getting an empty repsonse, with .Status = 0(i.e. Ok). My issue it turned out was that the http when you go in the browser redirects to https. When I changed it in my request it was fine.

我有同样的问题,得到一个空的回复,.Status = 0(即好的)。我的问题原来是当你进入浏览器时 http 重定向到 https。当我在请求中更改它时,它很好。

For a generic function, you could try http and if you get .Status = 0and .responseText = "", then repeat but with https.

对于通用函数,您可以尝试 http ,如果您得到.Status = 0and .responseText = "",则重复但使用 https 。