Notice: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/19308522/

Date: 2020-09-11 23:46:29  Source: igfitidea

Get data from listings on a website to Excel VBA

Tags: excel, vba, excel-vba

Asked by facebook-100000733797497

I am trying to find a way to get data from yelp.com.

I have a spreadsheet that contains several keywords and locations. I want to extract data from Yelp listings based on the keywords and locations already in my spreadsheet.

I wrote the following code, but it seems to return nonsensical data rather than the specific information I am looking for.

I want to get the business name, address, and phone number, but I am not getting any of them. Could anyone here help me solve this problem?

Sub find()
    Dim ie As Object
    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        ' Don't show window
        .Visible = False
        .Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"

        ' Wait until IE is done loading page
        Do While .Busy
            Application.StatusBar = "Downloading information, please wait..."
            DoEvents
        Loop

        ' Make a string from IE content
        Set mDoc = .Document
        peopleData = mDoc.body.innerText
        ActiveSheet.Cells(1, 1).Value = peopleData
    End With

    peopleData = "" 'Nothing
    Set mDoc = Nothing
End Sub

Answered by David Zemens

If you right-click in IE and choose View Source, it is apparent that the data served on the site is not part of the document's .Body.innerText property. I notice this is often the case with dynamically served data, and that approach is really too simple for most web scraping.

I opened the page in Google Chrome and inspected the elements to get an idea of what I'm really looking for, and how to find it using a DOM/HTML parser. To use early binding you will need to add a reference to the Microsoft HTML Object Library; with late binding, as in the code below, the reference is optional.

I think you can get it to return a collection of the <DIV> tags, and then check those for the class name with an If statement inside the loop.
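The <DIV> idea described above could look roughly like the following. This is a minimal sketch and not the answer author's code: ListDivsByClass and its classFragment parameter are hypothetical names, and html is assumed to be an already loaded document object (for example ie.Document).

```vba
' Hypothetical helper: writes the text of every <DIV> whose class
' attribute contains classFragment into column A of the active sheet.
Sub ListDivsByClass(ByVal html As Object, ByVal classFragment As String)
    Dim div As Object   ' IHTMLElement (late bound)
    Dim r As Long       ' row offset of the next free cell

    ' Loop over the collection of <DIV> tags in the document
    For Each div In html.getElementsByTagName("DIV")
        ' Keep only the DIVs whose class name contains the fragment
        If InStr(1, div.className, classFragment, vbTextCompare) > 0 Then
            ActiveSheet.Range("A1").Offset(r, 0).Value = div.innerText
            r = r + 1
        End If
    Next div
End Sub
```

Once the page has finished loading, a call might look like ListDivsByClass ie.Document, "media-block". The class fragment to search for is whatever the element inspector shows for a listing, so it has to be checked against the live page.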

I made some revisions to my original answer; this should print each record in a new cell:

Option Explicit

' Sleep is a Windows API call: the Declare keyword is required,
' and 64-bit Office (VBA7) also needs PtrSafe.
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If

Sub find()
    'Uses late binding, or add reference to Microsoft HTML Object Library
    '  and change variable Types to use intellisense
    Dim ie As Object 'InternetExplorer.Application
    Dim html As Object 'HTMLDocument
    Dim Listings As Object 'IHTMLElementCollection
    Dim l As Object 'IHTMLElement
    Dim r As Long

    Set ie = CreateObject("InternetExplorer.Application")
    With ie
        ' Don't show window
        .Visible = False
        .Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
        'Wait until IE is done loading page (4 = READYSTATE_COMPLETE)
        Do While .Busy Or .readyState <> 4
            Application.StatusBar = "Downloading information, Please wait..."
            DoEvents
            Sleep 200
        Loop
        Set html = .Document
    End With
    Application.StatusBar = False ' give the status bar back to Excel

    Set Listings = html.getElementsByTagName("LI") ' ## returns the list
    For Each l In Listings
        '## make sure this list item looks like the listings Div Class:
        '   then, build the string to put in your cell
        If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
            Range("A1").Offset(r, 0).Value = l.innerText
            r = r + 1
        End If
    Next

    ie.Quit ' close the hidden IE instance so it does not linger
    Set html = Nothing
    Set ie = Nothing
End Sub
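
The question mentions keywords and locations that are already in the spreadsheet, while the code above uses one hard-coded URL. The following is a hedged sketch of how the search URL might be built per row. It assumes keywords in column A and locations in column B starting at row 2 (the question does not say where they are), and BuildSearchUrl and ListSearchUrls are hypothetical names. It keeps only the find_desc and find_loc parameters from the original URL and relies on WorksheetFunction.EncodeURL, which is available in Excel 2013 and later.

```vba
' Hypothetical helper: builds a search URL from one keyword/location pair.
' Only the find_desc and find_loc parameters of the original URL are kept.
Function BuildSearchUrl(ByVal keyword As String, ByVal location As String) As String
    ' EncodeURL requires Excel 2013 or later
    BuildSearchUrl = "http://www.yelp.com/search?find_desc=" & _
                     Application.WorksheetFunction.EncodeURL(keyword) & _
                     "&find_loc=" & _
                     Application.WorksheetFunction.EncodeURL(location)
End Function

' Assumption: keywords are in column A and locations in column B,
' with a header in row 1. Prints one URL per row to the Immediate window.
Sub ListSearchUrls()
    Dim lastRow As Long
    Dim i As Long

    With ActiveSheet
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        For i = 2 To lastRow
            Debug.Print BuildSearchUrl(CStr(.Cells(i, "A").Value), _
                                       CStr(.Cells(i, "B").Value))
        Next i
    End With
End Sub
```

Each generated URL could then be passed to .Navigate in place of the hard-coded string, with the listing loop run once per URL.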