Scraping HTML data inside <tr> or <td> tags in VBA

Question

Asked by kamelkid2

<tr>
    <td>Tanks:<br /><i>Lost:<br />Destroyed:</i></td>
    <td>750<br /><i>6<br />18</i></td>
</tr>
<tr>
    <td>Tanks:<br /><i>Lost:<br />Destroyed:</i></td>
    <td>750<br /><i>6<br />18</i></td>
</tr>

Within VBA, I am trying to scrape data from a website whose HTML is structured like this. The value I want is "750"; however, it can sometimes be 0, 1,000,000, or any number in between, so extracting a fixed number of characters won't work.


Can anyone give some insight on the best way to scrape this? This is my code, which imports all of the text as is, but the logic to post-process it and trim out the data of interest is proving very difficult, so I am looking for a clean way to scrape just the 750 slot.


Set elems = IE.document.getElementsByTagName("tr")
For Each e In elems

    If e.innerText Like "Tanks:*" Then
        MsgBox e.innerText
    End If

Next e

Answer 1

Answered by Matteo NNZ

Within the row (<tr>), the content you want always seems to be in the second <td>, and it is the first content before the line break <br />. The stable structure of your HTML seems to be:


<tr>
    <td>
    </td>

    <td> <!-- we look for the first stuff inside here, before the <br /> comes -->
    </td>
</tr>

So, starting from your code:


Set elems = IE.document.getElementsByTagName("tr")
For Each e In elems

    If e.innerText Like "Tanks:*" Then 'finding the right <tr>

        'get full HTML inside the <tr></tr>
        'LCase, because IE may report the tags in upper case (<TD>, <BR>)
        fullHTML = LCase(e.innerHTML)

        'first step: parsing until the second <td> comes out...
        lookFor = "<td>"
        'skip past the first <td>: we know it is not the one we look for
        startPos = InStr(fullHTML, lookFor) + Len(lookFor)
        foundThis = Mid(fullHTML, startPos, 4) 'store current 4 characters
        Do While foundThis <> lookFor And startPos <= Len(fullHTML)
            startPos = startPos + 1
            foundThis = Mid(fullHTML, startPos, 4)
        Loop
        'once out, we can take the string starting from your 750 until the end
        remainingHTML = Mid(fullHTML, startPos + Len(lookFor))
        'so now we parse until we encounter the "<" of the line break tag
        myValue = ""
        startPos = 1
        newParse = Mid(remainingHTML, startPos, 1)
        Do While newParse <> "<" And startPos <= Len(remainingHTML)
            myValue = myValue & newParse
            startPos = startPos + 1
            newParse = Mid(remainingHTML, startPos, 1)
        Loop

        MsgBox Trim(myValue) 'here is your 750, 1,000,000 or whatever else

    End If

Next e
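If you prefer a more compact parse, the character-by-character loops can be replaced by a few InStr calls. The sketch below is a hypothetical helper (FirstValueInSecondCell is an illustrative name, not part of any library), assuming the row's HTML keeps the <td>...<br /> layout shown in the question. Passing vbTextCompare makes the tag search case-insensitive, and searching for "<td" and then the next ">" also tolerates attributes on the cell tag.

```vba
' Hypothetical helper: returns the text between the second <td ...> tag and
' the next "<" (the <br /> in the question's markup). Returns "" if the row
' does not have that shape.
Function FirstValueInSecondCell(ByVal rowHTML As String) As String
    Dim pos As Long, endPos As Long

    ' vbTextCompare makes the search case-insensitive (<td> or <TD>)
    pos = InStr(1, rowHTML, "<td", vbTextCompare)           ' first cell
    If pos = 0 Then Exit Function
    pos = InStr(pos + 1, rowHTML, "<td", vbTextCompare)     ' second cell
    If pos = 0 Then Exit Function
    pos = InStr(pos, rowHTML, ">")                           ' end of the opening tag
    If pos = 0 Then Exit Function

    endPos = InStr(pos + 1, rowHTML, "<")                    ' start of the <br /> tag
    If endPos = 0 Then endPos = Len(rowHTML) + 1             ' no tag after the value
    FirstValueInSecondCell = Trim$(Mid$(rowHTML, pos + 1, endPos - pos - 1))
End Function
```

Inside the loop from the question, this would be used as MsgBox FirstValueInSecondCell(e.innerHTML).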

Please note that the parsing would be much easier if you worked with the DOM elements instead of the raw HTML string. The objects IE returns to VBA already expose each element's children (no JavaScript library is needed), so you could just take the list of children:


If e.innerText Like "Tanks:*" Then
    Set puppies = e.Children
    'puppies(0) is the first <td>, puppies(1) is the second one
End If

Like this, you could directly parse the second element of the collection. NOTE: the code is not tested and might need some debugging to work properly. This is just an idea of how you can structure your parsing.
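To try the DOM route without driving Internet Explorer, the sketch below loads the question's sample row into an in-memory MSHTML document created with CreateObject("htmlfile") (Windows only) and reads the second cell through the row's Cells collection. The cell's firstChild is the text node that precedes the first <br />, so its nodeValue is the number. SecondCellValues is a hypothetical name, and the hard-coded HTML is only the sample from the question; on a live page you would loop over IE.document.getElementsByTagName("tr") instead.

```vba
' Offline sketch: builds the question's sample row in an in-memory MSHTML
' document and collects the first value of the second cell of every
' "Tanks:" row. Windows only (uses the "htmlfile" COM class).
Function SecondCellValues() As Collection
    Dim doc As Object, trs As Object, r As Object
    Dim result As New Collection

    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = "<table>" & _
        "<tr><td>Tanks:<br /><i>Lost:<br />Destroyed:</i></td>" & _
        "<td>750<br /><i>6<br />18</i></td></tr>" & _
        "</table>"

    Set trs = doc.getElementsByTagName("tr")
    For Each r In trs
        If r.innerText Like "Tanks:*" Then
            ' Cells(1) is the second <td>; its first child node is the
            ' text that comes before the first <br />
            result.Add r.Cells(1).firstChild.nodeValue
        End If
    Next r

    Set SecondCellValues = result
End Function
```

Reading the value from the DOM avoids any dependence on how IE serializes innerHTML (tag case, whitespace, attributes).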

