如何提高 VBA 中 XML 解析的速度

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5626653/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-11 12:59:52  来源:igfitidea点击:

How can I improve the speed of XML parsing in VBA

xmlperformanceexcelvba

提问by Doug S.

I have a large XML file that needs parsed in VBA (excel 2003 & 2007). There could be upwards of 11,000 'rows' of data in the xml file with each 'row' having between 10 and 20 'columns'. This ends up being a huge task just to parse through and grab the data (5 - 7 minutes). I tried reading the xml and placing each 'row' into a dictionary (key = row number, value = Row Attributes), but this takes just as long.

我有一个需要在 VBA 中解析的大型 XML 文件(excel 2003 & 2007)。xml 文件中可能有多达 11,000 个“行”的数据,每个“行”有 10 到 20 个“列”。这最终是一项艰巨的任务,只是解析和获取数据(5 - 7 分钟)。我尝试读取 xml 并将每个“行”放入字典(键 = 行号,值 = 行属性),但这需要同样长的时间。

It is taking forever to traverse the DOM. Is there a more efficient way?

遍历 DOM 需要很长时间。有没有更有效的方法?

Dim XMLDict
    Sub ParseXML(ByRef RootNode As IXMLDOMNode)
        Dim Counter As Long
        Dim RowList As IXMLDOMNodeList
        Dim ColumnList As IXMLDOMNodeList
        Dim RowNode As IXMLDOMNode
        Dim ColumnNode As IXMLDOMNode
        Counter = 1
        Set RowList = RootNode.SelectNodes("Row")

        For Each RowNode In RowList
            Set ColumnList = RowNode.SelectNodes("Col")
            Dim NodeValues As String
            For Each ColumnNode In ColumnList
                NodeValues = NodeValues & "|" & ColumnNode.Attributes.getNamedItem("id").Text & ":" & ColumnNode.Text
            Next ColumnNode
            XMLDICT.Add Counter, NodeValues
            Counter = Counter + 1
        Next RowNode
    End Sub

回答by barrowc

You could try using SAX instead of DOM. SAX should be faster when all you are doing is parsing the document and the document is non-trivial in size. The reference for the SAX2 implementation in MSXML is here

您可以尝试使用 SAX 而不是 DOM。当您所做的只是解析文档并且文档的大小非常重要时,SAX 应该会更快。MSXML 中 SAX2 实现的参考在这里

I typically reach straight for the DOM for most XML parsing in Excel but SAX seems to have advantages in some situations. The short comparison heremight help to explain the differences between them.

对于 Excel 中的大多数 XML 解析,我通常直接使用 DOM,但 SAX 在某些情况下似乎具有优势。这里的简短比较可能有助于解释它们之间的差异。

Here's a hacked-together example (partially based on this) just using Debug.Printfor output:

这是一个仅用于输出的 hacked-together 示例(部分基于thisDebug.Print

Add a reference to "Microsoft XML, v6.0" via Tools > References

通过工具 > 引用添加对“Microsoft XML, v6.0”的引用

Add this code in a normal module

将此代码添加到普通模块中

Option Explicit

Sub main()

Dim saxReader As SAXXMLReader60
Dim saxhandler As ContentHandlerImpl

Set saxReader = New SAXXMLReader60
Set saxhandler = New ContentHandlerImpl

Set saxReader.contentHandler = saxhandler
saxReader.parseURL "file://C:\Users\foo\Desktop\bar.xml"

Set saxReader = Nothing

End Sub

Add a class module, call it ContentHandlerImpland add the following code

添加一个类模块,调用它ContentHandlerImpl并添加以下代码

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean

Use the left-hand drop-down at the top of the module to choose "IVBSAXContentHandler" and then use the right-hand drop-down to add stubs for each event in turn (from charactersto startPrefixMapping)

使用模块顶部的左侧下拉菜单选择“IVBSAXContentHandler”,然后使用右侧下拉菜单依次为每个事件添加存根(从charactersstartPrefixMapping

Add code to some of the stubs as follows

将代码添加到一些存根中,如下所示

Explicitly set up the counter and the flag to show if we want to read text data at this time

显式设置计数器和标志来显示此时是否要读取文本数据

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub

Every time a new element starts, check the name of the element and take appropriate action

每次启动新元素时,检查元素的名称并采取适当的措施

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
    Case "Row"
        sNodeValues = ""
    Case "Col"
        sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
        bGetChars = True
    Case Else
        ' do nothing
End Select

End Sub

Check to see if we are interested in the text data and, if we are, chop off any extraneous white space and remove all line feeds (this may or may not be desirable depending on the document you are trying to parse)

检查我们是否对文本数据感兴趣,如果我们感兴趣,切掉任何多余的空白并删除所有换行符(这可能是也可能不是理想的,这取决于您尝试解析的文档)

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
    sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub

If we have reached the end of a Colthen stop reading the text values; if we have reached the end of a Rowthen print out the string of node values

如果我们已经到达 a 的末尾,Col则停止读取文本值;如果我们已经到达 a 的末尾,Row则打印出节点值的字符串

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
    Case "Col"
        bGetChars = False
    Case "Row"
        lCounter = lCounter + 1
        Debug.Print lCounter & " " & sNodeValues
    Case Else
        ' do nothing
End Select

End Sub


To make things clearer, here is the full version of ContentHandlerImplwith al of the stub methods in place:

为了让事情更清楚,这里是包含所有ContentHandlerImpl存根方法的完整版本:

Option Explicit

Implements IVBSAXContentHandler

Private lCounter As Long
Private sNodeValues As String
Private bGetChars As Boolean

Private Sub IVBSAXContentHandler_characters(strChars As String)

If (bGetChars) Then
    sNodeValues = sNodeValues & Replace(Trim$(strChars), vbLf, "")
End If

End Sub

Private Property Set IVBSAXContentHandler_documentLocator(ByVal RHS As MSXML2.IVBSAXLocator)

End Property

Private Sub IVBSAXContentHandler_endDocument()

End Sub

Private Sub IVBSAXContentHandler_endElement(strNamespaceURI As String, strLocalName As String, strQName As String)

Select Case strLocalName
    Case "Col"
        bGetChars = False
    Case "Row"
        lCounter = lCounter + 1
        Debug.Print lCounter & " " & sNodeValues
    Case Else
        ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_endPrefixMapping(strPrefix As String)

End Sub

Private Sub IVBSAXContentHandler_ignorableWhitespace(strChars As String)

End Sub

Private Sub IVBSAXContentHandler_processingInstruction(strTarget As String, strData As String)

End Sub

Private Sub IVBSAXContentHandler_skippedEntity(strName As String)

End Sub

Private Sub IVBSAXContentHandler_startDocument()

lCounter = 0
bGetChars = False

End Sub

Private Sub IVBSAXContentHandler_startElement(strNamespaceURI As String, strLocalName As String, strQName As String, ByVal oAttributes As MSXML2.IVBSAXAttributes)

Select Case strLocalName
    Case "Row"
        sNodeValues = ""
    Case "Col"
        sNodeValues = sNodeValues & "|" & oAttributes.getValueFromName(strNamespaceURI, "id") & ":"
        bGetChars = True
    Case Else
        ' do nothing
End Select

End Sub

Private Sub IVBSAXContentHandler_startPrefixMapping(strPrefix As String, strURI As String)

End Sub

回答by Alain

Use the SelectSingleNodefunction. This will let you search for a node based on pattern matching.

使用该SelectSingleNode功能。这将允许您根据模式匹配搜索节点。

For instance, I created the following function:

例如,我创建了以下函数:

Private Function getXMLNodeValue(ByRef xmlDoc As MSXML2.DOMDocument, ByVal xmlPath As String)
    Dim node As IXMLDOMNode
    Set node = xmlDoc.SelectSingleNode(xmlPath)
    If node Is Nothing Then getXMLNodeValue = vbNullString Else getXMLNodeValue = node.Text
End Function

Now, if I have the following XML file: An XML Response

现在,如果我有以下 XML 文件: XML 响应

I can simply call:

我可以简单地调用:

myValue = getXMLNodeValue(xmlResult, "//ErrorStatus/Source")

and it will jump through to the first key called 'Error Status' at any depth, and pull out the text in the 'Source' node - returning "INTEGRATION"

它将跳转到任何深度的第一个名为“错误状态”的键,并拉出“源”节点中的文本 - 返回“INTEGRATION”