在 Excel Vba 中解析字符串

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/40832549/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-12 11:37:16  来源:igfitidea点击:

Parse a String in Excel Vba

regexvbaexcel-vbadictionaryexcel

提问by MeSS83

I have a macro that send an XMLHTTP request to a server and it gets as response a plain textstring, not a JSON format string or other standard formats (at least for what I know).

我有一个向服务器发送 XMLHTTP 请求的宏,它得到一个纯文本字符串作为响应,而不是 JSON 格式字符串或其他标准格式(至少就我所知)。

I would like to parse the output string in order to access the data in an structured approach in the same fashion as the parseJsonsubroutine in this link

我想解析输出字符串,以便以与此链接中parseJson子例程相同的方式以结构化方法访问数据

My problem is I am not good with regular expressions and I am not able to modify the routine for my needs.

我的问题是我不擅长正则表达式,我无法根据需要修改例程。

The string that I need to parse has the following structure:

我需要解析的字符串具有以下结构:

  1. The string is a single line
  2. Each single parameter is defined by its parameter name the equal simbol, its value and ending with; "NID=3;"or "SID=Test;"
  3. Parameter can be collected in "structures" starts and end with the symbol | and they are identified with their name followed by ; such as |STEST;NID=3;SID=Test;|
  4. A structure can contain also other structures
  1. 字符串是单行
  2. 每个单个参数由其参数名称、等号、其值和结尾定义;“NID=3;” “SID=测试;”
  3. 参数可以收集在“结构”中,以符号| 开始和结束。他们的名字后跟; 如|STEST;NID=3;SID=Test;|
  4. 一个结构也可以包含其他结构

An example of a output string is the following

输出字符串的示例如下

|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|

In this case there is a macro structure KCwhich contains a structure AD. The structure ADis composed by the parameters PE, PFand 2 structures CD. And finaly the structures CDhave the parameters PEand HP

在这种情况下,有一个包含结构AD的宏结构KC。结构体AD由参数PEPF和 2 个结构体CD 组成。最后,结构CD具有参数PEHP

So I would like to parse the string to obtain an Object/Dictionarythat reflects this structure, can you help me?

所以我想解析字符串以获得反映这种结构的对象/字典,你能帮我吗?

Adds after the first answers

在第一个答案之后添加

Hi all, thank you for your help, but I think I should make more clear the output that I would like to get. For the example string that I have, I would like to have an object with the following structure:

大家好,感谢您的帮助,但我想我应该更清楚我想要得到的输出。对于我拥有的示例字符串,我想要一个具有以下结构的对象:

<KC>
    <AD>
        <PE>5</PE>
        <PF>3</PF>
        <CD>
            <PE>5</PE>
            <HP>test</HP>
        </CD>
        <CD>
            <PE>3</PE>
            <HP>abc</HP>
        </CD>
    </AD>
</KC>

So I started to wrote a possible working code base on some hint from @Nvj answer and the answer in this link

所以我开始根据@Nvj 答案的一些提示和这个链接中的答案编写一个可能的工作代码

Option Explicit
Option Base 1

Sub Test()

  Dim strContent As String
  Dim strState   As String
  Dim varOutput  As Variant

  strContent = "|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|"
  Call ParseString(strContent, varOutput, strState)

End Sub

Sub ParseString(ByVal strContent As String, varOutput As Variant, strState As String)
' strContent - source string
' varOutput - created object or array to be returned as result
' strState - Object|Array|Error depending on processing to be returned as state
Dim objTokens As Object
Dim lngTokenId As Long
Dim objRegEx As Object
Dim bMatched As Boolean

Set objTokens = CreateObject("Scripting.Dictionary")
lngTokenId = 0
Set objRegEx = CreateObject("VBScript.RegExp")
With objRegEx
    .Global = True
    .MultiLine = True
    .IgnoreCase = True
    .Pattern = "\|[A-Z]{2};"  'Pattern for the name of structures
    Tokenize objTokens, objRegEx, strContent, lngTokenId, bMatched, "str"
    .Pattern = "[A-Z]{2}=[^\|=;]+;" 'Pattern for parameters name and values
    Tokenize objTokens, objRegEx, strContent, lngTokenId, bMatched, "par"
End With

End Sub

Sub Tokenize(objTokens, objRegEx, strContent, lngTokenId, bMatched, strType)
Dim strKey        As String
Dim strKeyPar     As String
Dim strKeyVal     As String

Dim strWork       As String
Dim strPar        As String
Dim strVal        As String
Dim strLevel      As String

Dim strRes        As String

Dim lngCopyIndex  As Long
Dim objMatch      As Object

strRes = ""
lngCopyIndex = 1
With objRegEx
    For Each objMatch In .Execute(strContent)
        If strType = "str" Then
          bMatched = True
          With objMatch
              strWork = Replace(.Value, "|", "")
              strWork = Replace(strWork, ";", "")
              strLevel = get_Level(strWork)
              strKey = "<" & lngTokenId & strLevel & strType & ">"
              objTokens(strKey) = strWork
              strRes = strRes & Mid(strContent, lngCopyIndex, .FirstIndex - lngCopyIndex + 1) & strKey
              lngCopyIndex = .FirstIndex + .Length + 1
          End With
          lngTokenId = lngTokenId + 1
        ElseIf strType = "par" Then

          strKeyPar = "<" & lngTokenId & "par>"
          strKeyVal = "<" & lngTokenId & "val>"
          strKey = strKeyPar & strKeyVal
          bMatched = True
          With objMatch
              strWork = Replace(.Value, ";", "")
              strPar = Split(strWork, "=")(0)
              strVal = Split(strWork, "=")(1)
              objTokens(strKeyPar) = strPar
              objTokens(strKeyVal) = strVal
              strRes = strRes & Mid(strContent, lngCopyIndex, .FirstIndex - lngCopyIndex + 1) & strKey
              lngCopyIndex = .FirstIndex + .Length + 1
          End With
          lngTokenId = lngTokenId + 2

        End If
    Next
    strContent = strRes & Mid(strContent, lngCopyIndex, Len(strContent) - lngCopyIndex + 1)
End With
End Sub

Function get_Level(strInput As String) As String

Select Case strInput
  Case "KC"
  get_Level = "L1"
  Case "AD"
  get_Level = "L2"
  Case "CD"
  get_Level = "L3"
  Case Else
  MsgBox ("Error")
  End
End Select

End Function

This function creates a dictionary with an item for each structure name, parameter name and parameter value as shown in the figure enter image description hereThanks to the function get_Levelthe items associated to structures have a level that should help to preserve the original hierarchy of the data.

该函数为每个结构名称、参数名称和参数值创建一个字典,如图所示。 在此处输入图片说明多亏了该函数get_Level,与结构关联的项目具有一个级别,应该有助于保留数据的原始层次结构。

So what I am missing is a function to create an object that has the original structure of the input string. This is what the Retrievefunction do in this answer link, but I do not know how to adapt it to my case

所以我缺少的是一个函数来创建一个具有输入字符串原始结构的对象。这就是Retrieve这个答案链接中的函数所做的,但我不知道如何使它适应我的情况

采纳答案by NavkarJain

I've started to write a parser in VBA for the string structure specified by you, and it's not complete, but I'll post it anyways. Maybe you can pick up some ideas from it.

我已经开始在 VBA 中为您指定的字符串结构编写解析器,但它并不完整,但我还是会发布它。也许你可以从中汲取一些想法。

Sub ParseString()

    Dim str As String
    str = "|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|"

    ' Declare an object dictionary
    ' Make a reference to Microsoft Scripting Runtime in order for this to work
    Dim dict As New Dictionary

    ' If the bars are present in the first and last character of the string, replace them
    str = Replace(str, "|", "", 1, 1)
    If (Mid(str, Len(str), 1) = "|") Then
        str = Mid(str, 1, Len(str) - 1)
    End If

    ' Split the string by bars
    Dim substring_array() As String
    substring_array = Split(str, "|")

    ' Declare a regex object
    ' Check the reference to Microsoft VBScript Regular Expressions 5.5 in order for this to work
    Dim regex As New RegExp
    With regex
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
    End With

    ' Object to store the regex matches
    Dim matches As MatchCollection
    Dim param_name_matches As MatchCollection
    Dim parameter_value_matches As MatchCollection

    ' Define some regex patterns
    pattern_for_structure_name = "^[^=;]+;"
    pattern_for_parameters = "[^=;]+=[^=;]+;"
    pattern_for_parameter_name = "[^=;]="
    pattern_for_parameter_val = "[^=;];"

    ' Loop through the elements of the array
    Dim i As Integer
    For i = 0 To UBound(substring_array) - LBound(substring_array)

        ' Get the array element in a string
        str1 = substring_array(i)

        ' Check if it contains a structure name
        regex.Pattern = pattern_for_structure_name
        Set matches = regex.Execute(str1)

        If matches.Count = 0 Then

            ' This substring does not contain a structure name
            ' Check if it contains parameters
            regex.Pattern = pattern_for_parameter
            Set matches = regex.Execute(matches(0).Value)
            If matches.Count = 0 Then

                ' There are no parameters as well as no structure name
                ' This means the string had || - invalid string
                MsgBox ("Invalid string")

            Else

                ' The string contains parameter names
                ' Add each parameter name to the dictionary
                Dim my_match As match
                For Each my_match In matches

                    ' Get the name of the parameter
                    regex.Pattern = pattern_for_parameter_name
                    Set parameter_name_matches = regex.Execute(my_match.Value)

                    ' Check if the above returned any matches
                    If parameter_name_matches.Count = 1 Then

                        ' Remove = sign from the parameter name
                        parameter_name = Replace(parameter_name_matches(0).Value, "=", "")

                        ' Get the value of the parameter
                        regex.Pattern = pattern_for_parameter_value
                        Set parameter_value_matches = regex.Execute(my_match.Value)

                        ' Check if the above returned any matches
                        If parameter_value_matches.Count = 1 Then

                            ' Get the value
                            parameter_value = Replace(parameter_value_matches(0).Value, ";", "")

                            ' Add the parameter name and value as a key pair to the Dictionary object
                            dict.Item(parameter_name) = parameter_value

                        Else

                            ' Number of matches is either 0 or greater than 1 - in both cases the string is invalid
                            MsgBox ("Invalid string")

                        End If

                    Else

                        ' Parameter name did not match - invalid string
                        MsgBox ("Invalid string")

                    End If

                Next

            End If

        ElseIf matches.Count = 1 Then

            ' This substring contains a single structure name
            ' Check if it has parameter names

        Else

            ' This substring contains more than one structure name - the original string is invalid
            MsgBox ("Invalid string")

        End If

    Next i

End Sub

回答by Logan Reed

This looks like a simple nested delimited string. A couple of Split()functions will do the trick:

这看起来像一个简单的嵌套分隔字符串。有几个Split()函数可以解决这个问题:

Option Explicit

Function parseString(str As String) As Collection

    Dim a1() As String, i1 As Long, c1 As Collection
    Dim a2() As String, i2 As Long, c2 As Collection
    Dim a3() As String

    a1 = Split(str, "|")
    Set c1 = New Collection
    For i1 = LBound(a1) To UBound(a1)
        If a1(i1) <> "" Then
            Set c2 = New Collection
            a2 = Split(a1(i1), ";")
            For i2 = LBound(a2) To UBound(a2)
                If a2(i2) <> "" Then
                    a3 = Split(a2(i2), "=")
                    If UBound(a3) > 0 Then
                        c2.Add a3(1), a3(0)
                    ElseIf UBound(a3) = 0 Then
                        c2.Add a3(0)
                    End If
                End If
            Next i2
            c1.Add c2
        End If
    Next i1

    Set parseString = c1

End Function


Sub testParseString()

    Dim c As Collection

    Set c = parseString("|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|")

    Debug.Assert c(1)(1) = "KC"
    Debug.Assert c(2)("PE") = "5"
    Debug.Assert c(3)(1) = "CD"
    Debug.Assert c(4)("HP") = "abc"
    Debug.Assert c(4)(3) = "abc"  

End Sub

Note that you can address values by both, index and key (if key existed in the input). If key was not provided you can only access the value by its index. You can also iterate collection recursively to get all the values in a tree structure.

请注意,您可以通过索引和键(如果输入中存在键)来寻址值。如果未提供键,则只能通过其索引访问该值。您还可以递归迭代集合以获取树结构中的所有值。

Food for thought: since your structures may have repeated names (in your case "CD" structure happens twice) Collections / Dictionaries would find it problematic to store this elegantly (due to key collisions). Another good way to approach this is to create an XML structure with DOMDocument and use XPath to access its elements. See Program with DOM in Visual Basic

深思熟虑:由于您的结构可能有重复的名称(在您的情况下,“CD”结构出现两次)集合/字典会发现优雅地存储它(由于键冲突)有问题。解决此问题的另一个好方法是使用 DOMDocument 创建 XML 结构并使用 XPath 访问其元素。请参见在 Visual Basic 中使用 DOM 编程

UPDATE: I've added XML example below as well. Have a look.

更新:我也在下面添加了 XML 示例。看一看。

回答by Logan Reed

Here is another take on your string parsing issue using DOMDocumentXML parser. You need to include Microsoft XML, v.6.0 in your VBA references.

这是使用DOMDocumentXML 解析器解决字符串解析问题的另一种方法。您需要在 VBA 引用中包含 Microsoft XML, v.6.0。

Function parseStringToDom(str As String) As DOMDocument60

    Dim a1() As String, i1 As Long
    Dim a2() As String, i2 As Long
    Dim a3() As String

    Dim dom As DOMDocument60
    Dim rt As IXMLDOMNode
    Dim nd As IXMLDOMNode

    Set dom = New DOMDocument60
    dom.async = False
    dom.validateOnParse = False
    dom.resolveExternals = False
    dom.preserveWhiteSpace = True

    Set rt = dom.createElement("root")
    dom.appendChild rt

    a1 = Split(str, "|")
    For i1 = LBound(a1) To UBound(a1)
        If a1(i1) <> "" Then
            a2 = Split(a1(i1), ";")
            Set nd = dom.createElement(a2(0))
            For i2 = LBound(a2) To UBound(a2)
                If a2(i2) <> "" Then
                    a3 = Split(a2(i2), "=")
                    If UBound(a3) > 0 Then
                        nd.appendChild dom.createElement(a3(0))
                        nd.LastChild.Text = a3(1)
                    End If
                End If
            Next i2
            rt.appendChild nd
        End If
    Next i1

    Set parseStringToDom = dom

End Function


Sub testParseStringToDom()

    Dim dom As DOMDocument60

    Set dom = parseStringToDom("|KC;|AD;PE=5;PF=3;|CD;PE=5;HP=test;|CD;PE=3;HP=abc;|")

    Debug.Assert Not dom.SelectSingleNode("/root/KC") Is Nothing
    Debug.Assert dom.SelectSingleNode("/root/AD/PE").Text = "5"
    Debug.Assert dom.SelectSingleNode("/root/CD[1]/HP").Text = "test"
    Debug.Assert dom.SelectSingleNode("/root/CD[2]/HP").Text = "abc"

    Debug.Print dom.XML

End Sub

As you can see this converts your text into an XML DOM document preserving all the structures and allowing for duplicates in naming. You can then use XPath to access any node or value. This can also be extended to have more nesting levels and further structures.

如您所见,这会将您的文本转换为 XML DOM 文档,保留所有结构并允许命名重复。然后您可以使用 XPath 访问任何节点或值。这也可以扩展为具有更多嵌套级别和更多结构。

This is the XML document it creates behind the scenes:

这是它在幕后创建的 XML 文档:

<root>
    <KC/>
    <AD>
        <PE>5</PE>
        <PF>3</PF>
    </AD>
    <CD>
        <PE>5</PE>
        <HP>test</HP>
    </CD>
    <CD>
        <PE>3</PE>
        <HP>abc</HP>
    </CD>
</root>