RegEx in VBA: Break a complex string into multiple tokens?

Note: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/3681920/

Time: 2020-09-11 12:05:14  Source: igfitidea

Tags: regex, excel, vba

Asked by TallPaul

I am trying to parse a line of an mmCIF protein file into separate tokens using Excel 2000/2003. In the worst case it COULD look something like this:

token1 token2 "token's 1a',1b'" 'token4"5"' 12 23.2 ? . 'token' tok'en to"ken

Which should become the following tokens:

token1  
token2  
token's 1a',1b' (note: the double quotes have disappeared)  
token4"5" (note: the single quotes have disappeared)  
12  
23.2  
?  
.  
token (note: the single quotes have disappeared)  
tok'en  
to"ken  

Is it even possible to split this kind of line into tokens with a RegEx?

Accepted answer by Marc Thibault

Nice puzzle. Thanks.

This pattern (aPatt below) gets the tokens separated, but I can't figure out how to remove the outer quotes.

tallpaul() produces:

 token1
 token2
 "token's 1a',1b'"
 'token4"5"'
 12
 23.2
 ?
 .
 'token'
 tok'en
 to"ken

If you can figure out how to lose the outer quotes, please let us know. This needs a reference to "Microsoft VBScript Regular Expressions" to work.

Option Explicit
' Returns the collection of matches of patrn in strng.
Function RegExpTest(patrn As String, strng As String) As MatchCollection
   Dim regEx As RegExp   ' Create variable.
   Set regEx = New RegExp   ' Create a regular expression.
   regEx.Pattern = patrn   ' Set pattern.
   regEx.IgnoreCase = True   ' Set case insensitivity.
   regEx.Global = True   ' Set global applicability.
   Set RegExpTest = regEx.Execute(strng)   ' Execute search.
End Function

Function tallpaul() As Boolean
    Dim aString As String
    Dim aPatt As String
    Dim aMatch, aMatches

    ' The pattern expects whitespace around every token, so pad the string with a leading and a trailing space.
    aString = " token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken "
    aPatt = "(\s'[^']+'(?=\s))|(\s""[^""]+""(?=\s))|(\s[\w\?\.]+(?=\s))|(\s\S+(?=\s))"
    Set aMatches = RegExpTest(aPatt, aString)

    For Each aMatch In aMatches
          Debug.Print aMatch.Value
    Next
    tallpaul = True
End Function
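
One possible way to lose the outer quotes is to put the token body in a capture group and read it back from SubMatches instead of Match.Value. The following is only a sketch under stated assumptions, not part of the original answer: the names SplitTokens and DemoSplitTokens and the pattern itself are hypothetical, and it uses late binding through CreateObject, so no project reference is needed. It assumes a quoted token never contains its own quote character; such tokens are not handled correctly.

```vba
Option Explicit

' Hypothetical helper (not from the original answer): returns the tokens of
' aLine as a Collection, with one pair of outer quotes removed.
Function SplitTokens(ByVal aLine As String) As Collection
    Dim re As Object
    Dim m As Object
    Dim tokens As New Collection

    ' Late binding: no reference to the VBScript RegExp library is required.
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Group 1: body of a double-quoted token, group 2: body of a
    ' single-quoted token, group 3: any other run of non-whitespace.
    ' (?:^|\s) and (?=\s|$) replace the manual space padding.
    re.Pattern = "(?:^|\s)(?:""([^""]*)""|'([^']*)'|(\S+))(?=\s|$)"

    For Each m In re.Execute(aLine)
        ' Only one of the three groups takes part in each match; the other
        ' two are empty, so concatenating them yields the token body.
        tokens.Add m.SubMatches(0) & m.SubMatches(1) & m.SubMatches(2)
    Next m

    Set SplitTokens = tokens
End Function

' Hypothetical demo using the sample line from the question.
Sub DemoSplitTokens()
    Dim tok As Variant
    Dim aString As String

    aString = "token1 token2 ""token's 1a',1b'"" 'token4""5""' 12 23.2 ? . 'token' tok'en to""ken"
    For Each tok In SplitTokens(aString)
        Debug.Print tok
    Next tok
End Sub
```

The quoted alternatives are tried before the catch-all \S+, so a token such as tok'en, which does not start with a quote, falls through to the third group unchanged.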

Answer by Simon Cowen

It is possible.

You'll need to reference "Microsoft VBScript Regular Expressions 5.5" in your VBA Project, then...

Private Sub REFinder(PatternString As String, StringToTest As String)
    Dim RE As RegExp
    Dim Matches As MatchCollection
    Dim Match As Match
    Dim Item As Variant

    Set RE = New RegExp

    With RE
        .Global = True
        .MultiLine = False
        .IgnoreCase = False
        .Pattern = PatternString
    End With

    Set Matches = RE.Execute(StringToTest)

    For Each Match In Matches
        ' Print the match, its zero-based position and length, and the same text cut out with Mid
        Debug.Print Match.Value & " ~~~ " & Match.FirstIndex & " - " & Match.Length & " = " & Mid(StringToTest, Match.FirstIndex + 1, Match.Length)

        ' You get a submatch for each of the other possible conditions (if using ORs)
        For Each Item In Match.SubMatches
            Debug.Print "Submatch:" & Item
        Next Item
        Debug.Print
    Next Match

    Set RE = Nothing
    Set Matches = Nothing
    Set Match = Nothing
End Sub

Sub DoIt()
    ' For this sample input, this effectively splits on spaces (each match but the last keeps its trailing space)
    REFinder "([.^\w]+\s)|(.+$)", "Token1 Token2 65.56"
End Sub

This is obviously just a very simple example, as I'm not very knowledgeable about RegExp; it is meant more to show you HOW it can be done in VBA (you'd probably also want to do something more useful than Debug.Print with the resulting tokens!). I'm afraid I'll have to leave writing the actual RegExp pattern to somebody else!
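
As one illustration of doing something more useful than Debug.Print, the matches could be copied into an array and written to worksheet cells. This is a minimal sketch, not part of the original answer: MatchesToArray and WriteTokensToRow are hypothetical names, the \S+ pattern is a plain whitespace split chosen for the demo, and late binding through CreateObject is assumed so that no project reference is required.

```vba
Option Explicit

' Hypothetical helper: returns every match of patternString in sourceText
' as a zero-based String array. If there are no matches, the returned
' array is unallocated, so callers should check before using LBound/UBound.
Function MatchesToArray(ByVal patternString As String, ByVal sourceText As String) As String()
    Dim re As Object
    Dim matches As Object
    Dim result() As String
    Dim i As Long

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    re.Pattern = patternString
    Set matches = re.Execute(sourceText)

    If matches.Count > 0 Then
        ReDim result(0 To matches.Count - 1)
        For i = 0 To matches.Count - 1
            result(i) = matches.Item(i).Value
        Next i
    End If

    MatchesToArray = result
End Function

' Hypothetical demo: writes one token per cell into row 1 of the active sheet.
Sub WriteTokensToRow()
    Dim tokens() As String
    Dim i As Long

    tokens = MatchesToArray("\S+", "Token1 Token2 65.56")
    For i = LBound(tokens) To UBound(tokens)
        ActiveSheet.Cells(1, i + 1).Value = tokens(i)
    Next i
End Sub
```

Returning an array keeps the matching logic separate from whatever is done with the tokens afterwards, so the same helper can feed a worksheet, a file, or further parsing.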

Simon
