How to isolate a space using RegExp in VBA (\s vs. \p{Zs})?

Question

Asked by wackoHymano1997

Introduction/Question:

I have been studying regular expressions (using VBA/Excel), and so far I cannot understand how to isolate a <space> (or " ") with a regexp from the other whitespace characters that are included in \s. I thought I would be able to use \p{Zs}, but in my testing so far it has not worked. Could someone please correct my misunderstanding? I appreciate any helpful input.

To offer proper credit, I modified some code that started off as a very helpful post by @Portland Runner that is found here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

This has been my approach/study so far:

Using the string "14z-16z Flavored Peanuts", I've been trying to write a RegExp which removes "14z-16z " and leaves only "Flavored Peanuts". I initially used ^[0-9](\S)+ as strPattern and a sub procedure with the following snippet:

Sub REGEXP_TEST_SPACE()

Dim strPattern As String
Dim strReplace As String
Dim strInput As String
' Early binding: requires a reference to "Microsoft VBScript Regular Expressions 5.5"
Dim regEx As New RegExp

strInput = "14z-16z Flavored Peanuts"
strPattern = "^[0-9](\S)+"
strReplace = ""

With regEx
    .Global = True
    .MultiLine = True
    .IgnoreCase = True
    .pattern = strPattern
End With

If regEx.Test(strInput) Then
    Range("A1").Value = regEx.Replace(strInput, strReplace)
End If

End Sub

This approach gave me an A1 value of " Flavored Peanuts" (note the leading <space> in that string).

I then changed strPattern = "^[0-9](\S)+(\s)" (added the (\s)), which gave me the desired A1 value of "Flavored Peanuts". Great!!! I got the desired output!

But as I understand it, \s represents all white-space characters, equal to [ \f\n\r\t\v]. In this case, I know that the character is just a normal, single space; I don't need carriage return, horizontal tab, etc. So I tried to see if I could isolate just the <space> character in regex (Unicode separator: space), which I believe is \p{Zs} (e.g., strPattern = "^[0-9](\S)+(\p{Zs})"). Using this pattern, however, doesn't return a match at all, never mind removing the leading space. I also tried the more general \p{Z} (all Unicode separators), but that didn't work either.

Clearly I have missed something in my study. Help is both desired and appreciated. Thank you.

Answer 1

Accepted answer by Wiktor Stribiżew

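Since you are looking for an equivalent of the \p{Zs} Unicode category class, you might also want to handle hard (non-breaking) spaces. The VBScript RegExp engine that VBA uses does not support \p{...} classes, so the characters have to be listed explicitly. This code will be helpful: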

strPattern = "^[0-9](\S)+[ " & ChrW(160) & "]"

Or,

strPattern = "^[0-9](\S+)[ \xA0]"

The [ \xA0] character class will match either a regular space or a hard, non-breaking space (U+00A0, decimal 160). Note that \x0A would be a line feed, not a non-breaking space.
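
As a minimal sketch of the character class described above (a regular space or a no-break space), the following wraps the pattern in a small function. The names StripLeadingToken and Demo_StripLeadingToken are hypothetical and not from the original post. The sketch uses late binding, so no library reference is assumed, and it replaces \S+ with a negated class that also excludes ChrW(160); that is a precaution in case the engine treats the no-break space as a non-whitespace character, which would let a greedy \S+ run past it.

```vba
' Hypothetical helper (not from the original post): removes a leading token
' that starts with a digit, plus the single space or no-break space after it.
Function StripLeadingToken(ByVal strInput As String) As String
    Dim regEx As Object
    Dim nbsp As String

    ' Late binding, so no reference to the VBScript RegExp library is needed.
    Set regEx = CreateObject("VBScript.RegExp")
    nbsp = ChrW(160) ' U+00A0 NO-BREAK SPACE

    With regEx
        .Global = False
        .MultiLine = False
        .IgnoreCase = True
        ' [^\s<nbsp>]+ : the rest of the token, stopping at any whitespace
        '                or at a no-break space
        ' [ <nbsp>]    : exactly one regular space or no-break space
        .Pattern = "^[0-9][^\s" & nbsp & "]+[ " & nbsp & "]"
    End With

    StripLeadingToken = regEx.Replace(strInput, "")
End Function

Sub Demo_StripLeadingToken()
    ' Regular space after the size token
    Debug.Print StripLeadingToken("14z-16z Flavored Peanuts")
    ' No-break space after the size token
    Debug.Print StripLeadingToken("14z-16z" & ChrW(160) & "Flavored Peanuts")
End Sub
```

Both calls are expected to print the text that follows the size token. If the input does not start with a digit, Replace leaves the string unchanged.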

If you need to match all kinds of spaces, you can use this regex pattern, based on the information at https://www.cs.tut.fi/~jkorpela/chars/spaces.html:

strPattern = "^[0-9](\S)+[ \xA0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF]"

This is the table with code point explanations:

Code point  Decimal  Name                        Width
U+0020      32       SPACE                       Depends on font, typically 1/4 em, often adjusted
U+00A0      160      NO-BREAK SPACE              As a space, but often not adjusted
U+1680      5760     OGHAM SPACE MARK            Unspecified; usually not really a space but a dash
U+180E      6158     MONGOLIAN VOWEL SEPARATOR   No width
U+2000      8192     EN QUAD                     1 en (= 1/2 em)
U+2001      8193     EM QUAD                     1 em (nominally, the height of the font)
U+2002      8194     EN SPACE                    1 en (= 1/2 em)
U+2003      8195     EM SPACE                    1 em
U+2004      8196     THREE-PER-EM SPACE          1/3 em
U+2005      8197     FOUR-PER-EM SPACE           1/4 em
U+2006      8198     SIX-PER-EM SPACE            1/6 em
U+2007      8199     FIGURE SPACE                "Tabular width", the width of digits
U+2008      8200     PUNCTUATION SPACE           The width of a period "."
U+2009      8201     THIN SPACE                  1/5 em (or sometimes 1/6 em)
U+200A      8202     HAIR SPACE                  Narrower than THIN SPACE
U+200B      8203     ZERO WIDTH SPACE            Nominally no width, but may expand
U+202F      8239     NARROW NO-BREAK SPACE       Narrower than NO-BREAK SPACE (or SPACE)
U+205F      8287     MEDIUM MATHEMATICAL SPACE   4/18 em
U+3000      12288    IDEOGRAPHIC SPACE           The width of ideographic (CJK) characters
U+FEFF      65279    ZERO WIDTH NO-BREAK SPACE
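
For readers who prefer not to rely on \u escapes inside the pattern string, the same set of characters can be assembled with ChrW from the decimal values in the table above. This is only a sketch: SpaceChars, StripLeadingTokenAnySpace, and Demo_AnySpace are hypothetical names, and the negated class in place of \S+ is an added precaution so the greedy token cannot run past one of these space characters.

```vba
' Hypothetical helper: returns the space characters from the table above,
' laid out so the result can be placed inside a [...] character class.
Function SpaceChars() As String
    Dim codes As Variant
    Dim i As Long
    Dim s As String

    ' Decimal code points taken from the table (individual characters)
    codes = Array(32, 160, 5760, 6158, 8239, 8287, 12288, 65279)
    For i = LBound(codes) To UBound(codes)
        s = s & ChrW(codes(i))
    Next i

    ' U+2000 through U+200B, written as a range inside the class
    SpaceChars = s & ChrW(8192) & "-" & ChrW(8203)
End Function

' Hypothetical helper: removes a leading token that starts with a digit,
' plus one following space character of any kind listed in the table.
Function StripLeadingTokenAnySpace(ByVal strInput As String) As String
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp") ' late binding

    With regEx
        .Global = False
        .IgnoreCase = True
        .Pattern = "^[0-9][^\s" & SpaceChars() & "]+[" & SpaceChars() & "]"
    End With

    StripLeadingTokenAnySpace = regEx.Replace(strInput, "")
End Function

Sub Demo_AnySpace()
    ' EM SPACE (U+2003) between the size token and the product name
    Debug.Print StripLeadingTokenAnySpace("14z-16z" & ChrW(8195) & "Flavored Peanuts")
End Sub
```

Building the class from ChrW values keeps the list of code points in one place, which makes it easier to add or drop a character than editing a long escaped pattern.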

Best regards.

Answer 2

Answered by Jeanno

You can explicitly include a space character in your RegEx pattern. The following pattern works just fine:

strPattern = "^[0-9](\S)+ "

Answer 3

Answered by phrebh

Just use a literal space character: strPattern = "^[0-9](\S)+ ".
