How do I isolate a space using RegExp in VBA (\s vs. \p{Zs})?
Original URL: http://stackoverflow.com/questions/28617616/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by wackojacko1997
Introduction/Question:
I have been studying the use of Regular Expressions (using VBA/Excel), and so far I cannot work out how to use a regexp to isolate a <space> (" ") from the other whitespace characters that are included in \s. I thought I would be able to use \p{Zs}, but in my testing so far it has not worked. Could someone please correct my misunderstanding? I appreciate any helpful input.
To give proper credit: I modified some code that started as a very helpful post by @Portland Runner, found here: How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops
This has been my approach/study so far:
Using the string "14z-16z Flavored Peanuts", I've been trying to write a RegExp that removes "14z-16z " and leaves only "Flavored Peanuts". I initially used ^[0-9](\S)+ as strPattern in a Sub procedure with the following code:
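Sub REGEXP_TEST_SPACE()
    ' Requires a reference to "Microsoft VBScript Regular Expressions 5.5"
    ' (Tools > References) for the early-bound RegExp type.
    Dim strPattern As String
    Dim strReplace As String
    Dim strInput As String
    Dim regEx As New RegExp

    strInput = "14z-16z Flavored Peanuts"
    strPattern = "^[0-9](\S)+"   ' a digit followed by one or more non-whitespace characters
    strReplace = ""

    With regEx
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = strPattern
    End With

    ' Write the result to A1 only if the pattern matches.
    If regEx.Test(strInput) Then
        Range("A1").Value = regEx.Replace(strInput, strReplace)
    End If
End Sub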
This approach gave me an A1 value of " Flavored Peanuts" (note the leading <space> in that string).
I then changed strPattern to "^[0-9](\S)+(\s)" (adding the (\s)), which gave me the desired A1 value of "Flavored Peanuts". Great! I got the desired output!
But as I understand it, \s represents all whitespace characters, equivalent to [ \f\n\r\t\v]. In this case I know that the character is just a normal, single space; I don't need carriage return, horizontal tab, etc. So I tried to see whether I could isolate just the <space> character in the regex (Unicode category "Separator, space"), which I believe is \p{Zs} (e.g., strPattern = "^[0-9](\S)+(\p{Zs})"). Using this pattern, however, doesn't return a match at all, never mind removing the leading space. I also tried the more general \p{Z} (all Unicode separators), but that didn't work either.
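For what it's worth, the VBScript engine behind VBA's RegExp object does not implement Unicode property escapes such as \p{Zs} or \p{Z}, which is consistent with the no-match result described above. The sketch below shows the other half of the question: \s accepts more than the ordinary space. StripWithPattern and DemoWhitespaceClass are hypothetical names, and the late-bound CreateObject("VBScript.RegExp") call is an assumption made only so the snippet runs without setting a library reference (Windows only).

```vba
' Hypothetical helper: replaces the first match of strPattern in strInput with "".
' Late binding, so no reference to "Microsoft VBScript Regular Expressions 5.5" is needed.
Function StripWithPattern(ByVal strInput As String, ByVal strPattern As String) As String
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = strPattern
    regEx.Global = False   ' the pattern is anchored with ^, so one match is enough
    ' Replace returns the input unchanged when there is no match.
    StripWithPattern = regEx.Replace(strInput, "")
End Function

' \s matches the ordinary space, but also a tab or a line feed.
' Each call below prints "Flavored Peanuts".
Sub DemoWhitespaceClass()
    Const PAT As String = "^[0-9](\S)+(\s)"
    Debug.Print StripWithPattern("14z-16z Flavored Peanuts", PAT)              ' space
    Debug.Print StripWithPattern("14z-16z" & vbTab & "Flavored Peanuts", PAT)  ' tab
    Debug.Print StripWithPattern("14z-16z" & vbLf & "Flavored Peanuts", PAT)   ' line feed
End Sub
```

So (\s) removes the prefix whichever of these separators follows it, which is exactly why it cannot be used to single out the space.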
Clearly I have missed something in my study. Help is both desired and appreciated. Thank you.
Accepted answer by Wiktor Stribiżew
Since you are trying to find an equivalent of the \p{Zs} Unicode category class, you might also want to handle hard (non-breaking) spaces. This code will be helpful:
strPattern = "^[0-9](\S)+[ " & ChrW(160) & "]"
Or,
strPattern = "^[0-9](\S+)[ \xA0]"
The [ \xA0] character class will match either a regular space or a hard, non-breaking space.
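A minimal sketch of the non-breaking-space case, assuming the VBScript RegExp object created late-bound with CreateObject (Windows only). StripCode and its useEscape flag are hypothetical names. It builds the class both ways shown above: with a literal ChrW(160) and with the \xA0 escape (\x0A would be a line feed, not a non-breaking space). One deliberate change: a lazy \S+? replaces the greedy (\S)+, so the match always ends at the first space or non-breaking space. With the greedy form, whether the match could run past a non-breaking space to a later ordinary space depends on whether this engine's \S treats U+00A0 as non-whitespace, which is worth testing in your own setup.

```vba
' Hypothetical helper: removes a leading code such as "14z-16z" plus the first
' separator, where the separator is a regular space or a non-breaking space (U+00A0).
Function StripCode(ByVal strInput As String, ByVal useEscape As Boolean) As String
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    If useEscape Then
        regEx.Pattern = "^[0-9]\S+?[ \xA0]"               ' \xA0 escape inside the class
    Else
        regEx.Pattern = "^[0-9]\S+?[ " & ChrW(160) & "]"  ' literal U+00A0 inside the class
    End If
    ' +? is lazy: the match ends at the first space or non-breaking space.
    StripCode = regEx.Replace(strInput, "")
End Function
```

For example, StripCode("14z-16z" & ChrW(160) & "Flavored Peanuts", True) returns "Flavored Peanuts", and passing False gives the same result.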
If you need to match all kinds of spaces, you can use this regex pattern, based on the information at https://www.cs.tut.fi/~jkorpela/chars/spaces.html:
strPattern = "^[0-9](\S)+[ \xA0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF]"
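The broader class can be wrapped the same way. This is a sketch, assuming VBScript's RegExp accepts the \xA0 and \uXXXX escapes inside a character class, as the answer's pattern relies on. StripCodeAnySpace is a hypothetical name, and the lazy \S+? (instead of (\S)+) is an editorial choice so that the match stops at the first space-like character rather than depending on how \S classifies these characters.

```vba
' Hypothetical helper: strips a leading code followed by any one of the space-like
' characters in the answer's class (U+0020, U+00A0, U+1680, U+180E,
' U+2000-U+200B, U+202F, U+205F, U+3000, U+FEFF).
Function StripCodeAnySpace(ByVal strInput As String) As String
    Const SPACES As String = " \xA0\u1680\u180E\u2000-\u200B\u202F\u205F\u3000\uFEFF"
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    ' Lazy \S+? so the match ends at the first space-like character.
    regEx.Pattern = "^[0-9]\S+?[" & SPACES & "]"
    StripCodeAnySpace = regEx.Replace(strInput, "")
End Function
```

For example, StripCodeAnySpace("14z-16z" & ChrW(&H3000) & "Flavored Peanuts") returns "Flavored Peanuts".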
This is the table with code point explanations:
Code point  Decimal  Name                        Width
U+0020      32       SPACE                       Depends on font, typically 1/4 em, often adjusted
U+00A0      160      NO-BREAK SPACE              As a space, but often not adjusted
U+1680      5760     OGHAM SPACE MARK            Unspecified; usually not really a space but a dash
U+180E      6158     MONGOLIAN VOWEL SEPARATOR   No width
U+2000      8192     EN QUAD                     1 en (= 1/2 em)
U+2001      8193     EM QUAD                     1 em (nominally, the height of the font)
U+2002      8194     EN SPACE                    1 en (= 1/2 em)
U+2003      8195     EM SPACE                    1 em
U+2004      8196     THREE-PER-EM SPACE          1/3 em
U+2005      8197     FOUR-PER-EM SPACE           1/4 em
U+2006      8198     SIX-PER-EM SPACE            1/6 em
U+2007      8199     FIGURE SPACE                “Tabular width”, the width of digits
U+2008      8200     PUNCTUATION SPACE           The width of a period “.”
U+2009      8201     THIN SPACE                  1/5 em (or sometimes 1/6 em)
U+200A      8202     HAIR SPACE                  Narrower than THIN SPACE
U+200B      8203     ZERO WIDTH SPACE            Nominally no width, but may expand
U+202F      8239     NARROW NO-BREAK SPACE       Narrower than NO-BREAK SPACE (or SPACE)
U+205F      8287     MEDIUM MATHEMATICAL SPACE   4/18 em
U+3000      12288    IDEOGRAPHIC SPACE           The width of ideographic (CJK) characters
U+FEFF      65279    ZERO WIDTH NO-BREAK SPACE
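When a cell contains a space that the pattern does not match, it helps to know which row of the table it belongs to. A minimal diagnostic sketch: the hypothetical function below lists the code point of every character outside printable ASCII (U+0021 to U+007E), so an ordinary space shows up as U+0020 and anything else can be looked up above. AscW returns a signed Integer, so the value is masked with &HFFFF& to recover code points above &H7FFF, such as U+FEFF.

```vba
' Hypothetical diagnostic: returns "U+XXXX" for each character in strInput that is
' not printable ASCII (U+0021..U+007E), separated by single spaces.
Function ListUnusualCodePoints(ByVal strInput As String) As String
    Dim i As Long, code As Long, result As String
    For i = 1 To Len(strInput)
        ' AscW returns a signed Integer; the mask maps it to 0..65535.
        code = AscW(Mid$(strInput, i, 1)) And &HFFFF&
        If code < &H21 Or code > &H7E Then
            result = result & " U+" & Right$("000" & Hex$(code), 4)
        End If
    Next i
    ListUnusualCodePoints = Trim$(result)
End Function
```

For example, ListUnusualCodePoints("14z-16z" & ChrW(160) & "Flavored Peanuts") returns "U+00A0 U+0020", pointing at the NO-BREAK SPACE row.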
Best regards.
Answer by Jeanno
You can explicitly include a literal space in your RegEx pattern. The following pattern works just fine:
strPattern = "^[0-9](\S)+ "
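A literal space in the pattern matches only U+0020, which is the isolation the question asks about. In the minimal sketch below (StripCodeLiteralSpace is a hypothetical name; the late-bound CreateObject("VBScript.RegExp") call is an assumption that avoids needing a library reference), the same input with a tab in place of the space is returned unchanged, because the literal space does not match the tab, whereas \s would.

```vba
' Hypothetical helper: the pattern from this answer, ending in a literal space.
Function StripCodeLiteralSpace(ByVal strInput As String) As String
    Dim regEx As Object
    Set regEx = CreateObject("VBScript.RegExp")
    regEx.Pattern = "^[0-9](\S)+ "   ' note the trailing space character
    ' Replace returns the input unchanged when there is no match.
    StripCodeLiteralSpace = regEx.Replace(strInput, "")
End Function
```

For example, StripCodeLiteralSpace("14z-16z Flavored Peanuts") returns "Flavored Peanuts".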
Answer by phrebh
Just use a literal space character: strPattern = "^[0-9](\S)+ ".