PHP regex: \w - "_" + "-" in UTF-8
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2062169/
RegEx: \w - "_" + "-" in UTF-8
Asked by Alix Axel
I need a regular expression that matches UTF-8 letters and digits and the dash sign (-), but doesn't match underscores (_). I tried these silly attempts without success:
([\w-^_])+
([\w^_]-?)+
(\w[^_]-?)+
The \w is shorthand for [A-Za-z0-9_], but it also matches UTF-8 chars if I have the u modifier set.
Can anyone help me out with this one?
Answered by gha.st
Try this:
(?:[\w\-](?<!_))+
It does a simple match on anything that counts as a \w (or a dash) and then has a zero-width lookbehind that ensures that the character that was just matched is not an underscore.
Otherwise you could pick this one:
(?:[^_\W]|-)+
which is a more set-based approach (note the uppercase W)
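To see the two patterns above side by side, here is a minimal sketch. It assumes PHP's preg_match with the u modifier, and it adds ^...$ anchors (not part of the original answer) so that the whole string has to match. The variable names and sample strings are made up for illustration.

```php
<?php
// Both patterns come from the answer; the ^...$ anchors and the u modifier
// are assumptions added here so each sample must match as a whole.
$lookbehind = '/^(?:[\w\-](?<!_))+$/u'; // match \w or a dash, then reject it if it was "_"
$setBased   = '/^(?:[^_\W]|-)+$/u';     // "neither underscore nor non-word char", or a dash

// Hypothetical sample inputs (this file is assumed to be saved as UTF-8).
$samples = ['foo-bar42', 'foo_bar', 'über-straße'];

foreach ($samples as $sample) {
    printf(
        "%s: lookbehind=%d, set-based=%d\n",
        $sample,
        preg_match($lookbehind, $sample),
        preg_match($setBased, $sample)
    );
}
```

Both patterns should agree on every input; the set-based one avoids the lookbehind and is probably the easier of the two to read.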
OK, I had a lot of fun with unicode in php's flavor of PCREs :D Peekaboo says there is a simple solution available:
[\p{L}\p{N}\-]+
\p{L} matches anything in Unicode that qualifies as a Letter (note: not a word character, thus no underscores), while \p{N} matches anything that looks like a number (including roman numerals and more exotic things).
\- is just an escaped dash. Although not strictly necessary, I tend to make it a point to escape dashes in character classes... Note that there are dozens of different dashes in Unicode, thus giving rise to the following version:
[\p{L}\p{N}\p{Pd}]+
Where "Pd" is Punctuation Dash, including, but not limited to our minus-dash-thingy. (Note, again no underscore here).
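A short sketch of the property-based pattern in use, assuming preg_match with the u modifier and ^...$ anchors added so the whole string must match. The sample strings are hypothetical; "\u{2013}" (EN DASH) and "\u{2163}" (ROMAN NUMERAL FOUR) need PHP 7 or later for the escape syntax.

```php
<?php
// Pattern from the answer; the anchors are an assumption added for whole-string matching.
$pattern = '/^[\p{L}\p{N}\p{Pd}]+$/u';

// Hypothetical samples: plain ASCII, an en dash (category Pd),
// a roman numeral (category Nl, covered by \p{N}), and an underscore (category Pc).
$samples = ['abc-123', "2010\u{2013}2020", "\u{2163}", 'snake_case'];

foreach ($samples as $sample) {
    $result = preg_match($pattern, $sample) === 1 ? 'match' : 'no match';
    echo $sample, ' => ', $result, PHP_EOL;
}
```

Only the last sample should be rejected, since the underscore is connector punctuation rather than a letter, number, or dash.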
Answered by Jiri Klouda
I am not sure which language you use, but in Perl you can simply write [[:alnum:]-]+ when the correct locale is set.
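The answer above is about Perl and locales. For comparison, here is a hedged sketch of the same POSIX class under PHP's PCRE. It assumes the u modifier makes [[:alnum:]] Unicode-aware instead of relying on a locale, which may not hold on every build; the anchors and sample strings are added for illustration only.

```php
<?php
// POSIX class from the answer, anchored (an assumption) and with the u modifier.
// Whether non-ASCII letters match depends on PCRE's Unicode property support.
$pattern = '/^[[:alnum:]-]+$/u';

// Hypothetical samples (this file is assumed to be saved as UTF-8).
foreach (['abc-123', 'a_b', 'héllo'] as $sample) {
    echo $sample, ' => ', preg_match($pattern, $sample), PHP_EOL;
}
```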

