PHP regex: \w - "_" + "-" in UTF-8

Question

Asked by Alix Axel

I need a regular expression that matches UTF-8 letters and digits and the dash sign (-), but doesn't match underscores (_). I tried these silly attempts without success:

([\w-^_])+
([\w^_]-?)+
(\w[^_]-?)+

\w is shorthand for [A-Za-z0-9_], but it also matches UTF-8 characters if I have the u modifier set.
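
To make the problem concrete, here is a small PHP sketch (the helper name wordOrDash() is hypothetical) showing that with the u modifier \w accepts non-ASCII letters such as é, but still accepts the underscore:

```php
<?php
// Hypothetical helper: does the whole string consist of \w characters or dashes?
// The u modifier makes PCRE treat the subject as UTF-8 and, in PHP, also makes
// \w Unicode-aware, so letters such as "é" are accepted.
function wordOrDash(string $s): bool
{
    // The "-" at the end of the character class is a literal dash.
    return preg_match('/^[\w-]+$/u', $s) === 1;
}

foreach (['café-42', 'foo_bar'] as $s) {
    // Both print bool(true): \w still includes the underscore.
    echo $s, ': ';
    var_dump(wordOrDash($s));
}
```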

Can anyone help me out with this one?

Answer 1

Answered by gha.st

Try this:

(?:[\w\-](?<!_))+

It does a simple match on anything that \w matches (or a dash) and then uses a zero-width negative lookbehind, (?<!_), to ensure that the character just matched is not an underscore.
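
A minimal sketch of this lookbehind pattern in PHP. The ^...$ anchors and the function name isWordDashLookbehind() are additions for illustration, so that the whole string has to match rather than just a substring:

```php
<?php
// Sketch of the lookbehind pattern, anchored with ^...$ (the anchors are an
// addition so the whole string has to match). Function name is hypothetical.
function isWordDashLookbehind(string $s): bool
{
    // [\w\-]  consumes one word character (Unicode-aware with /u) or a dash;
    // (?<!_)  is zero-width and fails if the character just consumed was "_".
    return preg_match('/^(?:[\w\-](?<!_))+$/u', $s) === 1;
}

var_dump(isWordDashLookbehind('Ωmega-9'));  // bool(true)
var_dump(isWordDashLookbehind('foo_bar'));  // bool(false)
```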

Otherwise you could pick this one:

(?:[^_\W]|-)+

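which is a more set-based approach: [^_\W] reads as "neither an underscore nor a non-word character", which is every \w character except _ (note the uppercase W).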
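
A sketch of the set-based pattern used unanchored with preg_match_all(), which pulls every run of letters, digits and dashes out of a longer string. The function name extractWordDashRuns() is hypothetical:

```php
<?php
// Sketch: the set-based pattern without anchors, so preg_match_all() extracts
// every run of letters, digits and dashes. [^_\W] means "not an underscore and
// not a non-word character", i.e. \w minus "_". Function name is hypothetical.
function extractWordDashRuns(string $text): array
{
    preg_match_all('/(?:[^_\W]|-)+/u', $text, $matches);
    return $matches[0]; // full-pattern matches only
}

print_r(extractWordDashRuns('über_cool straße-7'));
```

The underscore and the space both end a run, so the sample input yields über, cool and straße-7.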

OK, I had a lot of fun with Unicode in PHP's flavor of PCRE :D Peekaboo says there is a simple solution available:

[\p{L}\p{N}\-]+

\p{L} matches any Unicode character that qualifies as a letter (note: letters only, not word characters, so no underscores), while \p{N} matches anything in Unicode's number categories (including Roman numerals and more exotic things).
\- is just an escaped dash. Although it is not strictly necessary, I make a point of escaping dashes in character classes. Note that Unicode has many different dash characters, which gives rise to the following version:

[\p{L}\p{N}\p{Pd}]+

Where Pd is the Unicode "Dash Punctuation" category, which includes, but is not limited to, our ASCII minus-dash-thingy (U+002D). (Note, again: no underscore here; _ is in the Connector Punctuation category, Pc.)
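
A sketch comparing the two property-based patterns. The ^...$ anchors and the function names asciiDashVersion() and anyDashVersion() are illustrative additions; the sample strings use an en dash (U+2013, in Pd) and ROMAN NUMERAL TWELVE (U+216B, in \p{N}):

```php
<?php
// Sketch comparing the two property-based patterns, anchored with ^...$
// (the anchors and function names are illustrative additions).
function asciiDashVersion(string $s): bool
{
    // \p{L} = any letter, \p{N} = any number, \- = ASCII hyphen-minus only.
    return preg_match('/^[\p{L}\p{N}\-]+$/u', $s) === 1;
}

function anyDashVersion(string $s): bool
{
    // \p{Pd} = every "Dash Punctuation" character, e.g. U+2013 EN DASH.
    return preg_match('/^[\p{L}\p{N}\p{Pd}]+$/u', $s) === 1;
}

$enDash = "pre\u{2013}war";   // "pre-war" written with an en dash
$roman  = "Chapter-\u{216B}"; // U+216B ROMAN NUMERAL TWELVE is in \p{N}

var_dump(asciiDashVersion($enDash), anyDashVersion($enDash)); // false, true
var_dump(asciiDashVersion($roman), anyDashVersion($roman));   // true, true
var_dump(anyDashVersion('foo_bar'));                          // false
```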

Answer 2

Answered by Jiri Klouda

I am not sure which language you use, but in Perl you can simply write [[:alnum:]-]+ when the correct locale is set.
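
The question is about PHP rather than Perl. As a hedged sketch, assuming PHP's u modifier also enables PCRE's Unicode property option (UCP), under which PCRE documents [:alnum:] as behaving like \p{Xan} (any letter or number), the same POSIX class should work in preg_match() without any locale setup. The function name posixAlnumOrDash() is hypothetical:

```php
<?php
// Assumption: with /u, PHP enables PCRE's UCP option, which makes the POSIX
// class [:alnum:] match Unicode letters and numbers rather than ASCII only.
function posixAlnumOrDash(string $s): bool
{
    // The trailing "-" inside the bracket expression is a literal dash.
    return preg_match('/^[[:alnum:]-]+$/u', $s) === 1;
}

var_dump(posixAlnumOrDash('naïve-42')); // letters, digits and a dash
var_dump(posixAlnumOrDash('foo_bar'));  // underscore is not alphanumeric
```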

我不确定您使用哪种语言，但在 PERL 中，您可以简单地编写： [[:alnum:]-]+ 设置正确的语言环境。

php 正则表达式：\w - UTF-8 中的“_”+“-”

提问by Alix Axel

回答by gha.st

回答by Jiri Klouda

相关推荐

最近更新

标签

php 正则表达式：\w - UTF-8 中的“_”+“-”

提问by Alix Axel

回答by gha.st

回答by Jiri Klouda

相关推荐

php 教义中的 findByExample

php 在 Magento 观察者中获取 POST 数据

php 从 IBAN 银行帐号生成 BIC

php 消息：未定义的属性：CI_Loader::$session 使用 codeigniter

相关推荐

最近更新

标签