JavaScript: Regex for names with special characters (Unicode)
Original question: http://stackoverflow.com/questions/5963228/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Kristoffer la Cour
Okay, I have read about regex all day now, and still don't understand it properly. What I'm trying to do is validate a name, but the functions I can find for this on the internet only use [a-zA-Z], leaving out characters that I need to accept too.
I basically need a regex that checks that the name is at least two words, and that it does not contain numbers or special characters like !"#¤%&/()=..., however the words can contain characters like æ, é, Å and so on...
An example of an accepted name would be: "John Elkjærd" or "André Svenson"
A non-accepted name would be: "Hans", "H4nn3Andersen" or "Martin Henriksen!"
If it matters, I use the JavaScript .match() function client side and want to use PHP's preg_replace(), only "in negative", server side (removing non-matching characters).
Any help would be much appreciated.
Update:
Okay, thanks to Alix Axel's answer I have the important part down, the server-side one.
But as the page from LightWing's answer suggests, I'm unable to find anything about Unicode support for JavaScript, so I ended up with half a solution for the client side, just checking for at least two words and a minimum of 5 characters like this:
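// minWords is the required number of words (2 for a first and last name)
var words = name.match(/\S+/g) || []; // match() returns null when nothing matches
if (words.length >= minWords && name.length >= 5) {
    // valid
}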
An alternative would be to specify all the Unicode characters as suggested in shifty's answer, which I might end up doing along with the solution above, but it is a bit impractical.
Answer by Alix Axel
Try the following regular expression:
^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$
In PHP this translates to:
if (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0)
{
// valid
}
You should read it like this:
^               # start of subject
(?:             # match this:
    [           # match a:
        \p{L}       # Unicode letter, or
        \p{Mn}      # Unicode accents, or
        \p{Pd}      # Unicode hyphens, or
        \'          # single quote, or
        \x{2019}    # single quote (alternative)
    ]+          # one or more times
    \s          # any kind of space
    [           # match a:
        \p{L}       # Unicode letter, or
        \p{Mn}      # Unicode accents, or
        \p{Pd}      # Unicode hyphens, or
        \'          # single quote, or
        \x{2019}    # single quote (alternative)
    ]+          # one or more times
    \s?         # any kind of space (0 or 1 times)
)+              # one or more times
$               # end of subject
I honestly don't know how to port this to JavaScript, and I'm not even sure JavaScript supports Unicode properties, but in PHP PCRE this seems to work flawlessly @ IDEOne.com:
$names = array
(
'Alix',
'André Svenson',
'H4nn3 Andersen',
'Hans',
'John Elkjærd',
'Kristoffer la Cour',
'Marco d\'Almeida',
'Martin Henriksen!',
);
foreach ($names as $name)
{
echo sprintf('%s is %s' . "\n", $name, (preg_match('~^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+\s?)+$~u', $name) > 0) ? 'valid' : 'invalid');
}
I'm sorry I can't help you regarding the JavaScript part, but probably someone here will.
Validates:
- John Elkjærd
- André Svenson
- Marco d'Almeida
- Kristoffer la Cour
Invalidates:
- Hans
- H4nn3 Andersen
- Martin Henriksen!
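For the JavaScript side, engines that implement ES2018 accept Unicode property escapes when the regex carries the u flag, so the pattern can probably be ported fairly directly. The following is a minimal sketch under that assumption, not a tested production validator. The helper name isValidName is hypothetical, \x{2019} is written as \u2019, and the single quote is left unescaped because the u flag rejects \' as an escape.

```javascript
// Sketch: port of the PHP pattern, assuming an ES2018+ engine.
const NAME_PATTERN =
  /^(?:[\p{L}\p{Mn}\p{Pd}'\u2019]+\s[\p{L}\p{Mn}\p{Pd}'\u2019]+\s?)+$/u;

// isValidName is a hypothetical helper name, not part of any library.
function isValidName(name) {
  return NAME_PATTERN.test(name);
}

const names = [
  'Alix', 'André Svenson', 'H4nn3 Andersen', 'Hans',
  'John Elkjærd', 'Kristoffer la Cour', "Marco d'Almeida", 'Martin Henriksen!',
];
for (const name of names) {
  console.log(`${name} is ${isValidName(name) ? 'valid' : 'invalid'}`);
}
```

On older engines the regex literal itself is a syntax error, so a fallback would have to be loaded separately.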
To replace invalid characters, though I'm not sure why you need this, you just need to change it slightly:
$name = preg_replace('~[^\p{L}\p{Mn}\p{Pd}\'\x{2019}\s]~u', '', $name);
Examples:
- H4nn3 Andersen -> Hnn Andersen
- Martin Henriksen! -> Martin Henriksen
Note that you always need to use the u modifier.
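If the same clean-up is ever needed in JavaScript (for example in Node.js), the negated character class carries over in much the same way. This is only a sketch assuming an ES2018+ engine; stripInvalidChars is a hypothetical name chosen for illustration.

```javascript
// Sketch: remove every character outside the allowed set (ES2018+ engine assumed).
// stripInvalidChars is a hypothetical helper name.
function stripInvalidChars(name) {
  // g replaces all occurrences; u enables the \p{...} property escapes.
  return name.replace(/[^\p{L}\p{Mn}\p{Pd}'\u2019\s]/gu, '');
}

console.log(stripInvalidChars('H4nn3 Andersen'));    // Hnn Andersen
console.log(stripInvalidChars('Martin Henriksen!')); // Martin Henriksen
```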
Answer by JacquesB
Regarding JavaScript it is more tricky, since JavaScript regex syntax did not support Unicode character properties until ES2018 introduced \p{...} escapes with the u flag. Where those are unavailable, a pragmatic solution would be to match letters like this:
[a-zA-Z\xC0-\uFFFF]
This allows letters in all languages and excludes numbers and all the special (non-letter) characters commonly found on keyboards. It is imperfect because it also allows Unicode special symbols which are not letters, e.g. emoticons, snowman and so on. However, since these symbols are typically not available on keyboards, I don't think they will be entered by accident. So depending on your requirements it may be an acceptable solution.
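A small sketch of how that range behaves in practice, including the imperfection described above. The constant and function names here are hypothetical and only serve the illustration.

```javascript
// Sketch: the range-based class, anchored so the whole word must match.
const LETTERS_ONLY = /^[a-zA-Z\xC0-\uFFFF]+$/;

// looksLikeLetters is a hypothetical helper name.
function looksLikeLetters(word) {
  return LETTERS_ONLY.test(word);
}

console.log(looksLikeLetters('Elkjærd')); // true
console.log(looksLikeLetters('H4nn3'));   // false: digits are outside the range
console.log(looksLikeLetters('☃'));       // true: the snowman (U+2603) slips through
```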
Answer by Seth V
Here's an optimization over the fantastic answer by @Alix above. It removes the need to define the character class twice, and allows for easier definition of any number of required words.
^(?:[\p{L}\p{Mn}\p{Pd}\'\x{2019}]+(?:$|\s+)){2,}$
It can be broken down as follows:
^               # start
(?:             # non-capturing group
    [           # match a:
        \p{L}       # Unicode letter, or
        \p{Mn}      # Unicode accents, or
        \p{Pd}      # Unicode hyphens, or
        \'          # single quote, or
        \x{2019}    # single quote (alternative)
    ]+          # one or more times
    (?:         # non-capturing group
        $           # either end-of-string
        |           # or
        \s+         # one or more spaces
    )           # end of group
){2,}           # two or more times
$               # end-of-string
Essentially, it says to find a word as defined by the character class, then find either one or more spaces or the end of the string. The {2,} at the end tells it that a minimum of two words must be found for a match to succeed. This ensures the OP's "Hans" example will not match.
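Because the word count lives in a single quantifier, the pattern is easy to parameterize. The sketch below builds it in JavaScript for a configurable minimum, assuming an ES2018+ engine for the \p{...} escapes; buildNamePattern and WORD_CHARS are hypothetical names, and the quote is unescaped because the u flag does not allow \'.

```javascript
// Sketch: build the single-class pattern for any minimum word count.
// Backslashes are doubled because the pattern is assembled from a string.
const WORD_CHARS = "[\\p{L}\\p{Mn}\\p{Pd}'\\u2019]";

// buildNamePattern is a hypothetical helper name.
function buildNamePattern(minWords) {
  return new RegExp(`^(?:${WORD_CHARS}+(?:$|\\s+)){${minWords},}$`, 'u');
}

const twoWords = buildNamePattern(2);
console.log(twoWords.test('Hans'));              // false
console.log(twoWords.test('John Elkjærd'));      // true
console.log(twoWords.test('Martin Henriksen!')); // false
```

Passing 3 instead of 2 would require a middle name as well.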
Lastly, since I found this question while looking for a similar solution for Ruby, here is the regular expression as it can be used in Ruby 1.9+:
\A(?:[\p{L}\p{Mn}\p{Pd}\'\u2019]+(?:\Z|\s+)){2,}\Z
The primary changes are using \A and \Z for beginning and end of string (instead of line) and Ruby's Unicode character notation.
Answer by Saleh
Visit this page: Unicode Characters in Regular Expression
Answer by mjspier
You can add the allowed special characters to the regex.
Example:
[a-zA-Z???ü??ü?é]+
EDIT:
Not the best solution, but this would match only if there are at least two words.
[a-zA-Z???ü??ü?é]+\s[a-zA-Z???ü??ü?é]+
Answer by ashein
When checking your input string you could
- trim() it to remove leading/trailing whitespace
- match against [^\w\s] to detect non-word/non-whitespace characters
- match against \s+ to get the number of word separators, which equals the number of words - 1.
However, I'm not sure whether the \w shorthand includes accented characters, though they should fall into the "word characters" category.
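The three steps can be sketched in JavaScript as follows; inspectName is a hypothetical name. The sketch also answers the open question for JavaScript specifically: there \w covers only ASCII letters, digits, and the underscore, so accented letters are reported as non-word characters while digits are not.

```javascript
// Sketch of the trim / [^\w\s] / \s+ steps; inspectName is a hypothetical name.
function inspectName(input) {
  const trimmed = input.trim();
  // True when something other than \w or whitespace is present.
  const hasOtherChars = /[^\w\s]/.test(trimmed);
  // match() returns null when there are no separators at all.
  const separators = (trimmed.match(/\s+/g) || []).length;
  const words = trimmed === '' ? 0 : separators + 1;
  return { hasOtherChars, words };
}

console.log(inspectName('  John Smith '));  // { hasOtherChars: false, words: 2 }
console.log(inspectName('André Svenson'));  // hasOtherChars is true: é is outside \w
console.log(inspectName('H4nn3 Andersen')); // hasOtherChars is false: \w accepts digits
```

So for names like the ones in the question, this approach would need a wider character class than \w on the JavaScript side.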
Answer by manuel-84
This is the JS regex that I use for fancy names composed of at most 3 words (1 to 60 chars each), separated by a space, single quote, or minus sign:
^([a-zA-Z\xC0-\uFFFF]{1,60}[ \-\']{0,1}){1,3}$
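A quick sketch of how this pattern behaves on a few inputs; the constant name FANCY_NAME is hypothetical. Note that it accepts a single word, so it does not enforce the two-word minimum from the question.

```javascript
// Sketch: the pattern above as a regex literal, exercised on sample names.
const FANCY_NAME = /^([a-zA-Z\xC0-\uFFFF]{1,60}[ \-\']{0,1}){1,3}$/;

console.log(FANCY_NAME.test("Marco d'Almeida"));            // true
console.log(FANCY_NAME.test('Kristoffer la Cour'));         // true
console.log(FANCY_NAME.test('Hans'));                       // true: one word is allowed
console.log(FANCY_NAME.test('Jean-Pierre de la Fontaine')); // false: five parts
console.log(FANCY_NAME.test('H4nn3 Andersen'));             // false: digits
```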