PHP Regex for human names
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1261338/
Question by KdgDev
I've run into a bit of a problem with a regex I'm using for human names.
$rexName = '/^[a-z\' -]+$/i';
Suppose a user named Jürgen wants to register? Or B?b? That's pretty commonplace in Europe. Is there a special notation for this?
EDIT: I just ran the name Jürgen through a regex generator, and it splits the word at the letter ü...
http://www.txt2re.com/index.php3?s=J%FCrgen+Blalock&submit=Show+Matches
EDIT2: All right, since checking for such specific things is hard, why not use a regex that simply checks for illegal characters?
$rexSafety = "/^[^<,\"@/{}()*$%?=>:|;#]*$/i";
(Now, which of these can actually be used in a hacking attempt?)
For instance, this allows the ' and - signs, but you need a ; to make an attack work in SQL, and that will be stopped. Are there any other characters commonly used for HTML injection or SQL attacks that I'm missing?
Answer by Pascal MARTIN
I would really say: don't try to validate names. One day or another, your code will meet a name that it thinks is "wrong"... And how do you think someone will react when an application tells them "your name is not valid"?
Depending on what you really want to achieve, you might consider using some kind of blacklist / filters to exclude the "not-names" you have thought of: it may let some "bad names" pass, but at least it shouldn't prevent anyone with a real name from accessing your application.
Here are a few examples of rules that come to mind:
- no numbers
- no special characters, like "~{()}@^$%?;:/*§£ and probably some others
- no more than 3 spaces?
- none of "admin", "support", "moderator", "test", and a few other obvious non-names that people tend to use when they don't want to type in their real name...
- (but if people don't want to give you their name, they still won't: even if you forbid them from typing random letters, they could just use a real name... which is not theirs)
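The rules above could be sketched as a PHP function along these lines. The function name nameRuleViolations, the exact special-character list and the reserved-word list are illustrative assumptions. The function reports which rules a name breaks instead of rejecting it outright, so you can decide how strict to be:

```php
<?php
// Hypothetical helper: returns the list of example rules that $name breaks.
// An empty array means the name passes. The character list and reserved
// words below are illustrative assumptions, not a complete blacklist.
function nameRuleViolations(string $name): array
{
    $violations = [];

    // Rule: no numbers.
    if (preg_match('/\d/u', $name)) {
        $violations[] = 'contains a digit';
    }

    // Rule: none of these special characters. preg_quote() escapes them so
    // they are taken literally inside the character class.
    $special = '"~{()}@^$%?;:/*§£';
    if (preg_match('/[' . preg_quote($special, '/') . ']/u', $name)) {
        $violations[] = 'contains a special character';
    }

    // Rule: no more than 3 spaces.
    if (substr_count($name, ' ') > 3) {
        $violations[] = 'contains more than 3 spaces';
    }

    // Rule: not an obvious placeholder word (compared case-insensitively).
    $reserved = ['admin', 'support', 'moderator', 'test'];
    if (in_array(strtolower(trim($name)), $reserved, true)) {
        $violations[] = 'is a reserved word';
    }

    return $violations;
}

var_dump(nameRuleViolations("Jürgen O'Brien-Smith")); // empty array: no rule broken
var_dump(nameRuleViolations('admin'));                // one entry: 'is a reserved word'
```

Note that ' and - are deliberately not in the list, since they appear in real names.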
Yes, this is not perfect; and yes, it will let some non-names pass... But it's probably far better for your application than telling someone "your name is wrong" (yes, I insist ^^)
And, to answer a comment you left under another answer:
I could just forbid the most common characters for SQL injection and XSS attacks,
About SQL injection: you must escape your data before sending it to the database; and if you always escape that data (you should!), you don't have to care about what users may or may not input: since it is always escaped, there is no risk for you.
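As a sketch of what this can look like in practice, here is a prepared statement with PDO, which keeps the value separate from the SQL text so there is nothing to escape by hand. It assumes the pdo_sqlite extension is available and uses a throwaway in-memory database with a hypothetical users table:

```php
<?php
// Sketch: store a name with a PDO prepared statement. The in-memory SQLite
// database and the "users" table are assumptions for illustration; the same
// prepare()/execute() calls work with other PDO drivers such as MySQL.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT NOT NULL)');

// The value is bound as data, never spliced into the SQL string, so quotes,
// semicolons and dashes in the name cannot change the query.
$name = "Robert'); DROP TABLE users;--";
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute([':name' => $name]);

$select = $pdo->prepare('SELECT name FROM users WHERE name = :name');
$select->execute([':name' => $name]);
var_dump($select->fetchColumn()); // the name comes back unchanged
```

The table still exists afterwards and holds exactly one row with the literal string.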
Same for XSS: as long as you always escape your data when outputting it (you should!), there is no risk of injection ;-)
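On the output side, PHP's htmlspecialchars() performs this escaping for HTML. A minimal sketch, assuming the page is served as UTF-8:

```php
<?php
// Sketch: escape a user-supplied name when writing it into HTML instead of
// rejecting it. ENT_QUOTES escapes both double and single quotes;
// ENT_SUBSTITUTE replaces invalid UTF-8 instead of returning an empty string.
$name = 'Jürgen <script>alert("x")</script>';
$safe = htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// prints: <p>Hello, Jürgen &lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;</p>
echo '<p>Hello, ' . $safe . '</p>', "\n";
```

Letters such as ü are left untouched; only the characters that are special in HTML are converted.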
EDIT: if you just use that regex as it is, it will not work very well.
The following code:
$rexSafety = "/^[^<,\"@/{}()*$%?=>:|;#]*$/i";
if (preg_match($rexSafety, 'martin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
will get you at least a warning:
Warning: preg_match() [function.preg-match]: Unknown modifier '{'
You must escape at least some of those special chars; I'll let you dig into PCRE Patterns for more information (there is really a lot to know about PCRE / regex, and I won't be able to explain it all).
If you actually want to check that none of those characters appears in a given piece of data, you might end up with something like this:
$rexSafety = "/[\^<,\"@\/\{\}\(\)\*$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'martin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
(This is a quick-and-dirty proposal, which has to be refined!)
This one says "ok" (well, I definitely hope my own name is OK!)
And the same example with some special chars, like this:
$rexSafety = "/[\^<,\"@\/\{\}\(\)\*$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'ma{rtin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
will say "bad name".
But please note that I have not fully tested this, and it probably needs more work! Do not use this on your site unless you have tested it very carefully!
Also note that a single quote can be helpful when attempting an SQL injection... but it is also a character that is legal in some names (O'Brien, for instance)... So just excluding some characters might not be enough ;-)
Answer by Gumbo
PHP's PCRE implementation supports Unicode character properties that span a larger set of characters. So you could use a combination of \p{L} (letter characters), \p{P} (punctuation characters) and \p{Zs} (space separator characters):
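/^[\p{L}\p{P}\p{Zs}]+$/u
(The u modifier makes PCRE treat the pattern and subject as UTF-8; without it, a multibyte letter such as ü is not treated as a single letter.)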
But there might be characters that are not covered by these categories, while some that are included might be ones you don't want to allow.
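A quick way to see both points is to run the pattern against a few sample strings. This sketch uses the /u modifier, which assumes your input is valid UTF-8; without it, PCRE looks at individual bytes and a multibyte letter like ü is not matched as one letter. The sample strings are made up for illustration:

```php
<?php
// Sketch of the Unicode-property approach, with /u so that "ü", "á", etc.
// are each read as a single \p{L} character.
$pattern = '/^[\p{L}\p{P}\p{Zs}]+$/u';

$samples = [
    'Jürgen Blalock',
    "Siobhán O'Brien",
    'R2D2',                          // digits are in none of the three classes
    '<b>Bob</b>',                    // < and > are math symbols (\p{Sm}), not \p{P}
    "Robert'); DROP TABLE users;--", // ' ) ; and - are all punctuation, so this passes
];

foreach ($samples as $s) {
    echo $s, ' => ', preg_match($pattern, $s) === 1 ? 'match' : 'no match', "\n";
}
```

The last sample shows the caveat above in action: because \p{P} covers ', ), ; and -, an SQL-injection-looking string still matches.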
So I advise you against using regular expressions on a datum with such a vague range of values as a real person's name.
Edit: As you edited your question, I now see that you just want to prevent certain code injection attacks: you should rather escape those characters than reject them as a potential attack attempt.
Use mysql_real_escape_string or prepared statements for SQL queries, htmlspecialchars for HTML output, and other appropriate functions for other languages. (Note that the old mysql_* extension was removed in PHP 7.0; use mysqli_real_escape_string or, better, prepared statements with mysqli or PDO instead.)
Answer by sebasgo
That's a problem with no easy general solution. The thing is that you really can't predict what characters a name could contain. Probably the best solution is to define a negative character mask that excludes the special characters you really don't want to end up in a name.
You can do this using:
$regexp = "/^[^<put unwanted characters here>]+$/";
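As a sketch of how the placeholder could be filled in, the snippet below builds the character class from a plain list with preg_quote(), which escapes regex metacharacters and the "/" delimiter (so you avoid warnings such as "Unknown modifier"). The $unwanted list and the helper name isAcceptableName are hypothetical; adjust the list to whatever you really want to exclude:

```php
<?php
// Sketch of the "negative character mask" idea. $unwanted is only an example;
// preg_quote() escapes each metacharacter and the "/" delimiter so the
// characters can be listed as-is inside [^...].
$unwanted = '<>{}[]@#$%^*=|;/\\';
$regexp = '/^[^' . preg_quote($unwanted, '/') . ']+$/u';

function isAcceptableName(string $name, string $regexp): bool
{
    // preg_match() returns 1 on match, 0 on no match, false on error
    // (e.g. invalid UTF-8 with /u); only an actual match is accepted.
    return preg_match($regexp, $name) === 1;
}

var_dump(isAcceptableName("Jürgen O'Brien", $regexp)); // bool(true)
var_dump(isAcceptableName('ma{rtin', $regexp));        // bool(false)
```

Because the mask is negative, any character not in the list (including letters from any alphabet) is allowed.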
Answer by Jonathon Hill
If you're trying to parse a human name apart in PHP, I recommend Keith Beckman's nameparse.php script.

