PHP: Regex for names

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/275160/

Date: 2020-08-24 22:10:34  Source: igfitidea

Regex for names

Tags: php, regex

Asked by Humpton

Just starting to explore the 'wonders' of regex. Being someone who learns from trial and error, I'm really struggling because my trials are throwing up a disproportionate amount of errors... My experiments are in PHP using ereg().

Anyway. I work with first and last names separately but for now using the same regex. So far I have:

^[A-Z][a-zA-Z]+$  

Any length string that starts with a capital and has only letters (capital or not) for the rest. But where I fall apart is dealing with the special situations that can pretty much occur anywhere.

  • Hyphenated Names (Worthington-Smythe)
  • Names with Apostrophes (D'Angelo)
  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.
  • Joint Names (Ben & Jerry)

Maybe there's some other way a name can be that I'm not thinking of, but I suspect if I can get my head around this, I can add to it. I'm pretty sure there will be instances where more than one of these situations comes up in one name.

So, I think the bottom line is to have my regex also accept a space, hyphens, ampersands and apostrophes - but not at the start or end of the name to be technically correct.
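
One way to read that requirement, as a minimal sketch rather than a definitive answer: letters only, with single separators allowed between runs of letters but never at the start or end. The function name `looksLikeName` and the choice to treat " & " as one separator are assumptions made for illustration, and `preg_match()` is used instead of `ereg()`.

```php
<?php
// Hypothetical helper, not from the original post.
// A capital letter, more letters, then zero or more groups of
// "one separator + letters". Separators: space, apostrophe, hyphen, or " & ".
function looksLikeName(string $name): bool
{
    $pattern = '/^[A-Z][a-zA-Z]*(?:(?: & |[ \'-])[a-zA-Z]+)*$/';
    return preg_match($pattern, $name) === 1;
}

var_dump(looksLikeName('Worthington-Smythe')); // bool(true)
var_dump(looksLikeName("D'Angelo"));           // bool(true)
var_dump(looksLikeName('Van der Humpton'));    // bool(true)
var_dump(looksLikeName('Ben & Jerry'));        // bool(true)
var_dump(looksLikeName('-Smythe'));            // bool(false)
var_dump(looksLikeName('Smythe-'));            // bool(false)
```

Because every separator must be followed by at least one letter, doubled separators such as "--" are rejected as a side effect.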

Answered by Daan

This regex is perfect for me.

^([ \u00c0-\u01ffa-zA-Z'\-])+$

It works fine in php environments using preg_match(), but doesn't work everywhere.

It matches Jérémie O'Co-nor, so I think it matches all UTF-8 names.
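
A hedged sketch of how this range might be written for `preg_match()`: PCRE normally expects code points as `\x{...}` together with the `u` modifier rather than the `\u....` notation, so the pattern may need adapting. The example assumes UTF-8 input and a UTF-8 source file. It also suggests the range is narrower than "all names": it covers accented Latin letters (plus × and ÷), but not other scripts.

```php
<?php
// Assumption: input strings and this file are UTF-8 encoded.
// \x{00C0}-\x{01FF} is the same code point range as \u00c0-\u01ff.
$pattern = '/^[ \x{00C0}-\x{01FF}a-zA-Z\'\-]+$/u';

var_dump(preg_match($pattern, "Jérémie O'Co-nor")); // int(1)
var_dump(preg_match($pattern, 'Иван'));             // int(0), Cyrillic is outside the range
```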

Answered by Matthew Scharley

  • Hyphenated Names (Worthington-Smythe)

Add a - into the second character class. The easiest way to do that is to add it at the start so that it can't possibly be interpreted as a range modifier (as in a-z).

^[A-Z][-a-zA-Z]+$

  • Names with Apostrophes (D'Angelo)

A naive way of doing this would be as above, giving:

^[A-Z][-'a-zA-Z]+$

Don't forget you may need to escape it inside the string! A 'better' way, given your example, might be:

^[A-Z]'?[-a-zA-Z]+$

Which will allow a possible single apostrophe in the second position.
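
To illustrate the escaping remark, here is a small sketch in PHP using `preg_match()` (an assumption, since the question used `ereg()`). The same pattern is written once in a single-quoted string, where the apostrophe needs a backslash, and once in a double-quoted string, where the `$` is escaped instead.

```php
<?php
// Same regex, two PHP string quoting styles.
$single = '/^[A-Z]\'?[-a-zA-Z]+$/';  // apostrophe escaped for the PHP string
$double = "/^[A-Z]'?[-a-zA-Z]+\$/";  // apostrophe is fine, $ is escaped

var_dump($single === $double);                       // bool(true)
var_dump(preg_match($single, "D'Angelo"));           // int(1)
var_dump(preg_match($single, 'Worthington-Smythe')); // int(1)
var_dump(preg_match($single, "Da'ngelo"));           // int(0), apostrophe not in second position
```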

  • Names with Spaces (Van der Humpton) - capitals in the middle which may or may not be required is way beyond my interest at this stage.

Here I'd be tempted to just do our naive way again:

^[A-Z]'?[- a-zA-Z]+$

A potentially better way might be:

^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$

Which looks for extra words at the end. This probably isn't a good idea if you're trying to match names in a body of extra text, but then again, the original wouldn't have done that well either.
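
A quick sketch of the "extra words at the end" idea in PHP. It assumes each word is one or more letters, so both the first word and the trailing group carry a `+` quantifier; treat that as my reading of the intent rather than the answer's exact pattern.

```php
<?php
// First word as before, then zero or more " word" groups.
$pattern = '/^[A-Z]\'?[-a-zA-Z]+( [a-zA-Z]+)*$/';

$samples = ['Van der Humpton', "D'Angelo", 'Van  der', 'Humpton '];
foreach ($samples as $name) {
    $result = preg_match($pattern, $name) === 1 ? 'match' : 'no match';
    printf("%-18s %s\n", "'$name'", $result);
}
// 'Van der Humpton' and 'D'Angelo' match; the doubled space and the
// trailing space do not.
```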

  • Joint Names (Ben & Jerry)

At this point you're not looking at single names anymore?

Anyway, as you can see, regexes have a habit of growing very quickly...

Answered by Taher Ahmed

THE BEST REGEX EXPRESSIONS FOR NAMES:

  • I will use the term special character to refer to the following three characters:
    1. Dash -
    2. Apostrophe '
    3. Dot .
  • Spaces and special characters cannot appear twice in a row (e.g. -- or '. or ..)
  • Trimmed (no spaces before or after)
  • You're welcome ;)


Mandatory single name, WITHOUT spaces, WITHOUT special characters:

^([A-Za-z])+$
  • Sierra is valid, Hyman Alexander is invalid (has a space), O'Neil is invalid (has a special character)


Mandatory single name, WITHOUT spaces, WITH special characters:

^[A-Za-z]+(((\'|\-|\.)?([A-Za-z])+))?$
  • Sierra is valid, O'Neil is valid, Hyman Alexander is invalid (has a space)
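
A small check of this pattern with `preg_match()`, as a sketch. The extra inputs `O''Neil` and `Mary-Jo-Ann` are my own examples; they suggest that the optional group allows at most one special character in the whole name.

```php
<?php
// The pattern from the answer, wrapped in / delimiters for PCRE.
$pattern = '/^[A-Za-z]+(((\'|\-|\.)?([A-Za-z])+))?$/';

var_dump(preg_match($pattern, 'Sierra'));          // int(1)
var_dump(preg_match($pattern, "O'Neil"));          // int(1)
var_dump(preg_match($pattern, 'Hyman Alexander')); // int(0), has a space
var_dump(preg_match($pattern, "O''Neil"));         // int(0), two specials in a row
var_dump(preg_match($pattern, 'Mary-Jo-Ann'));     // int(0), more than one special
```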

Mandatory single name, optional additional names, WITH spaces, WITH special characters:

^[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*$
  • Hyman Alexander is valid, Sierra O'Neil is valid


Mandatory single name, optional additional names, WITH spaces, WITHOUT special characters:

^[A-Za-z]+((\s)?([A-Za-z])+)*$
  • Hyman Alexander is valid, Sierra O'Neil is invalid (has a special character)
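
The two "with spaces" patterns can be compared side by side. This is only a sketch; the variable names and the doubled-space input are my own additions for illustration.

```php
<?php
// Both multi-name patterns from the answer, wrapped in / delimiters.
$withSpecial    = '/^[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*$/';
$withoutSpecial = '/^[A-Za-z]+((\s)?([A-Za-z])+)*$/';

$samples = ['Hyman Alexander', "Sierra O'Neil", 'Hyman  Alexander'];
foreach ($samples as $name) {
    printf(
        "%-20s with: %d  without: %d\n",
        "'$name'",
        preg_match($withSpecial, $name),
        preg_match($withoutSpecial, $name)
    );
}
// 'Hyman Alexander' passes both, "Sierra O'Neil" passes only the first,
// and the doubled space fails both.
```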

SPECIAL CASE

Many modern smart devices add spaces at the end of each word, so in my applications I allow unlimited number of spaces before and after the string, then I trim it in the code behind. So I use the following:

Mandatory single name + optional additional names + spaces + special characters:

^(\s)*[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*(\s)*$
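
The validate-then-trim flow might look like this in PHP. The helper name `cleanName` and the choice to return `null` for invalid input are assumptions for the sake of the example, not part of the answer.

```php
<?php
// Hypothetical helper: accept surrounding whitespace, then trim it away.
function cleanName(string $input): ?string
{
    $pattern = '/^(\s)*[A-Za-z]+((\s)?((\'|\-|\.)?([A-Za-z])+))*(\s)*$/';
    return preg_match($pattern, $input) === 1 ? trim($input) : null;
}

var_dump(cleanName("  Sierra O'Neil  ")); // string(13) "Sierra O'Neil"
var_dump(cleanName('Sierra  ONeil'));     // NULL, doubled space inside the name
```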


Add your own special characters

If you wish to add your own special characters, let's say an underscore _, this is the group you need to update:

(\'|\-|\.)

To

(\'|\-|\.|\_)
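
If the list of special characters changes often, the group could be generated instead of edited by hand. The helper `namePattern` below is hypothetical; it is a sketch that escapes each character with `preg_quote()` and drops the result into the multi-name pattern.

```php
<?php
// Hypothetical helper (not from the answer): build the full pattern from a
// list of allowed special characters, escaping each with preg_quote().
function namePattern(array $specials): string
{
    $quoted = array_map(function ($c) {
        return preg_quote($c, '/');
    }, $specials);
    $group = '(' . implode('|', $quoted) . ')';

    return '/^[A-Za-z]+((\s)?(' . $group . '?([A-Za-z])+))*$/';
}

$default        = namePattern(["'", '-', '.']);
$withUnderscore = namePattern(["'", '-', '.', '_']);

var_dump(preg_match($default, 'Mary_Ann'));             // int(0)
var_dump(preg_match($withUnderscore, 'Mary_Ann'));      // int(1)
var_dump(preg_match($withUnderscore, "Sierra O'Neil")); // int(1)
```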

PS: If you have questions, comment here and I will receive an email and respond ;)

Answered by eyelidlessness

While I agree with the answers saying you basically can't do this with regex, I will point out that some of the objections (internationalized characters) can be resolved by using UTF strings and the \p{L} character class (matches a Unicode "letter").
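
A minimal sketch of that idea with `preg_match()`, assuming UTF-8 input and the `u` modifier. The choice of separators (space, apostrophe, hyphen) is carried over from the question and is an assumption here. Letters written with separate combining marks would not match `\p{L}` alone.

```php
<?php
// One or more Unicode letters, then zero or more groups of
// "one separator + letters". Requires the u modifier and UTF-8 input.
$pattern = '/^\p{L}+(?:[ \'-]\p{L}+)*$/u';

var_dump(preg_match($pattern, "Jérémie O'Connor")); // int(1)
var_dump(preg_match($pattern, 'Иван Петров'));      // int(1)
var_dump(preg_match($pattern, '王小明'));            // int(1)
var_dump(preg_match($pattern, 'R2-D2'));            // int(0), digits are not letters
```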

Answered by VirtuosiMedia

I don't really have a whole lot to add to a regex that takes care of names because there are already some good suggestions here, but if you want a few resources for learning more about regular expressions, you should check out:

Answered by Domchi

I second the 'give up' advice. Even if you consider numbers, hyphens, apostrophes and such, something like [a-zA-Z] still wouldn't catch international names (for example, those having ?????, or Cyrillic alphabet, or Chinese characters...)

But... why are you even trying to verify names? What errors are you trying to catch? Don't you think people know how to write their names better than you do? ;) Seriously, the only thing you can do by trying to verify names is to irritate people with unusual names.

Answered by PhiLho

Basically, I agree with Paul... You will always find exceptions, like di Caprio, DeVil, or such.

Remarks on your message: in PHP, ereg is generally seen as obsolete (slow, incomplete) in favor of preg (PCRE regexes).
And you should try a regex tester, like the powerful Regex Coach: they are great for quickly testing REs against arbitrary strings.
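
For reference, `ereg()` was deprecated in PHP 5.3 and removed in PHP 7.0, so the question's pattern would be run through `preg_match()` today. A minimal sketch of the conversion, where the main visible change is the pair of `/` delimiters around the pattern:

```php
<?php
// Older code: ereg('^[A-Z][a-zA-Z]+$', $name)
// PCRE equivalent: same pattern, wrapped in delimiters.
$name = 'Humpton';

if (preg_match('/^[A-Z][a-zA-Z]+$/', $name) === 1) {
    echo "valid\n";
} else {
    echo "invalid\n";
}
// prints "valid"
```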

If you really need to solve your problem and aren't satisfied with the above answers, just ask and I will give it a go.

Answered by uke

This worked for me:

 +[a-z]{2,3} +[a-z]*|[\w'-]*

This regex will correctly match names such as the following:

jean-claude van damme
nadine arroyo-rodriquez
wayne la pierre
beverly d'angelo
billy-bob thornton
tito puente
susan del rio

It will group "van damme", "arroyo-rodriquez", "d'angelo", "billy-bob", etc., as well as single names like "wayne".

Note that it does not test that the grouped stuff is actually a valid name. Like others said, you'll need a dictionary for that. Also, it will group numbers, so if that's an issue you may want to modify the regex.

I wrote this to parse names for a MapReduce application. All I wanted was to extract words from the name field, grouping together the del foo and la bar and billy-bobs into one word to make the key-value pair generation more accurate.
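
In PHP the same extraction could be sketched with `preg_match_all()`. The helper `nameTokens` is hypothetical, and the trimming and dropping of empty matches are my additions: the pattern keeps the leading space of a grouped particle and can also match the empty string.

```php
<?php
// Hypothetical helper: split a lowercase name into tokens, keeping short
// particles ("van", "del", "la") together with the word that follows.
function nameTokens(string $name): array
{
    // Note the leading space before the first + in the pattern.
    $pattern = '/ +[a-z]{2,3} +[a-z]*|[\w\'-]*/';
    preg_match_all($pattern, $name, $matches);

    $tokens = array_map('trim', $matches[0]);
    // Drop the empty matches that [\w'-]* produces.
    return array_values(array_filter($tokens, function ($t) {
        return $t !== '';
    }));
}

print_r(nameTokens('jean-claude van damme')); // jean-claude, van damme
print_r(nameTokens('susan del rio'));         // susan, del rio
print_r(nameTokens('tito puente'));           // tito, puente
```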

Answered by tk_

Check this out:

^(([A-Za-z]+[,.]?[ ]?|[a-z]+['-]?)+)$

You can test it here: https://regex101.com/r/mS9gD7/46
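
The pattern can also be tried locally with `preg_match()`. This sketch assumes no flags are set (the linked page may use different settings); under that assumption an apostrophe is only accepted after lowercase letters, so `O'Neil` is rejected while `d'Angelo` passes.

```php
<?php
// The pattern from the answer, wrapped in / delimiters, no modifiers.
$pattern = '/^(([A-Za-z]+[,.]?[ ]?|[a-z]+[\'-]?)+)$/';

var_dump(preg_match($pattern, 'Martin Luther King, Jr.')); // int(1)
var_dump(preg_match($pattern, "d'Angelo"));                // int(1)
var_dump(preg_match($pattern, 'jean-claude'));             // int(1)
var_dump(preg_match($pattern, "O'Neil"));                  // int(0)
```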

Answered by victorash

/([\u00c0-\u01ffa-zA-Z'\-]+[ ]?[*]?[\u00c0-\u01ffa-zA-Z'\-]*)+/;

Try this. You can also force the match to start with a character using ^ and end with a character using $.
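
A sketch of the difference anchoring makes, written for `preg_match()`. The `\u....` escapes are replaced by `\x{....}` with the `u` modifier, which is an adaptation on my part since the pattern above looks like a JavaScript literal; the variable names are also mine.

```php
<?php
// Assumption: UTF-8 input; \x{00c0}-\x{01ff} stands in for \u00c0-\u01ff.
$body       = '([\x{00c0}-\x{01ff}a-zA-Z\'\-]+[ ]?[*]?[\x{00c0}-\x{01ff}a-zA-Z\'\-]*)+';
$unanchored = '/' . $body . '/u';
$anchored   = '/^' . $body . '$/u';

var_dump(preg_match($unanchored, '123 Jérémie'));    // int(1), matches part of the string
var_dump(preg_match($anchored, '123 Jérémie'));      // int(0), whole string must match
var_dump(preg_match($anchored, "Jérémie O'Co-nor")); // int(1)
```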
