Java: regular expression to find "lastname, firstname middlename" format

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25801247/

Time: 2020-11-02 08:43:37  Source: igfitidea

Regular Expression to find "lastname, firstname middlename" format

java regex

Asked by A Paul

I am trying to find the format "abc, def g", which is the name format "lastname, firstname middlename". I think regex is the best-suited method, but I have no experience with regex. I tried to learn some regex and tried a few expressions, but with no luck. One additional point: there may be more than one space between the words.

This is what I tried. But this is not working.

(([A-Z][,]\s?)*([A-Z][a-z]+\s?)+([A-Z]\s?[a-z]*)*)

Need help! Any idea how I can do this so that only the above format matches?

Thanks!

ANSWER

Finally I am using

([A-Za-z]+),\s*([A-Za-z]+)\s*([A-Za-z]+)

Thanks to everyone for the suggestions.
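
A caveat worth checking with this final pattern: because the separator between the second and third group is `\s*` (zero or more), the pattern can also match input that has no middle name at all, by splitting the first name between the two groups. The sketch below is only an illustration under assumptions: the class name `FinalPatternCheck` and the sample input are made up, and it uses `Matcher.matches()` so that the whole string has to fit the format. It compares `\s*` with `\s+` in that position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: class name and sample input are assumptions, not from the original post.
class FinalPatternCheck {
    // The pattern from the post: optional whitespace between first and middle name.
    static final Pattern LENIENT =
            Pattern.compile("([A-Za-z]+),\\s*([A-Za-z]+)\\s*([A-Za-z]+)");
    // Variant that requires at least one whitespace character between them.
    static final Pattern STRICT =
            Pattern.compile("([A-Za-z]+),\\s*([A-Za-z]+)\\s+([A-Za-z]+)");

    // Returns {last, first, middle} if the whole input matches, otherwise null.
    static String[] parse(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String twoParts = "Paul, Abc"; // no middle name
        String[] lenient = parse(LENIENT, twoParts);
        // The lenient pattern still matches by backtracking: first = "Ab", middle = "c".
        System.out.println(lenient == null ? "no match" : String.join("|", lenient));
        // The strict pattern rejects the same input.
        System.out.println(parse(STRICT, twoParts) == null ? "no match" : "match");
    }
}
```

If a missing middle name should be rejected, requiring `\s+` between the last two groups is probably the safer choice.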

Accepted answer by Andreas Fester

Your sample input is "lastname, firstname middlename". With that, you can use the following regexp to extract the last name, first name and middle name. It allows multiple whitespace characters between the parts and both upper- and lower-case letters in the names; all three parts are mandatory:

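import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "Lastname,   firstname   middlename";
// Backslashes must be doubled inside a Java string literal
String regexp = "([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)";

Pattern pattern = Pattern.compile(regexp);
Matcher matcher = pattern.matcher(input);
// group() throws IllegalStateException unless a match was found first
if (matcher.find()) {
    System.out.println("Lastname  : " + matcher.group(1));
    System.out.println("Firstname : " + matcher.group(2));
    System.out.println("Middlename: " + matcher.group(3));
}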

Short summary:

([A-Za-z]+)   First capture group - matches one or more letters to extract the last name
,\s+          Followed by a comma and one or more whitespace characters
([A-Za-z]+)   Second capture group - matches one or more letters to extract the first name
\s+           Followed by one or more whitespace characters
([A-Za-z]+)   Third capture group - matches one or more letters to extract the middle name

This only works if your names contain only unaccented Latin (ASCII) letters. You should probably use a more permissive match for the characters:

String input = "Müller,   firstname  middlename";
String regexp = "(.+),\\s+(.+)\\s+(.+)";

This matches any character for lastname, firstname and middlename.
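
One thing to be aware of with `.+`: it also matches whitespace and punctuation, so when there are several spaces between the names the second group can end up with a trailing space. A possible middle ground, which is a swapped-in alternative and not part of the answer above, is the Unicode letter class `\p{L}`. A minimal sketch, assuming a hypothetical class name `UnicodeNameMatch`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: class and method names are assumptions.
class UnicodeNameMatch {
    // "Match anything" pattern from the answer above.
    static final Pattern ANY = Pattern.compile("(.+),\\s+(.+)\\s+(.+)");
    // Assumed alternative: \p{L} matches any Unicode letter, but no spaces or punctuation.
    static final Pattern LETTERS =
            Pattern.compile("(\\p{L}+),\\s+(\\p{L}+)\\s+(\\p{L}+)");

    // Returns the first name (group 2), or null if the pattern is not found.
    static String firstName(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // The escape below stands for the letter u with umlaut in "Mueller".
        String input = "M\u00fcller,   firstname  middlename";
        // Brackets make any stray whitespace in the captured group visible.
        System.out.println("[" + firstName(ANY, input) + "]");
        System.out.println("[" + firstName(LETTERS, input) + "]");
    }
}
```

Note that `\p{L}` does not cover apostrophes or hyphens, so names such as O'Gara would need those characters added to the class.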

If the spaces are optional (only the first occurrence can be optional, otherwise we cannot distinguish between firstname and middlename), then use * instead of +:

String input = "Müller,firstname  middlename";
String regexp = "(.+),\\s*(.+)\\s+(.+)";


As @Elliott mentions, there might be other possibilities, like using String.split() or String.indexOf() with String.substring(). Regular expressions are often more flexible, but harder to maintain, especially for complex expressions.

In either case, implement unit tests with as many different inputs (including invalid ones) as possible, so that you can verify that your algorithm is still valid after you modify it.
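
A minimal sketch of such checks, written as a plain `main` method so that it runs without a test framework (with JUnit the same table would become individual test cases). The class name `NamePatternChecks`, the use of `matches()` on the whole input, and the sample inputs are assumptions chosen for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch: class name and test inputs are assumptions.
class NamePatternChecks {
    static final Pattern NAME =
            Pattern.compile("([A-Za-z]+),\\s+([A-Za-z]+)\\s+([A-Za-z]+)");

    // True if the whole input has the "lastname, firstname middlename" shape.
    static boolean isValid(String input) {
        return NAME.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Input -> expected validity, including deliberately invalid inputs.
        Map<String, Boolean> cases = new LinkedHashMap<>();
        cases.put("Lastname, firstname middlename", true);
        cases.put("Lastname,   firstname   middlename", true);
        cases.put("Lastname firstname middlename", false); // missing comma
        cases.put("Lastname, firstname", false);           // missing middle name
        cases.put("", false);                              // empty input

        for (Map.Entry<String, Boolean> c : cases.entrySet()) {
            if (isValid(c.getKey()) != c.getValue()) {
                throw new AssertionError("Unexpected result for: '" + c.getKey() + "'");
            }
        }
        System.out.println("All " + cases.size() + " cases passed");
    }
}
```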

Answer by Elliott Frisch

I would try to avoid a complicated regex; I would use String.substring() and indexOf(). That is, something like

String name = "Last, First Middle";
int comma = name.indexOf(',');
int lastSpace = name.lastIndexOf(' ');
String lastName = name.substring(0, comma);
// comma + 2 skips the comma and the single space that follows it
String firstName = name.substring(comma + 2, lastSpace);
String middleName = name.substring(lastSpace + 1);
System.out.printf("first='%s' middle='%s' last='%s'%n", firstName,
            middleName, lastName);

Output is

first='First' middle='Middle' last='Last'
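
Note that `comma + 2` assumes exactly one space after the comma, and the result only stays clean when a single space precedes the middle name. A possible variant that tolerates extra spaces by trimming each part is sketched below. The class and method names are hypothetical, and it still expects otherwise well-formed input (one comma, and a space before the middle name):

```java
// Illustrative sketch: class and method names are assumptions.
class SubstringNameParser {
    // Returns {last, first, middle}; throws IllegalArgumentException on malformed input.
    static String[] parse(String name) {
        int comma = name.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("missing comma: " + name);
        }
        String lastName = name.substring(0, comma).trim();
        // Everything after the comma, without leading or trailing whitespace.
        String rest = name.substring(comma + 1).trim();
        int lastSpace = rest.lastIndexOf(' ');
        if (lastSpace < 0) {
            throw new IllegalArgumentException("missing middle name: " + name);
        }
        // trim() drops any extra spaces left between first and middle name.
        String firstName = rest.substring(0, lastSpace).trim();
        String middleName = rest.substring(lastSpace + 1);
        return new String[] { lastName, firstName, middleName };
    }

    public static void main(String[] args) {
        String[] parts = parse("Last,   First    Middle");
        System.out.printf("first='%s' middle='%s' last='%s'%n",
                parts[1], parts[2], parts[0]);
    }
}
```

This only looks for the space character, so input separated by tabs would still need a regex or a different split.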

Answer by Breandán Dalton

As an alternative to matching the "lastname, firstname middlename" format directly, you could use String.split and provide a regexp that matches the separators instead. For instance:

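// requires: import java.util.Arrays;
static String[] lastFirstMiddle(String input){
    // Split on runs of commas and/or whitespace (backslash doubled for the Java string literal)
    String[] result=input.split("[,\\s]+");
    System.out.println(Arrays.asList(result));
    return result;
}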

I tested this with inputs

"Müller,   firstname  middlename"
"Müller,firstname  middlename"
"O'Gara, Ronan Ramón"

Note: this approach fails with surnames that contain spaces, for instance "van der Heuvel", "de Valera", "mac Piarais" or "bin Laden". Then again, OP's original specification does not seem to allow for spaces in the surname, or in the other names. (I work with a "Mary Kate": that is her first name, not a first and middle name.) There's an interesting page about personal names at http://www.w3.org/International/questions/qa-personal-names
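
If surnames with spaces do need to be supported, one option (an assumption on my part, not something the OP asked for) is to split on the comma first and only then split the remainder on whitespace. A minimal sketch with hypothetical class and sample names:

```java
import java.util.Arrays;

// Illustrative sketch: class name and sample inputs are assumptions.
class CommaFirstSplit {
    // Returns {last, first, middle}; middle is "" when it is absent.
    static String[] lastFirstMiddle(String input) {
        // Limit 2: everything before the first comma is the surname, spaces included.
        String[] parts = input.split("\\s*,\\s*", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("missing comma: " + input);
        }
        String last = parts[0].trim();
        // Limit 2 again: the first word is the first name, the rest is the middle name.
        String[] given = parts[1].trim().split("\\s+", 2);
        String first = given[0];
        String middle = given.length > 1 ? given[1] : "";
        return new String[] { last, first, middle };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.asList(lastFirstMiddle("van der Heuvel, Jan  Piet")));
        System.out.println(Arrays.asList(lastFirstMiddle("O'Gara, Ronan Ram\u00f3n")));
    }
}
```

A first name made of two words, like "Mary Kate", is still ambiguous with this approach: the second word ends up as the middle name.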

Answer by NeverHopeless

I think this one will also work, and it is a bit shorter than yours:

([A-Z][a-z]*)(?:,\s*)?
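
That pattern matches one capitalized word at a time, so it has to be applied repeatedly. A minimal sketch of how that might look with `Matcher.find()` in a loop. The class name and sample input are assumptions, and names that start with a lowercase letter or contain non-ASCII letters will not be picked up correctly:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: class and method names are assumptions.
class WordByWordMatch {
    // One capitalized word, optionally followed by a comma and whitespace.
    static final Pattern WORD = Pattern.compile("([A-Z][a-z]*)(?:,\\s*)?");

    // Collects group 1 of every match, in order of appearance.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Expected order for well-formed input: last, first, middle.
        System.out.println(words("Last,   First  Middle"));
    }
}
```

Since nothing here checks the overall shape of the input, the caller would still have to verify that exactly three words came back.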

Or you can use split with this regex:

(,?\s+)

Answer by vks

^([a-zA-Z]+)\s*,\s*([a-zA-Z]+)\s+([a-zA-Z]+)$

I think you are looking for this. Just grab the groups to get what you need. See the demo:

http://regex101.com/r/hQ1rP0/6
