Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/103422/

Time: 2020-09-09 00:16:45  Source: igfitidea

Simple way to parse a person's name into its component parts?

Tags: string, parsing, database-design

Asked by Keithius

A lot of contact management programs do this - you type in a name (e.g., "John W. Smith") and it automatically breaks it up internally into:

First name: John
Middle name: W.
Last name: Smith

Likewise, it figures out things like "Mrs. Jane W. Smith" and "Dr. John Doe, Jr." correctly as well (assuming you allow for fields like "prefix" and "suffix" in names).

I assume this is a fairly common thing that people would want to do... so the question is... how would you do it? Is there a simple algorithm for this? Maybe a regular expression?

I'm after a .NET solution, but I'm not picky.

Update: I appreciate that there is no simple solution for this that covers ALL edge cases and cultures... but let's say for the sake of argument that you need the name in pieces (filling out forms - as in, say, tax or other government forms - is one case where you are bound to enter the name into fixed fields, whether you like it or not), but you don't necessarily want to force the user to enter their name into discrete fields (less typing = easier for novice users).

You'd want to have the program "guess" (as best it can) what's first, middle, last, etc. If you can, look at how Microsoft Outlook does this for contacts - it lets you type in the name, but if you need to clarify, there's an extra little window you can open. I'd do the same thing - give the user the window in case they want to enter the name in discrete pieces - but allow for entering the name in one box and doing a "best guess" that covers most common names.

Accepted answer by Stephen Deken

There is no simple solution for this. Name construction varies from culture to culture, and even in the English-speaking world there are prefixes and suffixes that aren't necessarily part of the name.

A basic approach is to look for honorifics at the beginning of the string (e.g., "Hon. John Doe") and numbers or some other strings at the end (e.g., "John Doe IV", "John Doe Jr."), but really all you can do is apply a set of heuristics and hope for the best.
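
The heuristic described above can be sketched in a few lines of C#. This is only an illustration, not the answerer's code: the `NameAffixes` class, its `Strip` method, and the two tiny word lists are hypothetical, and a real list of honorifics and suffixes would need to be much longer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameAffixes
{
    // Illustrative lists only (assumption); a real application needs many more entries.
    private static readonly HashSet<string> Honorifics =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "mr", "mrs", "ms", "dr", "hon" };
    private static readonly HashSet<string> Suffixes =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "jr", "sr", "ii", "iii", "iv" };

    // Normalize a token for lookup by dropping periods and commas.
    private static string Key(string token)
    {
        return token.Replace(".", "").Replace(",", "");
    }

    // Returns (prefix, core name, suffix); prefix and suffix are "" when absent.
    public static Tuple<string, string, string> Strip(string fullName)
    {
        var tokens = fullName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();
        string prefix = "", suffix = "";

        // Honorific at the beginning of the string, e.g. "Hon. John Doe".
        if (tokens.Count > 1 && Honorifics.Contains(Key(tokens[0])))
        {
            prefix = tokens[0];
            tokens.RemoveAt(0);
        }

        // Known suffix at the end of the string, e.g. "John Doe IV" or "John Doe Jr.".
        if (tokens.Count > 1 && Suffixes.Contains(Key(tokens[tokens.Count - 1])))
        {
            suffix = tokens[tokens.Count - 1];
            tokens.RemoveAt(tokens.Count - 1);
        }

        // A comma left dangling before the suffix ("Doe, Jr.") is dropped.
        string core = string.Join(" ", tokens).TrimEnd(',');
        return Tuple.Create(prefix, core, suffix);
    }
}
```

Whatever remains in the core string still has to be split into first, middle and last, which is where the guessing starts.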

It might be useful to find a list of unprocessed names and test your algorithm against it. I don't know that there's anything prepackaged out there, though.

Answer by shadit

If you must do this parsing, I'm sure you'll get lots of good suggestions here.

My suggestion is - don't do this parsing.

Instead, create your input fields so that the information is already separated out. Have separate fields for title, first name, middle initial, last name, suffix, etc.

Answer by eselk

I know this is old and might be answered somewhere I couldn't find, but since I couldn't find anything that works for me, this is what I came up with, which I think works a lot like Google Contacts and Microsoft Outlook. It doesn't handle edge cases well, but for a good CRM type app, the user can always be asked to resolve those (in my app I actually have separate fields all the time, but I need this for data import from another app that only has one field):

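    // Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
    // As an extension method, this must be declared inside a static class.
    public static void ParseName(this string s, out string prefix, out string first, out string middle, out string last, out string suffix)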
    {
        prefix = "";
        first = "";
        middle = "";
        last = "";
        suffix = "";

        // Split on period, commas or spaces, but don't remove from results.
        List<string> parts = Regex.Split(s, @"(?<=[., ])").ToList();

        // Remove any empty parts
        for (int x = parts.Count - 1; x >= 0; x--)
            if (parts[x].Trim() == "")
                parts.RemoveAt(x);

        if (parts.Count > 0)
        {
            // Might want to add more to this list
            string[] prefixes = { "mr", "mrs", "ms", "dr", "miss", "sir", "madam", "mayor", "president" };

            // If first part is a prefix, set prefix and remove part
            string normalizedPart = parts.First().Replace(".", "").Replace(",", "").Trim().ToLower();
            if (prefixes.Contains(normalizedPart))
            {
                prefix = parts[0].Trim();
                parts.RemoveAt(0);
            }
        }

        if (parts.Count > 0)
        {
            // Might want to add more to this list, or use code/regex for roman-numeral detection
            string[] suffixes = { "jr", "sr", "i", "ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x", "xi", "xii", "xiii", "xiv", "xv" };

            // If last part is a suffix, set suffix and remove part
            string normalizedPart = parts.Last().Replace(".", "").Replace(",", "").Trim().ToLower();
            if (suffixes.Contains(normalizedPart))
            {
                suffix = parts.Last().Replace(",", "").Trim();
                parts.RemoveAt(parts.Count - 1);
            }
        }

        // Done, if no more parts
        if (parts.Count == 0)
            return;

        // If only one part left...
        if (parts.Count == 1)
        {
            // If no prefix, assume first name, otherwise last
            // i.e.- "Dr Jones", "Ms Jones" -- likely to be last
            if(prefix == "")
                first = parts.First().Replace(",", "").Trim();
            else
                last = parts.First().Replace(",", "").Trim();
        }

        // If first part ends with a comma, assume format:
        //   Last, First [...First...]
        else if (parts.First().EndsWith(","))
        {
            last = parts.First().Replace(",", "").Trim();
            for (int x = 1; x < parts.Count; x++)
                first += parts[x].Replace(",", "").Trim() + " ";
            first = first.Trim();
        }

        // Otherwise assume format:
        // First [...Middle...] Last

        else
        {
            first = parts.First().Replace(",", "").Trim();
            last = parts.Last().Replace(",", "").Trim();
            for (int x = 1; x < parts.Count - 1; x++)
                middle += parts[x].Replace(",", "").Trim() + " ";
            middle = middle.Trim();
        }
    }

Sorry that the code is long and ugly, I haven't gotten around to cleaning it up. It is a C# extension method, so you would use it like this:

    string name = "Miss Jessica Dark-Angel Alba";
    string prefix, first, middle, last, suffix;
    name.ParseName(out prefix, out first, out middle, out last, out suffix);
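
The comment on the suffix list suggests using a regex for roman-numeral detection instead of hard-coding values up to "xv". A minimal sketch of that idea follows; the `RomanNumeralSuffix` class and the choice to stop at XXXIX (39) are assumptions made for this example, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class RomanNumeralSuffix
{
    // Matches well-formed roman numerals from I to XXXIX, case-insensitive.
    // The lookahead rejects the empty string, which the rest of the pattern would accept.
    private static readonly Regex Pattern =
        new Regex(@"^(?=[IVX])X{0,3}(IX|IV|V?I{0,3})$", RegexOptions.IgnoreCase);

    public static bool IsRomanNumeral(string part)
    {
        // Same normalization as ParseName: ignore periods, commas and outer whitespace.
        string cleaned = part.Replace(".", "").Replace(",", "").Trim();
        return Pattern.IsMatch(cleaned);
    }
}
```

Like the hard-coded list, this misreads real names that happen to be valid numerals (a last part of "Xi" or "Vi" counts as a suffix), so the result is still only a guess.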

Answer by Vincent McNabb

You probably don't need to do anything fancy really. Something like this should work.

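    Name = Name.Trim();

    // Split on spaces; RemoveEmptyEntries ignores runs of repeated spaces.
    string[] arrNames = Name.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);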

    if (arrNames.Length > 0) {
        GivenName = arrNames[0];
    }
    if (arrNames.Length > 1) {
        FamilyName = arrNames[arrNames.Length - 1];
    }
    if (arrNames.Length > 2) {
        MiddleName = string.Join(" ", arrNames, 1, arrNames.Length - 2);
    }

You may also want to check for titles first.

Answer by Corey Trager

I had to do this. Actually, something much harder than this, because sometimes the "name" would be "Smith, John" or "Smith John" instead of "John Smith", or not a person's name at all but instead a name of a company. And it had to do it automatically with no opportunity for the user to correct it.

What I ended up doing was coming up with a finite list of patterns that the name could be in, like:

  • Last, First Middle-Initial
  • First Last
  • First Middle-Initial Last
  • Last, First Middle
  • First Middle Last

Throw your Mr.'s and Jr.'s in there too. Let's say you end up with a dozen or so patterns.

My application had a dictionary of common first names, common last names (you can find these on the web), common titles, and common suffixes (jr, sr, md), and using those it was able to make real good guesses about the patterns. I'm not that smart, my logic wasn't that fancy, and yet it still wasn't that hard to create some logic that guessed right more than 99% of the time.
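
To make the dictionary idea concrete, here is a small sketch that decides between "First Last" and "Last First" for a two-word name. It is not the answerer's logic: the `NameOrderGuesser` class, the scoring rule, and the four-entry dictionaries are stand-ins invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class NameOrderGuesser
{
    // Tiny stand-ins (assumption) for the large first-name and last-name dictionaries.
    private static readonly HashSet<string> CommonFirst =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "john", "jane", "mary", "james" };
    private static readonly HashSet<string> CommonLast =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "smith", "jones", "doe", "brown" };

    // Returns (first, last) for a two-word name, guessing which order was used.
    public static Tuple<string, string> Guess(string name)
    {
        bool hasComma = name.Contains(",");
        string[] parts = name.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length != 2)
            throw new ArgumentException("This sketch only handles two-word names.", "name");

        // Pattern "Last, First": the comma is decisive.
        if (hasComma) return Tuple.Create(parts[1], parts[0]);

        // Otherwise score both readings by dictionary hits and keep the better one.
        int firstLast = Score(parts[0], parts[1]);
        int lastFirst = Score(parts[1], parts[0]);
        return lastFirst > firstLast
            ? Tuple.Create(parts[1], parts[0])
            : Tuple.Create(parts[0], parts[1]);
    }

    private static int Score(string first, string last)
    {
        return (CommonFirst.Contains(first) ? 1 : 0) + (CommonLast.Contains(last) ? 1 : 0);
    }
}
```

When neither word is in a dictionary the sketch falls back to "First Last". A fuller version would score every pattern in the list the same way.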

Answer by Thelema

Understanding this is a bad idea, I wrote this regex in Perl - here's what worked the best for me. I had already filtered out company names.
Output in vcard format: (hon_prefix, given_name, additional_name, family_name, hon_suffix)

    /^ \s*
        (?:((?:Dr\.)|(?:Mr\.)|(?:Mr?s\.)|(?:Miss)|(?:2nd\sLt\.)|(?:Sen\.?))\s+)? # prefix (periods escaped so they match literally)
        ((?:\w+)|(?:\w\.))                   # first name
        (?: \s+ ((?:\w\.?)|(?:\w\w+)) )?     # middle initial or name
        (?: \s+ ((?:[OD]['']\s?)?[-\w]+))    # last name
        (?: ,? \s+ ( (?:[JS]r\.?) | (?:Esq\.?) | (?: (?:M|Ph|Ed) \.?\s*D\.?) |
             (?: R\.?N\.?) | (?: I+) )  )?   # suffix (M/Ph/Ed grouped so each takes the trailing D)
    \s* $/x

Notes:

  • Doesn't handle IV, V, VI
  • Hard-coded lists of prefixes and suffixes, evolved from a dataset of ~2K names
  • Doesn't handle multiple suffixes (e.g. MD, PhD)
  • Designed for American names - will not work properly on romanized Japanese names or other naming systems
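
Since the question asks for .NET, here is a rough port of the Perl pattern to C#. Treat it as a sketch: the `VCardNameRegex` class and the group names are made up for this example, the periods are escaped and the M/Ph/Ed alternatives grouped (which I assume was the intent), and only a plain apostrophe is accepted in O'/D' names. It keeps the limitations listed in the notes above.

```csharp
using System.Text.RegularExpressions;

public static class VCardNameRegex
{
    // IgnorePatternWhitespace plays the role of Perl's /x flag.
    private static readonly Regex Pattern = new Regex(@"
        ^ \s*
        (?: (?<prefix> Dr\. | Mr\. | Mr?s\. | Miss | 2nd\sLt\. | Sen\.? ) \s+ )?
        (?<given> \w+ | \w\. )
        (?: \s+ (?<additional> \w\.? | \w\w+ ) )?
        (?: \s+ (?<family> (?: [OD]'\s? )? [-\w]+ ) )
        (?: ,? \s+ (?<suffix> [JS]r\.? | Esq\.? | (?: M | Ph | Ed ) \.? \s* D\.? | R\.?N\.? | I+ ) )?
        \s* $",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the Match; check Success, then read the named groups.
    // Optional groups that did not take part in the match have an empty Value.
    public static Match Parse(string name)
    {
        return Pattern.Match(name);
    }
}
```

For example, `VCardNameRegex.Parse("Dr. John W. Smith, Jr.").Groups["family"].Value` is "Smith".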

Answer by Keithius

I appreciate that this is hard to do right, but it becomes workable if you give the user a way to edit the results (say, a pop-up window to fix the name if the guess was wrong) and still guess "right" in most cases... of course, it's the guessing that's tough.

It's easy to say "don't do it" when looking at the problem theoretically, but sometimes circumstances dictate otherwise. Having fields for all the parts of a name (title, first, middle, last, suffix, just to name a few) can take up a lot of screen real estate - and combined with the problem of the address (a topic for another day) can really clutter up what should be a clean, simple UI.

I guess the answer should be "don't do it unless you absolutely have to, and if you do, keep it simple (some methods for this have been posted here) and provide the user the means to edit the results if needed."

Answer by PKD

Having come to this conversation 10 years late, but still looking for an elegant solution, I read through this thread, and decided to take the path @eselk took, but expand on it:

using System.Linq;
using System.Text.RegularExpressions;

public class FullNameDTO
{
    public string Prefix     { get; set; }
    public string FirstName  { get; set; }
    public string MiddleName { get; set; }
    public string LastName   { get; set; }
    public string Suffix     { get; set; }
}

public static class FullName
{
    public static FullNameDTO GetFullNameDto(string fullName)
    {
        string[] knownPrefixes    = { "mr", "mrs", "ms", "miss", "dr", "sir", "madam", "master", "fr", "rev", "atty", "hon", "prof", "pres", "vp", "gov", "ofc" };
        string[] knownSuffixes    = { "jr", "sr", "ii", "iii", "iv", "v", "esq", "cpa", "dc", "dds", "vm", "jd", "md", "phd" };
        string[] lastNamePrefixes = { "da", "de", "del", "dos", "el", "la", "st", "van", "von" };

        var prefix     = string.Empty;
        var firstName  = string.Empty;
        var middleName = string.Empty;
        var lastName   = string.Empty;
        var suffix     = string.Empty;

        var fullNameDto = new FullNameDTO
        {
            Prefix     = prefix,
            FirstName  = firstName,
            MiddleName = middleName,
            LastName   = lastName,
            Suffix     = suffix
        };

        // Split on period, commas or spaces, but don't remove from results.
        var namePartsList = Regex.Split(fullName, "(?<=[., ])").ToList();

        #region Clean out the crap.
        for (var x = namePartsList.Count - 1; x >= 0; x--)
        {
            if (namePartsList[x].Trim() == string.Empty)
            {
                namePartsList.RemoveAt(x);
            }
        }
        #endregion

        #region Trim all of the parts in the list
        for (var x = namePartsList.Count - 1; x >= 0; x--)
        {
            namePartsList[x] = namePartsList[x].Trim();
        }
        #endregion

        #region Only one Name Part - assume a name like "Cher"
        if (namePartsList.Count == 1)
        {
            firstName = namePartsList.First().Replace(",", string.Empty).Trim();
            fullNameDto.FirstName = firstName;

            namePartsList.RemoveAt(0);
        }
        #endregion

        #region Get the Prefix
        if (namePartsList.Count > 0)
        {
            //If we find a prefix, save it and drop it from the overall parts
            var cleanedPart = namePartsList.First()
                                           .Replace(".", string.Empty)
                                           .Replace(",", string.Empty)
                                           .Trim()
                                           .ToLower();

            if (knownPrefixes.Contains(cleanedPart))
            {
                prefix = namePartsList[0].Trim();
                fullNameDto.Prefix = prefix;

                namePartsList.RemoveAt(0);
            }
        }
        #endregion

        #region Get the Suffix
        if (namePartsList.Count > 0)
        {
            #region Scan the full parts list for a potential Suffix
            foreach (var namePart in namePartsList)
            {
                var cleanedPart = namePart.Replace(",", string.Empty)
                                          .Trim()
                                          .ToLower();

                if (!knownSuffixes.Contains(cleanedPart.Replace(".", string.Empty))) { continue; }

                if (namePart.ToLower() == "jr" && namePart != namePartsList.Last()) { continue; }

                suffix             = namePart.Replace(",", string.Empty).Trim();
                fullNameDto.Suffix = suffix;

                namePartsList.Remove(namePart);
                break;
            }
            #endregion
        }
        #endregion

        //If, strangely, there's nothing else in the overall parts... we're done here.
        if (namePartsList.Count == 0) { return fullNameDto; }

        #region Prefix/Suffix taken care of - only one "part" left.
        if (namePartsList.Count == 1)
        {
            //If no prefix, assume first name (e.g. "Cher"), otherwise last (e.g. "Dr Jones", "Ms Jones")
            if (prefix == string.Empty)
            {
                firstName = namePartsList.First().Replace(",", string.Empty).Trim();
                fullNameDto.FirstName = firstName;
            }
            else
            {
                lastName = namePartsList.First().Replace(",", string.Empty).Trim();
                fullNameDto.LastName = lastName;
            }
        }
        #endregion

        #region First part ends with a comma
        else if (namePartsList.First().EndsWith(",") || (namePartsList.Count >= 3 && namePartsList.Any(n => n == ",") && namePartsList.Last() != ","))
        {
            #region Assume format: "Last, First"
            if (namePartsList.First().EndsWith(","))
            {
                lastName             = namePartsList.First().Replace(",", string.Empty).Trim();
                fullNameDto.LastName = lastName;
                namePartsList.Remove(namePartsList.First());

                firstName             = namePartsList.First();
                fullNameDto.FirstName = firstName;
                namePartsList.Remove(namePartsList.First());

                if (!namePartsList.Any()) { return fullNameDto; }

                foreach (var namePart in namePartsList)
                {
                    middleName += namePart.Trim() + " ";
                }
                fullNameDto.MiddleName = middleName.Trim(); // drop the trailing space left by the loop

                return fullNameDto;
            }
            #endregion

            #region Assume strange scenario like "Last Suffix, First"
            var indexOfComma = namePartsList.IndexOf(",");

            #region Last Name is the first thing in the list
            if (indexOfComma == 1)
            {
                namePartsList.Remove(namePartsList[indexOfComma]);

                lastName             = namePartsList.First().Replace(",", string.Empty).Trim();
                fullNameDto.LastName = lastName;
                namePartsList.Remove(namePartsList.First());

                firstName             = namePartsList.First();
                fullNameDto.FirstName = firstName;
                namePartsList.Remove(namePartsList.First());

                if (!namePartsList.Any()) { return fullNameDto; }

                foreach (var namePart in namePartsList)
                {
                    middleName += namePart.Trim() + " ";
                }
                fullNameDto.MiddleName = middleName.Trim(); // drop the trailing space left by the loop

                return fullNameDto;
            }
            #endregion

            #region Last Name might be a prefixed one, like "da Vinci"
            if (indexOfComma == 2)
            {
                var possibleLastPrefix = namePartsList.First()
                                                      .Replace(".", string.Empty)
                                                      .Replace(",", string.Empty)
                                                      .Trim()
                                                      .ToLower();

                if (lastNamePrefixes.Contains(possibleLastPrefix))
                {
                    namePartsList.Remove(namePartsList[indexOfComma]);

                    var lastPrefix = namePartsList.First().Trim();
                    namePartsList.Remove(lastPrefix);

                    lastName             = $"{lastPrefix} {namePartsList.First().Replace(",", string.Empty).Trim()}";
                    fullNameDto.LastName = lastName;
                    namePartsList.Remove(namePartsList.First());
                }
                else
                {
                    lastName = namePartsList.First().Replace(",", string.Empty).Trim();
                    namePartsList.Remove(namePartsList.First());

                    lastName = lastName + " " + namePartsList.First().Replace(",", string.Empty).Trim();
                    namePartsList.Remove(namePartsList.First());

                    fullNameDto.LastName = lastName;
                }

                namePartsList.Remove(",");

                firstName             = namePartsList.First();
                fullNameDto.FirstName = firstName;
                namePartsList.Remove(namePartsList.First());

                if (!namePartsList.Any()) { return fullNameDto; }

                foreach (var namePart in namePartsList)
                {
                    middleName += namePart.Trim() + " ";
                }
                fullNameDto.MiddleName = middleName.Trim(); // drop the trailing space left by the loop

                return fullNameDto;
            }
            #endregion
            #endregion
        }
        #endregion

        #region Everything else
        else
        {
            if (namePartsList.Count >= 3)
            {
                firstName = namePartsList.First().Replace(",", string.Empty).Trim();
                fullNameDto.FirstName = firstName;
                namePartsList.RemoveAt(0);

                //Check for possible last name prefix

                var possibleLastPrefix = namePartsList[namePartsList.Count - 2]
                                               .Replace(".", string.Empty)
                                               .Replace(",", string.Empty)
                                               .Trim()
                                               .ToLower();

                if (lastNamePrefixes.Contains(possibleLastPrefix))
                {
                    lastName = $"{namePartsList[namePartsList.Count - 2].Trim()} {namePartsList[namePartsList.Count -1].Replace(",", string.Empty).Trim()}";
                    fullNameDto.LastName = lastName;

                    namePartsList.RemoveAt(namePartsList.Count - 1);
                    namePartsList.RemoveAt(namePartsList.Count - 1);
                }
                else
                {
                    lastName = namePartsList.Last().Replace(",", string.Empty).Trim();
                    fullNameDto.LastName = lastName;

                    namePartsList.RemoveAt(namePartsList.Count - 1);
                }

                middleName = string.Join(" ", namePartsList).Trim();
                fullNameDto.MiddleName = middleName;

                namePartsList.Clear();
            }
            else
            {
                if (namePartsList.Count == 1)
                {
                    lastName = namePartsList.First().Replace(",", string.Empty).Trim();
                    fullNameDto.LastName = lastName;

                    namePartsList.RemoveAt(0);
                }
                else
                {
                    var possibleLastPrefix = namePartsList.First()
                                             .Replace(".", string.Empty)
                                             .Replace(",", string.Empty)
                                             .Trim()
                                             .ToLower();

                    if (lastNamePrefixes.Contains(possibleLastPrefix))
                    {
                        lastName = $"{namePartsList.First().Replace(",", string.Empty).Trim()} {namePartsList.Last().Replace(",", string.Empty).Trim()}";
                        fullNameDto.LastName = lastName;

                        namePartsList.Clear();
                    }
                    else
                    {
                        firstName = namePartsList.First().Replace(",", string.Empty).Trim();
                        fullNameDto.FirstName = firstName;

                        namePartsList.RemoveAt(0);

                        lastName = namePartsList.Last().Replace(",", string.Empty).Trim();
                        fullNameDto.LastName = lastName;

                        namePartsList.Clear();
                    }
                }
            }
        }
        #endregion

        namePartsList.Clear();

        fullNameDto.Prefix     = prefix;
        fullNameDto.FirstName  = firstName;
        fullNameDto.MiddleName = middleName;
        fullNameDto.LastName   = lastName;
        fullNameDto.Suffix     = suffix;

        return fullNameDto;
    }
}

This will handle quite a few different scenarios, and I've written out (thus far) over 50 different unit tests against it to make sure.

Props again to @eselk for his ideas that helped in my writing an expanded version of his excellent solution. And, as a bonus, this also handles the strange instance of a person named "JR".

Answer by Aeryes

The real solution here does not answer the question. The portent of the information must be observed. A name is not just a name; it is how we are known.

The problem here is not knowing exactly which parts are labeled what, and what they are used for. Honorific prefixes should be granted only in personal correspondence; Doctor is an honorific that is derived from a title. All information about a person is relevant to their identity; the question is just which information is relevant. You need a first and last name for reasons of administration; phone numbers, email addresses, land descriptions and mailing addresses all go to the portent of identity, to knowing who you are dealing with.

The real problem here is that the person gets lost in the administration. All of a sudden, after only entering their personal information into a form and submitting it to an arbitrary program for processing, they are afforded all sorts of honorifics and pleasantries spewed out by a prefabricated template. This is wrong; honorable Sir or Madam, if personal interest is shown toward the reason for the correspondence, then a letter should never be written from a template. Personal correspondence requires a little knowledge about the recipient: male or female, whether they went to school to be a doctor or judge, the culture in which they were raised.

In other cultures, a name is made up of a variable number of characters. The person's name to us can only be interpreted as a string of numbers where the spaces are actually determined by character width instead of the space character. Honorifics in these cases are instead one or many characters prefixing and suffixing the actual name. The polite thing to do is use the string you are given; if you know the honorific then by all means use it, but this again implies some sort of personal knowledge of the recipient. Calling Sensei anything other than Sensei is wrong. Not in the sense of a logic error, but in that you have just insulted your caller, and now you should find a template that helps you apologize.

For the purposes of automated, impersonal correspondence, a template may be devised for such things as daily articles, weekly issues or whatever, but the problem becomes important when correspondence is instigated by the recipient to an automated service.

What happens is an error. Missing information. Unknown or missing information will always generate an Exception. The real problem is not how you separate a person's name into its separate components with an expression, but what you call them.

The solution is to create an extra field, make it optional if there is already a first and last name, and call it "What may we call you?" or "How should we refer to you?". A doctor and a judge will ensure you address them properly. These are not programming issues, they are issues of communication.

Ok, bad way to put it, but in my opinion, Username, Tagname, and ID are worse. So my solution is the missing question: "What should we call you?"

This is only a solution where you can afford to ask a new question. Tact prevails. Create a new field on your user forms, call it Alias, label it for the user as "What should we call you?", and then you have a means to communicate with them. Use the first and last name unless the recipient has given an Alias; if the recipient is personally familiar with the sender, then first and middle is acceptable.
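
As a rough illustration of the Alias rule, the following sketch prefers the alias when one was given and otherwise falls back to first and last name. The `Salutation` class and its parameter names are hypothetical.

```csharp
public static class Salutation
{
    // Prefer the alias the person supplied; otherwise fall back to "First Last".
    public static string DisplayName(string alias, string first, string last)
    {
        if (!string.IsNullOrWhiteSpace(alias)) return alias.Trim();
        return (first + " " + last).Trim();
    }
}
```

No parsing is involved: the alias is used exactly as the person typed it, apart from trimming surrounding whitespace.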

To Me, _______________________ (standard subscribed correspondence)
To Me ( Myself | I ), ________ (standard recipient-instigated correspondence)
To Me Myself I, ______________ (look out, it's your mother, and you're in big trouble;
                                nobody addresses a person by their actual full name)

Dear *(Mr./Mrs./Ms./Dr./Hon./Sen.) Me M. I *(I),
To Whom it may Concern;

Otherwise you are looking for something standard: hello, greetings, you may be a winner.

Where you have data that is a person's name all in one string, you don't have a problem, because you already have their alias. If what you need is the first and last name, then just Left(name,instr(name," ")) & " " & Right(name,instrrev(name," ")); my math is probably wrong, I'm a bit out of practice. Compare left and right with known prefixes and suffixes and eliminate them from your matches. Generally the middle name is rarely used except to confirm an identity, for which an address or phone number tells you a lot more. Watching for hyphenation, one can determine that if the last name is not used, then one of the middle ones would be instead.
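
The Left/Right expression above is admittedly approximate (Right takes a length, not a position). A C# sketch of the same idea, keeping the text before the first space and after the last space, could look like this; the `FirstLast` class is a hypothetical name, and it assumes prefixes and suffixes have already been removed.

```csharp
public static class FirstLast
{
    // Returns "First Last" from a full name, dropping anything in between.
    public static string Extract(string name)
    {
        string trimmed = name.Trim();
        int firstSpace = trimmed.IndexOf(' ');
        if (firstSpace < 0) return trimmed;   // single word: nothing to split

        int lastSpace = trimmed.LastIndexOf(' ');
        string first = trimmed.Substring(0, firstSpace);
        string last = trimmed.Substring(lastSpace + 1);
        return first + " " + last;
    }
}
```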

For searching lists of first and last names, one must consider the possibility that one of the middle ones was instead used; this would require four searches: one to filter for first & last, then another to filter first & middle, then another to filter middle & last, and then another to filter middle & middle. Ultimately, the first name is always first, and the last is always last, and there can be any number of middle names; less is more, and where zero is likely, but improbable.
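
One way to cover all four searches is to generate every ordered pair of words from the full name and try each pair as a (first, last) search key. This is a sketch under that assumption; `NameSearchKeys` and `CandidatePairs` are invented names, and the actual lookup is left out.

```csharp
using System;
using System.Collections.Generic;

public static class NameSearchKeys
{
    // Yields every pair (earlier word, later word) from the name, which covers
    // first+last, first+middle, middle+last and middle+middle.
    public static List<Tuple<string, string>> CandidatePairs(string fullName)
    {
        string[] words = fullName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var pairs = new List<Tuple<string, string>>();
        for (int i = 0; i < words.Length; i++)
            for (int j = i + 1; j < words.Length; j++)
                pairs.Add(Tuple.Create(words[i], words[j]));
        return pairs;
    }
}
```

The number of pairs grows quadratically with the number of words, which is acceptable for names but is a reason to strip prefixes and suffixes first.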

Sometimes people prefer to be called Bill, Harry, Jim, Bob, Doug, Beth, Sue, or Madonna rather than their actual names; the two are similar, but it is unrealistic to expect anyone to fathom all the different possibilities.

The most polite thing you could do is ask: what can we call you?

Answer by Sean Fair

There are a few add-ins we have used in our company to accomplish this. I ended up creating a way to actually specify the formats for the name on our different imports for different clients. There is a company that has a tool that in my experience is well worth the price and is really incredible when tackling this subject. It's at http://www.softwarecompany.com/ and works great. The most efficient way to do this w/out using any statistical approach is to split the string by commas or spaces, then: 1. strip titles and prefixes out; 2. strip suffixes out; 3. parse the name in order (2 names = F & L, 3 names = F M L or L M F), depending on the order of the string.
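
Step 3 with a per-import format setting might be sketched as follows. The `NameOrder` enum and `OrderedNameParser` class are hypothetical, and the sketch assumes titles and suffixes were already stripped in steps 1 and 2.

```csharp
using System;

public enum NameOrder { FirstMiddleLast, LastMiddleFirst }

public static class OrderedNameParser
{
    // Returns { first, middle, last } for a two- or three-part name in the given order.
    public static string[] Parse(string name, NameOrder order)
    {
        string[] p = name.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
        if (p.Length < 2 || p.Length > 3)
            throw new ArgumentException("Expected two or three name parts.", "name");

        // With only two parts there is no middle name.
        string middle = p.Length == 3 ? p[1] : "";
        return order == NameOrder.FirstMiddleLast
            ? new[] { p[0], middle, p[p.Length - 1] }
            : new[] { p[p.Length - 1], middle, p[0] };
    }
}
```

Setting the order per client import avoids guessing it from the data, at the cost of one setting per source.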
