SQL:从全名字段中解析名字、中间名和姓氏

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/159567/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 23:44:00  来源:igfitidea点击:

SQL: parse the first, middle and last name from a fullname field

sqlsql-servertsqlparsing

提问by Even Mien

How do I parse the first, middle, and last name out of a fullname field with SQL?

如何使用 SQL 从全名字段中解析名字、中间名和姓氏?

I need to try to match up on names that are not a direct match on full name. I'd like to be able to take the full name field and break it up into first, middle and last name.

我需要尝试匹配与全名不直接匹配的名称。我希望能够获取全名字段并将其分解为名字、中间名和姓氏。

The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted 'First Middle Last'.

数据不包括任何前缀或后缀。中间名是可选的。数据格式为“First Middle Last”。

I'm interested in some practical solutions to get me 90% of the way there. As it has been stated, this is a complex problem, so I'll handle special cases individually.

我对一些实用的解决方案很感兴趣,可以让我完成 90% 的工作。如前所述,这是一个复杂的问题,因此我将单独处理特殊情况。

回答by JosephStyons

Here is a self-contained example, with easily manipulated test data.

这是一个独立的示例,带有易于操作的测试数据。

With this example, if you have a name with more than three parts, then all the "extra" stuff will get put in the LAST_NAME field. An exception is made for specific strings that are identified as "titles", such as "DR", "MRS", and "MR".

在此示例中,如果您的名称包含三个以上的部分,则所有“额外”内容都将放入 LAST_NAME 字段中。标识为“titles”的特定字符串除外,例如“DR”、“MRS”和“MR”。

If the middle name is missing, then you just get FIRST_NAME and LAST_NAME (MIDDLE_NAME will be NULL).

如果缺少中间名,那么您只会得到 FIRST_NAME 和 LAST_NAME(MIDDLE_NAME 将为 NULL)。

You could smash it into a giant nested blob of SUBSTRINGs, but readability is hard enough as it is when you do this in SQL.

你可以把它粉碎成一个巨大的嵌套的 SUBSTRING blob,但是可读性已经够难了,就像你在 SQL 中做的那样。

Edit-- Handle the following special cases:

编辑——处理以下特殊情况:

1 - The NAME field is NULL

1 - NAME 字段为 NULL

2 - The NAME field contains leading / trailing spaces

2 - NAME 字段包含前导/尾随空格

3 - The NAME field has > 1 consecutive space within the name

3 - NAME 字段在名称中有 > 1 个连续空格

4 - The NAME field contains ONLY the first name

4 - NAME 字段只包含名字

5 - Include the original full name in the final output as a separate column, for readability

5 - 将原始全名作为单独的列包含在最终输出中,以提高可读性

6 - Handle a specific list of prefixes as a separate "title" column

6 - 将特定的前缀列表作为单独的“标题”列处理

SELECT
  FIRST_NAME.ORIGINAL_INPUT_DATA
 ,FIRST_NAME.TITLE
 ,FIRST_NAME.FIRST_NAME
 ,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
       THEN NULL  --no more spaces?  assume rest is the last name
       ELSE SUBSTRING(
                       FIRST_NAME.REST_OF_NAME
                      ,1
                      ,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1
                     )
       END AS MIDDLE_NAME
 ,SUBSTRING(
             FIRST_NAME.REST_OF_NAME
            ,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
            ,LEN(FIRST_NAME.REST_OF_NAME)
           ) AS LAST_NAME
FROM
  (  
  SELECT
    TITLE.TITLE
   ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
         THEN TITLE.REST_OF_NAME --No space? return the whole thing
         ELSE SUBSTRING(
                         TITLE.REST_OF_NAME
                        ,1
                        ,CHARINDEX(' ',TITLE.REST_OF_NAME)-1
                       )
    END AS FIRST_NAME
   ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)  
         THEN NULL  --no spaces @ all?  then 1st name is all we have
         ELSE SUBSTRING(
                         TITLE.REST_OF_NAME
                        ,CHARINDEX(' ',TITLE.REST_OF_NAME)+1
                        ,LEN(TITLE.REST_OF_NAME)
                       )
    END AS REST_OF_NAME
   ,TITLE.ORIGINAL_INPUT_DATA
  FROM
    (   
    SELECT
      --if the first three characters are in this list,
      --then pull it as a "title".  otherwise return NULL for title.
      CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
           THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3)))
           ELSE NULL
           END AS TITLE
      --if you change the list, don't forget to change it here, too.
      --so much for the DRY prinicple...
     ,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
           THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME))))
           ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME))
           END AS REST_OF_NAME
     ,TEST_DATA.ORIGINAL_INPUT_DATA
    FROM
      (
      SELECT
        --trim leading & trailing spaces before trying to process
        --disallow extra spaces *within* the name
        REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),'  ',' '),'  ',' ') AS FULL_NAME
       ,FULL_NAME AS ORIGINAL_INPUT_DATA
      FROM
        (
        --if you use this, then replace the following
        --block with your actual table
              SELECT 'GEORGE W BUSH' AS FULL_NAME
        UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME
        UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME
        UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME
        UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME
        UNION SELECT 'TOMMY' AS FULL_NAME
        UNION SELECT 'BILLY' AS FULL_NAME
        UNION SELECT NULL AS FULL_NAME
        UNION SELECT ' ' AS FULL_NAME
        UNION SELECT '    JOHN  JACOB     SMITH' AS FULL_NAME
        UNION SELECT ' DR  SANJAY       GUPTA' AS FULL_NAME
        UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME
        UNION SELECT ' MRS  SUSAN ADAMS' AS FULL_NAME
        UNION SELECT ' MS AUGUSTA  ADA   KING ' AS FULL_NAME      
        ) RAW_DATA
      ) TEST_DATA
    ) TITLE
  ) FIRST_NAME

回答by neonski

It's difficult to answer without knowing how the "full name" is formatted.

如果不知道“全名”是如何格式化的,就很难回答。

It could be "Last Name, First Name Middle Name" or "First Name Middle Name Last Name", etc.

它可以是“姓氏,名字中间名”或“名字中间名姓氏”等。

Basically you'll have to use the SUBSTRINGfunction

基本上你必须使用SUBSTRING函数

SUBSTRING ( expression , start , length )

And probably the CHARINDEXfunction

可能还有CHARINDEX函数

CHARINDEX (substr, expression)

To figure out the start and length for each part you want to extract.

找出要提取的每个部分的开始和长度。

So let's say the format is "First Name Last Name" you could (untested.. but should be close) :

所以让我们说格式是“名字姓氏”你可以(未经测试..但应该接近):

SELECT 
SUBSTRING(fullname, 1, CHARINDEX(' ', fullname) - 1) AS FirstName, 
SUBSTRING(fullname, CHARINDEX(' ', fullname) + 1, len(fullname)) AS LastName
FROM YourTable

回答by Lasse V. Karlsen

Reverse the problem, add columns to hold the individual pieces and combine them to get the full name.

反转问题,添加列以保存各个部分并将它们组合以获得全名。

The reason this will be the bestanswer is that there is no guaranteed way to figure out a person has registered as their first name, and what is their middle name.

这将是最佳答案的原因是没有保证的方法来确定一个人已注册为他们的名字,以及他们的中间名是什么。

For instance, how would you split this?

例如,您将如何拆分它?

Jan Olav Olsen Heggelien

This, while being fictious, is a legal name in Norway, and could, but would not have to, be split like this:

这虽然是虚构的,但在挪威是一个合法名称,可以但不必像这样拆分:

First name: Jan Olav
Middle name: Olsen
Last name: Heggelien

or, like this:

或者,像这样:

First name: Jan Olav
Last name: Olsen Heggelien

or, like this:

或者,像这样:

First name: Jan
Middle name: Olav
Last name: Olsen Heggelien

I would imagine similar occurances can be found in most languages.

我想在大多数语言中都可以找到类似的情况。

So instead of trying to interpreting data which does not have enough information to get it right, store the correct interpretation, and combine to get the full name.

因此,与其试图解释没有足够信息的数据,不如存储正确的解释,然后结合起来得到全名。

回答by Josh Millard

Unless you have very, very well-behaved data, this is a non-trivial challenge. A naive approach would be to tokenize on whitespace and assume that a three-token result is [first, middle, last] and a two-token result is [first, last], but you're going to have to deal with multi-word surnames (e.g. "Van Buren") and multiple middle names.

除非你有非常非常好的数据,否则这是一个不小的挑战。一种天真的方法是对空格进行标记并假设三个标记的结果是 [first, middle, last] 和两个标记的结果是 [first, last],但是您将不得不处理多个-单词姓氏(例如“Van Buren”)和多个中间名。

回答by hajili

Alternative simple way is to use parsename:

另一种简单的方法是使用parsename

select full_name,
   parsename(replace(full_name, ' ', '.'), 3) as FirstName,
   parsename(replace(full_name, ' ', '.'), 2) as MiddleName,
   parsename(replace(full_name, ' ', '.'), 1) as LastName 
from YourTableName

source

来源

回答by Mukesh Pandey

This query is working fine.

此查询工作正常。

SELECT name
    ,Ltrim(SubString(name, 1, Isnull(Nullif(CHARINDEX(' ', name), 0), 1000))) AS FirstName
    ,Ltrim(SUBSTRING(name, CharIndex(' ', name), CASE 
                WHEN (CHARINDEX(' ', name, CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name)) <= 0
                    THEN 0
                ELSE CHARINDEX(' ', name, CHARINDEX(' ', name) + 1) - CHARINDEX(' ', name)
                END)) AS MiddleName
    ,Ltrim(SUBSTRING(name, Isnull(Nullif(CHARINDEX(' ', name, Charindex(' ', name) + 1), 0), CHARINDEX(' ', name)), CASE 
                WHEN Charindex(' ', name) = 0
                    THEN 0
                ELSE LEN(name)
                END)) AS LastName
FROM yourtableName

回答by Mukesh Pandey

This Will Work in Case String Is FirstName/MiddleName/LastName

这将适用于字符串是名字/中间名/姓氏的情况

Select 

DISTINCT NAMES ,

   SUBSTRING(NAMES , 1, CHARINDEX(' ', NAMES) - 1) as FirstName,

   RTRIM(LTRIM(REPLACE(REPLACE(NAMES,SUBSTRING(NAMES , 1, CHARINDEX(' ', NAMES) - 1),''),REVERSE( LEFT( REVERSE(NAMES), CHARINDEX(' ', REVERSE(NAMES))-1 ) ),'')))as MiddleName,

   REVERSE( LEFT( REVERSE(NAMES), CHARINDEX(' ', REVERSE(NAMES))-1 ) ) as LastName

From TABLENAME

回答by Franci Penov

Are you sure the Full Legal Name will always include First, Middle and Last? I know people that have only one name as Full Legal Name, and honestly I am not sure if that's their First or Last Name. :-) I also know people that have more than one Fisrt names in their legal name, but don't have a Middle name. And there are some people that have multiple Middle names.

您确定法定全名将始终包括名字、中间名和姓氏吗?我认识只有一个名字作为法定全名的人,老实说,我不确定这是他们的名字还是姓氏。:-) 我也认识一些人,他们的法定名字中有多个 Fisrt 名字,但没有中间名。有些人有多个中间名。

Then there's also the order of the names in the Full Legal Name. As far as I know, in some Asian cultures the Last Name comes first in the Full Legal Name.

然后还有完整法定名称中名称的顺序。据我所知,在某些亚洲文化中,姓氏在法定全名中排在第一位。

On a more practical note, you could split the Full Name on whitespace and threat the first token as First name and the last token (or the only token in case of only one name) as Last name. Though this assumes that the order will be always the same.

更实用的一点是,您可以在空格上拆分全名,并将第一个标记威胁为名字,将最后一个标记(或只有一个名字的唯一标记)威胁为姓氏。尽管这假设顺序将始终相同。

回答by Marc Bernier

Like #1 said, it's not trivial. Hyphenated last names, initials, double names, inverse name sequence and a variety of other anomalies can ruin your carefully crafted function.

就像#1 所说的,这不是微不足道的。带连字符的姓氏、首字母缩写、双名、反向名称序列和各种其他异常情况可能会破坏您精心设计的功能。

You could use a 3rd party library (plug/disclaimer - I worked on this product):

您可以使用 3rd 方库(插件/免责声明 - 我在此产品上工作过):

http://www.melissadata.com/nameobject/nameobject.htm

http://www.melissadata.com/nameobject/nameobject.htm

回答by Ben

I would do this as an iterative process.

我会把它作为一个迭代过程来做。

1) Dump the table to a flat file to work with.

1) 将表转储到平面文件中以使用。

2) Write a simple program to break up your Names using a space as separator where firsts token is the first name, if there are 3 token then token 2 is middle name and token 3 is last name. If there are 2 tokens then the second token is the last name. (Perl, Java, or C/C++, language doesn't matter)

2) 编写一个简单的程序来使用空格作为分隔符来拆分您的姓名,其中第一个标记是名字,如果有 3 个标记,则标记 2 是中间名,标记 3 是姓氏。如果有 2 个标记,则第二个标记是姓氏。(Perl、Java 或 C/C++,语言无关紧要)

3) Eyeball the results. Look for names that don't fit this rule.

3) 关注结果。查找不符合此规则的名称。

4) Using that example, create a new rule to handle that exception...

4)使用该示例,创建一个新规则来处理该异常...

5) Rinse and Repeat

5)冲洗并重复

Eventually you will get a program that fixes all your data.

最终你会得到一个修复你所有数据的程序。