Splitting strings in a Python DataFrame

Question

Asked by Rene Decol

I have a DataFrame in Python with a column with names (such as Joseph Haydn, Wolfgang Amadeus Mozart, Antonio Salieri and so forth).

I want to get a new column with the last names: Haydn, Mozart, Salieri and so forth.

I know how to split a string, but I could not find a way to apply the split to a Series or a DataFrame column.

Answer 1

Answered by Andre Holzner

If you have:

import pandas
data = pandas.DataFrame({"composers": [ 
    "Joseph Haydn", 
    "Wolfgang Amadeus Mozart", 
    "Antonio Salieri",
    "Eumir Deodato"]})

and you want only the first name (not a middle name such as Amadeus), then:

data.composers.str.split(r'\s+').str[0]

will give:

0      Joseph
1    Wolfgang
2     Antonio
3       Eumir
dtype: object

You can assign this to a new column in the same DataFrame:

data['firstnames'] = data.composers.str.split(r'\s+').str[0]

Last names would be:

data.composers.str.split(r'\s+').str[-1]

which gives:

0      Haydn
1     Mozart
2    Salieri
3    Deodato
dtype: object
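The steps above can be combined into one self-contained sketch. It splits the column once and reuses the result for both new columns; the column name `lastnames` and the intermediate variable `parts` are illustrative choices, not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Split each name on runs of whitespace; every row becomes a list of words.
parts = data["composers"].str.split(r"\s+")

# .str[0] / .str[-1] pick the first / last element of each list.
data["firstnames"] = parts.str[0]
data["lastnames"] = parts.str[-1]  # hypothetical column name

print(data)
```

Using a raw string (`r"\s+"`) keeps Python from treating `\s` as a string escape sequence.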

(See also "Python Pandas: selecting element in array column" for accessing elements in an 'array' column.)

For everything except the last name, you can apply " ".join(...) to all but the last element ([:-1]) of each row:

data.composers.str.split(r'\s+').str[:-1].apply(lambda parts: " ".join(parts))

which gives:

0              Joseph
1    Wolfgang Amadeus
2             Antonio
3               Eumir
dtype: object
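As a possible alternative to `apply` with a lambda, pandas also provides `Series.str.join`, which joins the elements of each list in a Series. The sketch below assumes the same composer names; the variable names `composers` and `given_names` are illustrative.

```python
import pandas as pd

composers = pd.Series([
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"])

# split() with no pattern splits on whitespace; [:-1] drops the last word;
# str.join(" ") glues the remaining words back together with single spaces.
given_names = composers.str.split().str[:-1].str.join(" ")

print(given_names)
```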

Answer 2

Answered by Mahdi4SM

Try this to solve your problem:

import pandas as pd
df = pd.DataFrame(
    {'composers':
        [ 
            'Joseph Haydn', 
            'Wolfgang Amadeus Mozart', 
            'Antonio Salieri',
            'Eumir Deodato',
        ]
    }
)

# Split once from the right so that the last element is always the last name,
# even when a middle name is present.
df['lastname'] = df['composers'].str.rsplit(n=1).str[-1]

The resulting DataFrame is shown below.

                 composers lastname
0             Joseph Haydn    Haydn
1  Wolfgang Amadeus Mozart   Mozart
2          Antonio Salieri  Salieri
3            Eumir Deodato  Deodato
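If both the given names and the last name are wanted as separate columns, one possible sketch uses `str.rsplit` with `n=1` and `expand=True`, which returns a DataFrame instead of a Series of lists. The column names `given_names` and `lastname` are illustrative assumptions, and the sketch assumes every name has at least two words (a single-word name would leave the second column empty for that row).

```python
import pandas as pd

df = pd.DataFrame({"composers": [
    "Joseph Haydn",
    "Wolfgang Amadeus Mozart",
    "Antonio Salieri",
    "Eumir Deodato"]})

# Split at most once, starting from the right, and expand into two columns:
# everything before the last space, and the last word.
names = df["composers"].str.rsplit(n=1, expand=True)
names.columns = ["given_names", "lastname"]  # hypothetical column names

# Attach the new columns to the original DataFrame.
df = df.join(names)
print(df)
```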
