Splitting column value into 2 new columns - Python Pandas
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/44206962/
Asked by JD2775
I have a dataframe with a column 'name' holding values like 'James Cameron'. I'd like to split it into 2 new columns, 'First_Name' and 'Last_Name', but there is no delimiter in the data, so I am not quite sure how. I realize that 'James' is in position [0] and 'Cameron' is in position [1], but I am not sure that can be recognized without a delimiter.
import pandas as pd

df = pd.DataFrame({'name': ['James Cameron', 'Martin Sheen'],
                   'Id': [1, 2]})
df
EDIT:
Vaishali's answer below worked perfectly for the dataframe I had provided. I created that dataframe as an example, though. My real code looks like this:
data[['First_Name','Last_Name']] = data.director_name.str.split(' ', expand = True)
and that, unfortunately, is throwing an error:
'Columns must be same length as key'
The column holds the same values as my example though. Any suggestions?
Thanks
Answered by Vaishali
You can split on the space:
df[['Name', 'Lastname']] = df.name.str.split(' ', expand = True)
   Id           name    Name  Lastname
0   1  James Cameron   James   Cameron
1   2   Martin Sheen  Martin     Sheen
EDIT: Handling the error 'Columns must be same length as key'. The data might contain names with more than one space, e.g. 'George Martin Jr.'. In that case, one way is to split on the space and use the first and second strings, ignoring the third if it exists:
df['First_Name'] = df.name.str.split(' ', expand = True)[0]
df['Last_Name'] = df.name.str.split(' ', expand = True)[1]
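An alternative that keeps everything after the first space in the last-name column is to limit the number of splits with the `n` parameter of `str.split`. The following is only a minimal sketch: the sample names are made up for illustration, and whether a suffix such as 'Jr.' belongs in the last name depends on your data.

```python
import pandas as pd

# Hypothetical sample data: the third name contains more than one space.
df = pd.DataFrame({'name': ['James Cameron', 'Martin Sheen', 'George Martin Jr.']})

# Count the space-separated parts per row to find the names that would
# break a plain two-column assignment.
parts = df['name'].str.split().str.len()
too_many = df[parts > 2]

# n=1 splits only on the first space, so the result never has more
# than two columns.
df[['First_Name', 'Last_Name']] = df['name'].str.split(' ', n=1, expand=True)
```

Inspecting `too_many` first shows which rows triggered the error, which may help in deciding whether dropping or keeping the extra parts is the right choice.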
Answered by piokuc
Slightly different way of doing this:
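# result_type='expand' (pandas 0.23+) turns each returned list into separate columns
df[['first_name', 'last_name']] = df.apply(lambda row: row['name'].split(), axis=1, result_type='expand')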
df
   Id           name  first_name  last_name
0   1  James Cameron       James    Cameron
1   2   Martin Sheen      Martin      Sheen
Answered by piRSquared
I like this method... Not as quick as simply splitting but it drops in column names in a very convenient way.
df.join(df.name.str.extract(r'(?P<First>\S+)\s+(?P<Last>\S+)', expand=True))
   Id           name   First     Last
0   1  James Cameron   James  Cameron
1   2   Martin Sheen  Martin    Sheen
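One property of the `extract` approach worth noting is that it always returns exactly the named groups as columns, so it does not raise 'Columns must be same length as key' on irregular names. The sketch below uses made-up sample names to illustrate this: a name with three parts keeps only its first two, and a single-word name yields missing values.

```python
import pandas as pd

# Hypothetical sample data with a three-part name and a single-word name.
df = pd.DataFrame({'name': ['James Cameron', 'George Martin Jr.', 'Cher']})

# The pattern captures the first two whitespace-separated tokens.
# Rows that do not match the pattern get NaN in both new columns.
out = df.join(df['name'].str.extract(r'(?P<First>\S+)\s+(?P<Last>\S+)', expand=True))
```

Because non-matching rows become NaN rather than an error, it may be worth checking `out['First'].isna()` afterwards to see which names were not split.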