Pandas add new columns based on splitting another column

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38956778/

Time: 2020-09-14 01:49:50  Source: igfitidea

python pandas dataframe split multiple-columns

Asked by dagg3r

I have a pandas dataframe like the following:

A              B
US,65,AMAZON   2016
US,65,EBAY     2016

My goal is to get it to look like this:

A              B      country    code    com
US.65.AMAZON   2016   US         65      AMAZON
US.65.EBAY     2016   US         65      EBAY

I know this question has been asked before here and here, but none of them works for me. I have tried:

df['country','code','com'] = df.Field.str.split('.')

and

df2 = pd.DataFrame(df.Field.str.split('.').tolist(),columns = ['country','code','com','A','B'])

Am I missing something? Any help is much appreciated.

Answered by jezrael

You can use split with the parameter expand=True and add one more [] on the left side:

df[['country','code','com']] = df.A.str.split(',', expand=True)

Then replace , with . in column A:

df.A = df.A.str.replace(',','.')

print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
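
For readers who want to run the steps above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question's data; the regex=False argument and the final astype(int) conversion are my additions and not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({'A': ['US,65,AMAZON', 'US,65,EBAY'], 'B': [2016, 2016]})

# Without expand=True, split returns a single Series whose values are lists.
as_lists = df['A'].str.split(',')

# With expand=True, split returns a DataFrame with one column per piece,
# which is why it can be assigned to a list of three column names.
parts = df['A'].str.split(',', expand=True)
df[['country', 'code', 'com']] = parts

# regex=False (an addition here) treats ',' as a literal string.
df['A'] = df['A'].str.replace(',', '.', regex=False)

# The split pieces are strings; convert 'code' if numbers are needed
# (an optional extra step, assuming every code is a valid integer).
df['code'] = df['code'].astype(int)

print(df)
```

Note that the double brackets on the left side select a list of column labels, so each piece lands in its own column.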

Another solution uses the DataFrame constructor, if there are no NaN values:

df[['country','code','com']] = pd.DataFrame([ x.split(',') for x in df['A'].tolist() ])
df.A = df.A.str.replace(',','.')
print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
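
The "no NaN values" condition matters because a missing value is a float, which has no split method. The sketch below illustrates this with a made-up frame (the NaN row is my assumption, not data from the question) and shows that str.split with expand=True handles the same input.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row of A is missing.
df = pd.DataFrame({'A': ['US,65,AMAZON', np.nan], 'B': [2016, 2016]})

# The list comprehension calls .split on every value, including the NaN float.
try:
    rows = [x.split(',') for x in df['A'].tolist()]
    comprehension_failed = False
except AttributeError:
    comprehension_failed = True

# The vectorized string method propagates the missing value instead.
parts = df['A'].str.split(',', expand=True)

print(comprehension_failed)
print(parts)
```

With missing data, the str.split variant is therefore the safer of the two.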

You can also pass the column names to the constructor, but then concat is necessary:

df1=pd.DataFrame([x.split(',') for x in df['A'].tolist()],columns= ['country','code','com'])
df.A = df.A.str.replace(',','.')
df = pd.concat([df, df1], axis=1)
print (df)
              A     B country code     com
0  US.65.AMAZON  2016      US   65  AMAZON
1    US.65.EBAY  2016      US   65    EBAY
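
One caveat with the constructor approach: the new frame gets a default 0..n-1 index, and concat with axis=1 aligns rows on the index. The sketch below assumes a frame with a non-default index (the labels 10 and 20 are made up) and passes index=df.index, which is my addition to keep the rows aligned.

```python
import pandas as pd

# Hypothetical frame whose index is not the default 0..n-1.
df = pd.DataFrame({'A': ['US,65,AMAZON', 'US,65,EBAY'], 'B': [2016, 2016]},
                  index=[10, 20])

# Reuse the original index so that concat lines the rows up.
df1 = pd.DataFrame([x.split(',') for x in df['A'].tolist()],
                   columns=['country', 'code', 'com'],
                   index=df.index)
result = pd.concat([df, df1], axis=1)

# For comparison: with a default index the rows do not match,
# so concat produces the union of both indexes, padded with NaN.
misaligned = pd.concat([df, df1.reset_index(drop=True)], axis=1)

print(result)
print(misaligned.shape)
```

With the sample data in the question the index is already the default one, so the original code works unchanged there.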

Answered by user10451754

This will not give the expected output: it only gives the first character of each df['A'] value, which is 'U'.

Creating the columns from the provided data this way works:

df1=pd.DataFrame([x.split(',') for x in df['A'].tolist()],columns= ['country','code','com'])

A lambda can also be used instead of the for loop.

Answered by Nithin Narla

To get the new columns, I would prefer doing it as follows:

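# Split each value on ',' first; indexing the raw string would return a single character.
df['Country'] = df['A'].apply(lambda x: x.split(',')[0])
df['Code'] = df['A'].apply(lambda x: x.split(',')[1])
df['Com'] = df['A'].apply(lambda x: x.split(',')[2])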
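
The same per-column idea can also be written without apply, using the .str accessor to index into the split lists. This is an alternative sketch and not the answerer's code; it splits once and reuses the result.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({'A': ['US,65,AMAZON', 'US,65,EBAY'], 'B': [2016, 2016]})

# Split once; each value becomes a list such as ['US', '65', 'AMAZON'].
pieces = df['A'].str.split(',')

# .str[i] picks the i-th element of every list.
df['Country'] = pieces.str[0]
df['Code'] = pieces.str[1]
df['Com'] = pieces.str[2]

print(df)
```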

As for replacing , with ., you can use the following:

df['A'] = df['A'].str.replace(',','.')