Splitting a column into multiple columns with specific names in a pandas DataFrame
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/48207115/
Asked by Avinash Clinton
I have the following dataframe:
pri sec
TOM AB,CD,EF
Hyman XY,YZ
HARRY FG
NICK KY,NY,SD,EF,FR
I need the following output, with the new columns named as shown (their number depends on how many comma-separated fields exist in column 'sec'):
pri sec sec0 sec1 sec2 sec3 sec4
TOM AB,CD,EF AB CD EF NaN NaN
Hyman XY,YZ XY YZ NaN NaN NaN
HARRY FG FG NaN NaN NaN NaN
NICK KY,NY,SD,EF,FR KY NY SD EF FR
Can I get any suggestions?
Answered by jezrael
Use join + split + add_prefix:
df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec'))
print (df)
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF None None
1 Hyman XY,YZ XY YZ None None None
2 HARRY FG FG None None None None
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
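To make the one-liner above easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and performs the same three steps one at a time. The intermediate names `parts` and `out` are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "pri": ["TOM", "Hyman", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

# Step 1: split on commas; expand=True returns a DataFrame with columns 0..4
parts = df["sec"].str.split(",", expand=True)

# Step 2: turn the integer column labels 0..4 into sec0..sec4
parts = parts.add_prefix("sec")

# Step 3: join the new columns back onto the original frame (aligned on the index)
out = df.join(parts)
print(out)
```

Because join aligns on the index, this should also work when df does not have the default 0..n-1 index.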
If you need NaN values instead of None, add fillna:
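import numpy as np
# Alternative to the previous statement: run this on the original df, not after the first join
df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec').fillna(np.nan))
print(df)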
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF NaN NaN
1 Hyman XY,YZ XY YZ NaN NaN NaN
2 HARRY FG FG NaN NaN NaN NaN
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
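Whether the empty cells are displayed as None or as NaN, pandas counts both as missing, so code that only checks for missing values may not need the extra fillna step. The following is a small sketch under that assumption; the name `missing_per_column` is made up for illustration.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "pri": ["TOM", "Hyman", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

parts = df["sec"].str.split(",", expand=True).add_prefix("sec")

# isna() flags both None and NaN, so the count is the same either way
missing_per_column = parts.isna().sum().to_dict()
print(missing_per_column)
```

For the sample data, sec0 has no missing cells, while sec3 and sec4 are each missing in three of the four rows.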
Answered by rnso
Try the following code (explanations are given as comments). It finds the maximum number of comma-separated items in the "sec" column and creates the column names accordingly:
import pandas as pd
maxlen = max(len(x.split(",")) for x in df.sec)  # find max number of items in 'sec' column
cols = ["sec" + str(x) for x in range(maxlen)]  # create new column names
datalist = [x.split(",") for x in df.sec]  # create list of lists from entries in "sec"
newdf = pd.DataFrame(data=datalist, columns=cols)  # create dataframe of new columns
newdf = pd.concat([df, newdf], axis=1)  # add it to original dataframe (rows are matched by index)
print(newdf)
Output:
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF None None
1 Hyman XY,YZ XY YZ None None None
2 HARRY FG FG None None None None
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
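One caveat with the list-based approach: the new frame gets a fresh 0..n-1 index, and pd.concat with axis=1 matches rows by index label, so the result lines up only when df has that same default index. The sketch below is a variation, not the answer author's code: it reuses df.index and pads short rows explicitly. The non-default index values are invented purely to demonstrate the point.

```python
import pandas as pd

# Sample data from the question, with a deliberately non-default index (assumption)
df = pd.DataFrame(
    {
        "pri": ["TOM", "Hyman", "HARRY", "NICK"],
        "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
    },
    index=[10, 20, 30, 40],
)

datalist = [x.split(",") for x in df["sec"]]        # one list of fields per row
maxlen = max(len(items) for items in datalist)      # widest row decides the column count
cols = [f"sec{i}" for i in range(maxlen)]           # sec0 .. sec4

# Pad short rows with None so that every row has exactly maxlen entries
padded = [items + [None] * (maxlen - len(items)) for items in datalist]

# Reuse df.index so that concat matches each row with its own split fields
newdf = pd.DataFrame(padded, columns=cols, index=df.index)
result = pd.concat([df, newdf], axis=1)
print(result)
```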