Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow original URL: http://stackoverflow.com/questions/48207115/

Time: 2020-09-14 05:02:43  Source: igfitidea

Splitting a column into multiple columns with specific names in a pandas dataframe

python pandas dataframe

Asked by Avinash Clinton

I have the following dataframe:

pri    sec
TOM    AB,CD,EF
Hyman  XY,YZ
HARRY  FG
NICK   KY,NY,SD,EF,FR

I need the following output, with the new column names depending on how many comma-separated fields exist in column 'sec':

pri    sec             sec0  sec1  sec2  sec3  sec4
TOM    AB,CD,EF        AB    CD    EF    NaN   NaN
Hyman  XY,YZ           XY    YZ    NaN   NaN   NaN
HARRY  FG              FG    NaN   NaN   NaN   NaN
NICK   KY,NY,SD,EF,FR  KY    NY    SD    EF    FR

Can I get any suggestions?

Answered by jezrael

Use join, str.split, and add_prefix:

df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec'))
print(df)
     pri             sec sec0  sec1  sec2  sec3  sec4
0    TOM        AB,CD,EF   AB    CD    EF  None  None
1  Hyman           XY,YZ   XY    YZ  None  None  None
2  HARRY              FG   FG  None  None  None  None
3   NICK  KY,NY,SD,EF,FR   KY    NY    SD    EF    FR
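
The one-liner above chains three steps. A minimal sketch that runs them one at a time, assuming the sample data from the question (the names `parts`, `named`, and `result` are illustrative and not part of the answer):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "pri": ["TOM", "Hyman", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

# Step 1: split on commas; expand=True returns a DataFrame whose
# columns are labelled 0..4, with missing values padding short rows.
parts = df["sec"].str.split(",", expand=True)

# Step 2: add_prefix turns the labels 0..4 into "sec0".."sec4".
named = parts.add_prefix("sec")

# Step 3: join aligns on the index and appends the new columns.
result = df.join(named)

print(result.columns.tolist())
# ['pri', 'sec', 'sec0', 'sec1', 'sec2', 'sec3', 'sec4']
```

Because the number of new columns comes from the longest row, the column names adapt to the data without being listed by hand.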

If you need NaN instead of None, add fillna:

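import numpy as np
# Alternative to the join above: run this on the original df, not after the first join
df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec').fillna(np.nan))
print(df)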
     pri             sec sec0 sec1 sec2 sec3 sec4
0    TOM        AB,CD,EF   AB   CD   EF  NaN  NaN
1  Hyman           XY,YZ   XY   YZ  NaN  NaN  NaN
2  HARRY              FG   FG  NaN  NaN  NaN  NaN
3   NICK  KY,NY,SD,EF,FR   KY   NY   SD   EF   FR
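
pandas' `isna` and `notna` treat None and NaN alike, so code that only needs to detect the padded cells may not need the conversion at all. A small sketch under that assumption, again using the sample data from the question (the names `missing` and `n_fields` are hypothetical):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "pri": ["TOM", "Hyman", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

parts = df["sec"].str.split(",", expand=True).add_prefix("sec")

# isna() flags the padded cells whether they hold None or NaN.
missing = parts.isna()

# Count the real fields in each row by counting non-missing cells.
n_fields = parts.notna().sum(axis=1)

print(n_fields.tolist())  # [3, 2, 1, 5]
```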

Answered by rnso

Try the following code (explanations are in the comments). It finds the largest number of comma-separated items in the "sec" column and creates the column names accordingly:

maxlen = max(len(x.split(",")) for x in df.sec)     # largest number of fields in 'sec'
cols = ["sec" + str(i) for i in range(maxlen)]      # new column names: sec0, sec1, ...
datalist = [x.split(",") for x in df.sec]           # one list of fields per row
newdf = pd.DataFrame(data=datalist, columns=cols)   # create dataframe of new columns
newdf = pd.concat([df, newdf], axis=1)              # add it to original dataframe
print(newdf)

Output:

     pri             sec sec0  sec1  sec2  sec3  sec4
0    TOM        AB,CD,EF   AB    CD    EF  None  None
1  Hyman           XY,YZ   XY    YZ  None  None  None
2  HARRY              FG   FG  None  None  None  None
3   NICK  KY,NY,SD,EF,FR   KY    NY    SD    EF    FR
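
The same idea can be wrapped in a small helper. This is only a sketch: the function name `split_to_columns` and its parameters are hypothetical, and unlike the code above it reuses the original index for the new frame, so that `pd.concat` lines the rows up even when the index is not 0..n-1.

```python
import pandas as pd

def split_to_columns(df, col, sep=","):
    """Return df plus col0, col1, ... built from the sep-separated values of col.

    Hypothetical helper for illustration; assumes df is non-empty and
    every value in df[col] is a string.
    """
    rows = [value.split(sep) for value in df[col]]
    width = max(len(row) for row in rows)
    # Pad short rows explicitly so every row has the same length.
    padded = [row + [None] * (width - len(row)) for row in rows]
    names = [f"{col}{i}" for i in range(width)]
    new_cols = pd.DataFrame(padded, columns=names, index=df.index)
    return pd.concat([df, new_cols], axis=1)

# Sample data from the question, with a non-default index on purpose.
df = pd.DataFrame(
    {
        "pri": ["TOM", "Hyman", "HARRY", "NICK"],
        "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
    },
    index=[10, 20, 30, 40],
)

out = split_to_columns(df, "sec")
print(out.columns.tolist())
# ['pri', 'sec', 'sec0', 'sec1', 'sec2', 'sec3', 'sec4']
```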