pandas: Merging pandas columns (one-to-many)

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/44842458/

Time: 2020-09-14 03:54:27  Source: igfitidea

Merging pandas columns (one-to-many)

Tags: python, excel, pandas

Asked by Andi Maier

I am new to python pandas and want to combine several Excel sheets by a common ID. The relationship between the sheets is one-to-many.

Here is the input:

df1

ID       Name
3763058  Andi
3763077  Mark

and

df2:

ID       Tag
3763058  item1
3763058  item2
3763058  item3
3763077  item_4
3763077  item_5
3763077  item_6

I would now like to merge the two pandas data frames df1 and df2 into the following output, where all Tag values for an ID are combined into a single column:

ID       Name  Tag
3763058  Andi  item1, item2, item3
3763077  Mark  item_4, item_5, item_6

Could anybody please help me with this?

Cheers, Andi

Answered by jezrael

You can first use groupby with join:

# Collapse all tags belonging to one ID into a single comma-separated string
df2 = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()
print(df2)
        ID                     Tag
0  3763058     item1, item2, item3
1  3763077  item_4, item_5, item_6
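To try this step on its own, the frame from the question can be built by hand. This is only a minimal sketch: the literal data is copied from the question (in practice it would come from the Excel sheets), and the name `tags_per_id` is an illustrative choice, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (normally read from the Excel sheets)
df2 = pd.DataFrame({
    'ID': [3763058, 3763058, 3763058, 3763077, 3763077, 3763077],
    'Tag': ['item1', 'item2', 'item3', 'item_4', 'item_5', 'item_6'],
})

# One row per ID; all tags of that ID are joined into a single string
tags_per_id = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()
print(tags_per_id)
```

Note that `', '.join` only works when every Tag value is a string; missing or numeric values would need converting first.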

Then you can use merge, which is especially useful if df1 has more columns:

import pandas as pd
# Left join: keep every row of df1 and attach the aggregated Tag column
df = pd.merge(df1, df2, on='ID', how='left')
print(df)
        ID  Name                     Tag
0  3763058  Andi     item1, item2, item3
1  3763077  Mark  item_4, item_5, item_6
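The effect of `how='left'` is easiest to see when df1 contains an ID that has no tags. The sketch below assumes one extra row (ID 3763099, name Eve) that is purely hypothetical and not part of the question; everything else is the question's data.

```python
import pandas as pd

# Question data plus one hypothetical row (3763099, 'Eve') that has no tags
df1 = pd.DataFrame({
    'ID': [3763058, 3763077, 3763099],
    'Name': ['Andi', 'Mark', 'Eve'],
})
df2 = pd.DataFrame({
    'ID': [3763058, 3763058, 3763058, 3763077, 3763077, 3763077],
    'Tag': ['item1', 'item2', 'item3', 'item_4', 'item_5', 'item_6'],
})

# Aggregate the "many" side first so the merge becomes one-to-one
tags = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()

# The left join keeps all rows of df1; IDs without tags get NaN in Tag
df = pd.merge(df1, tags, on='ID', how='left')
print(df)
```

With the default `how='inner'`, the hypothetical third row would be dropped instead of being kept with a missing Tag.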

Alternatively, use map if you only need to add one column:

# Aggregate the tags, then look up each ID's Name in df1
df2 = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()
df2['Name'] = df2['ID'].map(df1.set_index('ID')['Name'])
print(df2)
        ID                     Tag  Name
0  3763058     item1, item2, item3  Andi
1  3763077  item_4, item_5, item_6  Mark

If the position of the Name column matters, use insert:

# Aggregate the tags, then place Name at column position 1 (right after ID)
df2 = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()
df2.insert(1, 'Name', df2['ID'].map(df1.set_index('ID')['Name']))
print(df2)
        ID  Name                     Tag
0  3763058  Andi     item1, item2, item3
1  3763077  Mark  item_4, item_5, item_6
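The map and insert variant can also be run end to end as a rough sketch. The data is again copied from the question; `result` and `name_by_id` are illustrative names introduced here, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'ID': [3763058, 3763077], 'Name': ['Andi', 'Mark']})
df2 = pd.DataFrame({
    'ID': [3763058, 3763058, 3763058, 3763077, 3763077, 3763077],
    'Tag': ['item1', 'item2', 'item3', 'item_4', 'item_5', 'item_6'],
})

result = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()

# Lookup Series: the index is ID, the values are Name
name_by_id = df1.set_index('ID')['Name']

# insert() modifies result in place and returns None
result.insert(1, 'Name', result['ID'].map(name_by_id))
print(result)
```

Because `insert` works in place and raises a ValueError if the column already exists, it should be applied to a frame that does not yet have a Name column.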