Merging pandas columns (one-to-many)
Original question: http://stackoverflow.com/questions/44842458/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Asked by Andi Maier
I am new to Python pandas, and I want to combine several Excel sheets by a common ID. The sheets are in a one-to-many relationship.
Here is the input:
df1:
ID       Name
3763058  Andi
3763077  Mark
and
df2:
ID       Tag
3763058  item1
3763058  item2
3763058  item3
3763077  item_4
3763077  item_5
3763077  item_6
I would now like to merge the two pandas data frames df1 and df2 into the following output (the Tag values are combined into a single column, one row per ID):
ID       Name  Tag
3763058  Andi  item1, item2, item3
3763077  Mark  item_4, item_5, item_6
Could anybody please help me with this?
Cheers, Andi
Answered by jezrael
You can first use groupby with join:
df2 = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()
print (df2)
ID Tag
0 3763058 item1, item2, item3
1 3763077 item_4, item_5, item_6
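For anyone who wants to run this step on its own, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the question's sample data, and the variable name `tags` is only an illustration (the answer reassigns `df2` instead).

```python
import pandas as pd

# Sample data copied from the question (the "many" side of the relationship)
df2 = pd.DataFrame({
    'ID': [3763058, 3763058, 3763058, 3763077, 3763077, 3763077],
    'Tag': ['item1', 'item2', 'item3', 'item_4', 'item_5', 'item_6'],
})

# Collapse all tags of one ID into a single comma-separated string,
# then turn the ID index back into a regular column
tags = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()
print(tags)
```

Note that `', '.join` expects every value to be a string, so a Tag column containing missing values or numbers would need to be cleaned or converted first.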
Then you can use merge, which is especially useful if df1 has more columns:
df = pd.merge(df1, df2, on='ID', how='left')
print (df)
ID Name Tag
0 3763058 Andi item1, item2, item3
1 3763077 Mark item_4, item_5, item_6
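The two steps can be put together as a runnable sketch. The data is rebuilt from the question; the `validate='one_to_one'` argument is an optional safety check added here and is not part of the original answer. It asks pandas to raise an error if either side still has duplicate IDs.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'ID': [3763058, 3763077], 'Name': ['Andi', 'Mark']})
df2 = pd.DataFrame({
    'ID': [3763058, 3763058, 3763058, 3763077, 3763077, 3763077],
    'Tag': ['item1', 'item2', 'item3', 'item_4', 'item_5', 'item_6'],
})

# Step 1: one row per ID with the tags joined into a single string
tags = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()

# Step 2: how='left' keeps every row of df1, even an ID without tags.
# validate= is an assumption of this sketch, not from the original answer.
merged = pd.merge(df1, tags, on='ID', how='left', validate='one_to_one')
print(merged)
```

With a left join, an ID that appears in df1 but has no rows in df2 would keep its row and get a missing value in the Tag column.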
Alternatively, use map if you only need to add one column:
df2 = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()
df2['Name'] = df2['ID'].map(df1.set_index('ID')['Name'])
print (df2)
ID Tag Name
0 3763058 item1, item2, item3 Andi
1 3763077 item_4, item_5, item_6 Mark
If the position of the Name column matters, use insert:
df2 = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()
df2.insert(1, 'Name', df2['ID'].map(df1.set_index('ID')['Name']))
print (df2)
ID Name Tag
0 3763058 Andi item1, item2, item3
1 3763077 Mark item_4, item_5, item_6
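The insert variant can be tried in the same way. This sketch again rebuilds the question's data, and `tags` is an illustrative name. `DataFrame.insert` works in place and takes the target position, the new column name, and the values.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({'ID': [3763058, 3763077], 'Name': ['Andi', 'Mark']})
df2 = pd.DataFrame({
    'ID': [3763058, 3763058, 3763058, 3763077, 3763077, 3763077],
    'Tag': ['item1', 'item2', 'item3', 'item_4', 'item_5', 'item_6'],
})

tags = df2.groupby('ID')['Tag'].apply(', '.join).reset_index()

# Position 1 places Name between ID (position 0) and Tag.
# insert() modifies tags in place and returns None.
tags.insert(1, 'Name', tags['ID'].map(df1.set_index('ID')['Name']))
print(tags)
```

Because insert raises an error when the column already exists, running that line twice on the same data frame fails unless the column is dropped first.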