Source: Stack Overflow, http://stackoverflow.com/questions/39985861/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.

pandas merge on columns with different names and avoid duplicates

Tags: python, pandas, merge

Asked by E.K.

How can I merge two pandas DataFrames on two columns with different names and keep one of the columns?

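import pandas as pd

df1 = pd.DataFrame({'UserName': [1,2,3], 'Col1':['a','b','c']})
df2 = pd.DataFrame({'UserID': [1,2,3], 'Col2':['d','e','f']})
# Merge on differently named key columns; the result keeps both UserName and UserID
pd.merge(df1, df2, left_on='UserName', right_on='UserID')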

This produces a DataFrame like this:

[image: the merged DataFrame, which contains both the UserName and UserID columns]

But clearly I am merging on UserName and UserID, so they are the same. I want it to look like this. Is there a clean way to do this?

[image: the desired DataFrame, with only one of the two key columns]

The only ways I can think of are either renaming the columns to be the same before the merge, or dropping one of them after the merge. It would be nice if pandas automatically dropped one of them, or if I could do something like:

pd.merge(df1, df2, left_on='UserName', right_on='UserID', keep_column='left')

Answered by Psidom

How about setting UserID as the index of the second data frame and then joining on that index?

pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)

#   Col1    UserName    Col2
# 0    a           1       d
# 1    b           2       e
# 2    c           3       f
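
A similar result can probably be had with DataFrame.join, which matches a column of the calling frame against the index of the other frame. The snippet below is a minimal sketch using the question's data. Note that join defaults to a left join while pd.merge defaults to an inner join, so how='inner' is passed explicitly here to mirror the merge above.

```python
import pandas as pd

df1 = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'UserID': [1, 2, 3], 'Col2': ['d', 'e', 'f']})

# Match df1's UserName column against df2's UserID index.
# UserID is the index of the right frame, so it does not show up as a column.
joined = df1.join(df2.set_index('UserID'), on='UserName', how='inner')
print(joined)
```

Both spellings rely on the same idea: a key that lives in the index of the right frame is not repeated as a column in the result.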

Answered by Boud

There is no really elegant way to do this. pandas keeps both columns on purpose, because in the more general cases, such as left, right, or outer joins, the two columns can carry different information. Don't try to over-engineer your merge line; be explicit, as you suggest.
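
To illustrate why both key columns are kept, here is a small sketch with an outer join. The data is hypothetical and differs from the question's: key 3 exists only in the left frame and key 4 only in the right frame.

```python
import pandas as pd

# Hypothetical data: key 3 is only on the left, key 4 is only on the right
left = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
right = pd.DataFrame({'UserID': [1, 2, 4], 'Col2': ['d', 'e', 'f']})

merged = pd.merge(left, right, left_on='UserName', right_on='UserID', how='outer')

# Rows without a match have a missing value in one of the two key columns,
# so the columns are no longer interchangeable
print(merged[['UserName', 'UserID']])
```

In this case dropping either key column would lose information about the unmatched rows, which is presumably why pandas does not drop one automatically.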

Solution 1:

# Rename the key by name (not by position) so it matches df1, whatever df2's column order is
pd.merge(df1, df2.rename(columns={'UserID': 'UserName'}), on='UserName')
Out[67]: 
  Col1  UserName Col2
0    a         1    d
1    b         2    e
2    c         3    f

Solution 2:

pd.merge(df1, df2, left_on='UserName', right_on='UserID').drop('UserID', axis=1)
Out[71]: 
  Col1  UserName Col2
0    a         1    d
1    b         2    e
2    c         3    f
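
If this pattern comes up often, Solution 2 could be wrapped in a small helper that approximates the keep_column='left' behaviour wished for in the question. The function name merge_keep_left_key is hypothetical and is not part of pandas; this is only a sketch, and it assumes the two key columns have different names.

```python
import pandas as pd

def merge_keep_left_key(left, right, left_on, right_on, **kwargs):
    """Hypothetical helper (not a pandas API): merge on differently
    named key columns and keep only the left key column."""
    merged = pd.merge(left, right, left_on=left_on, right_on=right_on, **kwargs)
    # Drop the right-hand key column after the merge
    return merged.drop(columns=right_on)

df1 = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'UserID': [1, 2, 3], 'Col2': ['d', 'e', 'f']})

result = merge_keep_left_key(df1, df2, 'UserName', 'UserID')
print(result)
```

This is only safe for inner and left joins. With how='right' or how='outer', dropping the right key would discard the keys of rows that have no match on the left.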