Original source: http://stackoverflow.com/questions/39985861/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas merge on columns with different names and avoid duplicates
Asked by E.K.
How can I merge two pandas DataFrames on two columns with different names and keep one of the columns?
import pandas as pd
df1 = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'UserID': [1, 2, 3], 'Col2': ['d', 'e', 'f']})
pd.merge(df1, df2, left_on='UserName', right_on='UserID')
This returns a DataFrame that contains both UserName and UserID. But clearly I am merging on UserName and UserID, so they are the same. I want the result to keep only one of them. Is there a clean way to do this?
The only ways I can think of are either renaming the columns to be the same before the merge, or dropping one of them after the merge. It would be nice if pandas automatically dropped one of them, or if I could do something like
pd.merge(df1, df2, left_on='UserName', right_on='UserID', keep_column='left')
Answered by Psidom
How about setting UserID as the index and then joining on the index of the second data frame?
pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)
# Col1 UserName Col2
# 0 a 1 d
# 1 b 2 e
# 2 c 3 f
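As a minimal sketch of the same idea, the snippet below rebuilds the two example frames and also tries DataFrame.join with its on parameter, which matches a column of the calling frame against the index of the other frame. The variable names right, merged and joined are only illustrative, and how='inner' is passed because join defaults to a left join.

```python
import pandas as pd

df1 = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'UserID': [1, 2, 3], 'Col2': ['d', 'e', 'f']})

# Move the right-hand key into the index so it cannot come back as a column.
right = df2.set_index('UserID')

# Same call as in the answer: left column against right index.
merged = pd.merge(df1, right, left_on='UserName', right_index=True)

# Alternative spelling: join a column of df1 against the index of `right`.
joined = df1.join(right, on='UserName', how='inner')

print(merged.columns.tolist())
print(joined.columns.tolist())
```

In both results UserID is absent, because an index used as the join key is not copied into the output columns.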
Answered by Boud
There is nothing really nice built in for this: merge is meant to keep both columns, because in the more general cases, such as left, right, or outer joins, the two columns carry additional information. Don't try to over-engineer your merge line; be explicit, as you suggest.
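To illustrate why the two key columns can differ, here is a small sketch with hypothetical data (not from the question) in which the key sets only partly overlap. With an outer join, a row that exists on one side only ends up with a missing value in the other side's key column, so dropping either column would lose information.

```python
import pandas as pd

# Hypothetical data: user 1 exists only on the left, user 4 only on the right.
left = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
right = pd.DataFrame({'UserID': [2, 3, 4], 'Col2': ['e', 'f', 'g']})

outer = pd.merge(left, right, left_on='UserName', right_on='UserID', how='outer')

# Unmatched rows have NaN in the key column of the side they are missing from.
print(outer[['UserName', 'UserID']])
```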
Solution 1:
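# Rename the key by name rather than by position, so the result does not depend on df2's column order
df2 = df2.rename(columns={'UserID': 'UserName'})
pd.merge(df1, df2, on='UserName')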
Out[67]:
Col1 UserName Col2
0 a 1 d
1 b 2 e
2 c 3 f
Solution 2:
pd.merge(df1, df2, left_on='UserName', right_on='UserID').drop('UserID', axis=1)
Out[71]:
Col1 UserName Col2
0 a 1 d
1 b 2 e
2 c 3 f
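If the merge-then-drop pattern comes up often, it could be wrapped in a small helper. The function below is a hypothetical convenience, not part of pandas: the name merge_keep_left_key is made up for illustration, and it assumes left_on and right_on are single column names.

```python
import pandas as pd

def merge_keep_left_key(left, right, left_on, right_on, **kwargs):
    """Hypothetical helper: merge on differently named keys and keep only the left key."""
    merged = pd.merge(left, right, left_on=left_on, right_on=right_on, **kwargs)
    # When both keys share a name, pandas already returns a single key column.
    if right_on != left_on:
        merged = merged.drop(columns=right_on)
    return merged

df1 = pd.DataFrame({'UserName': [1, 2, 3], 'Col1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'UserID': [1, 2, 3], 'Col2': ['d', 'e', 'f']})

result = merge_keep_left_key(df1, df2, left_on='UserName', right_on='UserID')
print(result)
```

This is only safe for the default inner join. With how='right' or how='outer', the dropped column may hold keys that have no match on the left, so it is probably better to keep both columns in those cases.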