Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow, original URL: http://stackoverflow.com/questions/39019591/

Duplicated rows when merging dataframes in python

python, python-2.7, python-3.x, pandas, merge

Asked by Roberto Bertinetti

I am currently merging 2 dataframes with an outer join, but after merging, I see all the rows are duplicated even when the columns I did the merge upon contain the same values. In detail:

import pandas as pd

list_1 = pd.read_csv('list_1.csv')
list_2 = pd.read_csv('list_2.csv')

merged_list = pd.merge(list_1, list_2, on=['email_address'], how='inner')

with the following input and results:

list_1:

email_address, name, surname
[email protected], john, smith
[email protected], john, smith
[email protected], elvis, presley

list_2:

email_address, street, city
[email protected], street1, NY
[email protected], street1, NY
[email protected], street2, LA

merged_list:

email_address, name, surname, street, city
[email protected], john, smith, street1, NY
[email protected], john, smith, street1, NY
[email protected], john, smith, street1, NY
[email protected], john, smith, street1, NY
[email protected], elvis, presley, street2, LA
[email protected], elvis, presley, street2, LA

My question is, shouldn't it be like this?

merged_list (how I would like it to be :D):

email_address, name, surname, street, city
[email protected], john, smith, street1, NY
[email protected], john, smith, street1, NY
[email protected], elvis, presley, street2, LA

How can I make it so that it becomes like this? Thanks a lot for your help!

Answered by piRSquared

list_2_nodups = list_2.drop_duplicates()
pd.merge(list_1, list_2_nodups, on=['email_address'])

The duplicate rows are expected. Each john smith in list_1 matches each john smith in list_2, so two matching rows on each side produce 2 x 2 = 4 merged rows. I had to drop the duplicates in one of the lists. I chose list_2.
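To make the row counts concrete, here is a minimal sketch that rebuilds the two lists in memory instead of reading the CSV files. The email addresses are hypothetical placeholders (the real ones are not visible in the post), and the names merged_all and merged_dedup are introduced only for this illustration.

```python
import pandas as pd

# Hypothetical placeholder addresses; the real values are not shown in the post.
JOHN = "john@example.com"
ELVIS = "elvis@example.com"

list_1 = pd.DataFrame({
    "email_address": [JOHN, JOHN, ELVIS],
    "name": ["john", "john", "elvis"],
    "surname": ["smith", "smith", "presley"],
})
list_2 = pd.DataFrame({
    "email_address": [JOHN, JOHN, ELVIS],
    "street": ["street1", "street1", "street2"],
    "city": ["NY", "NY", "LA"],
})

# Each left row is paired with every right row that shares its key,
# so a key repeated on both sides multiplies the row count.
merged_all = pd.merge(list_1, list_2, on="email_address", how="inner")

# Dropping fully identical rows from one side makes its keys unique here,
# so each left row finds exactly one partner.
list_2_nodups = list_2.drop_duplicates()
merged_dedup = pd.merge(list_1, list_2_nodups, on="email_address", how="inner")

print(len(merged_all))    # john: 2 x 2 = 4 rows, elvis: 1 x 1 = 1 row
print(len(merged_dedup))  # john: 2 x 1 = 2 rows, elvis: 1 x 1 = 1 row
```

Note that drop_duplicates() only removes rows that are identical in every column. If list_2 held two rows with the same email_address but different streets, both would survive and the merge would still multiply rows; in that case something like drop_duplicates(subset='email_address') might be closer to what is wanted, at the cost of discarding one of the addresses.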
