Duplicated rows when merging dataframes in python

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow CC BY-SA, cite the original address, and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/39019591/
Asked by Roberto Bertinetti
I am currently merging 2 dataframes with an inner join, but after merging, I see all the rows are duplicated even when the columns I did the merge upon contain the same values. In detail:
import pandas as pd

list_1 = pd.read_csv('list_1.csv')
list_2 = pd.read_csv('list_2.csv')
merged_list = pd.merge(list_1, list_2, on=['email_address'], how='inner')
with the following input and results:
list_1:
email_address, name, surname
[email protected], john, smith
[email protected], john, smith
[email protected], elvis, presley
list_2:
email_address, street, city
[email protected], street1, NY
[email protected], street1, NY
[email protected], street2, LA
merged_list:
email_address, name, surname, street, city
[email protected], john, smith, street1, NY
[email protected], john, smith, street1, NY
[email protected], john, smith, street1, NY
[email protected], john, smith, street1, NY
[email protected], elvis, presley, street2, LA
[email protected], elvis, presley, street2, LA
My question is, shouldn't it be like this?
merged_list (how I would like it to be :D):
email_address, name, surname, street, city
[email protected], john, smith, street1, NY
[email protected], john, smith, street1, NY
[email protected], elvis, presley, street2, LA
How can I make it so that it becomes like this? Thanks a lot for your help!
Answered by piRSquared
list_2_nodups = list_2.drop_duplicates()
pd.merge(list_1, list_2_nodups, on=['email_address'])
The duplicate rows are expected. Each john smith in list_1 matches with each john smith in list_2, so the merge produces the Cartesian product of the matching rows. I had to drop the duplicates in one of the lists. I chose list_2.
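
For completeness, here is a minimal, self-contained sketch of the idea. The inline data and email addresses are hypothetical stand-ins (the addresses in the question are redacted), and it assumes each list looks like the tables shown above. It also shows drop_duplicates(subset='email_address') as a stricter variant that keeps one row per join key even when the other columns differ:

import pandas as pd

# Hypothetical stand-in data mirroring the tables in the question
list_1 = pd.DataFrame({
    'email_address': ['john@smith.com', 'john@smith.com', 'elvis@presley.com'],
    'name': ['john', 'john', 'elvis'],
    'surname': ['smith', 'smith', 'presley'],
})
list_2 = pd.DataFrame({
    'email_address': ['john@smith.com', 'john@smith.com', 'elvis@presley.com'],
    'street': ['street1', 'street1', 'street2'],
    'city': ['NY', 'NY', 'LA'],
})

# Naive merge: the 2 john smith rows in list_1 each match the 2 rows
# in list_2, yielding 2 x 2 = 4 duplicated rows for that key
naive = pd.merge(list_1, list_2, on=['email_address'])

# Dropping full-row duplicates from list_2 first avoids the blow-up
deduped = pd.merge(list_1, list_2.drop_duplicates(), on=['email_address'])

# If list_2 could contain rows with the same email but different
# street/city, deduplicate on the join key alone instead
strict = pd.merge(list_1,
                  list_2.drop_duplicates(subset='email_address'),
                  on=['email_address'])

With this data, deduped has two john smith rows and one elvis presley row, which matches the output the asker wanted.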