Python Pandas：两个数据帧的差异

Question

提问by Andy

I need to compare two dataframes of different size row-wise and print out non matching rows. Lets take the following two:

我需要逐行比较两个不同大小的数据帧并打印出不匹配的行。让我们采取以下两个：

df1 = DataFrame({
'Buyer': ['Carl', 'Carl', 'Carl'],
'Quantity': [18, 3, 5, ]})

df2 = DataFrame({
'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
'Quantity': [2, 1, 18, 5]})

What is the most efficient way to row-wise over df2 and print out rows not in df1 e.g:

在 df2 上逐行打印并打印出不在 df1 中的行的最有效方法是什么，例如：

Buyer     Quantity 
Carl         2
Mark         1

Important: I do not want to have row:

重要：我不想有行：

Buyer     Quantity 
Carl         3

included in the diff:

包含在差异中：

I have already tried: Comparing two dataframes of different length row by row and adding columns for each row with equal valueand Outputting difference in two Pandas dataframes side by side - highlighting the difference

我已经尝试过：逐行比较两个不同长度的数据帧并为每行添加具有相等值的列并并排输出两个 Pandas 数据帧中的差异 - 突出显示差异

But these do not match with my problem.

但这些与我的问题不符。

Thank you

谢谢

Andy

安迪

Answer 1

回答by EdChum

mergethe 2 dfs using method 'outer' and pass param indicator=Truethis will tell you whether the rows are present in both/left only/right only, you can then filter the merged df after:

merge使用方法'outer'并传递参数的2个dfsindicator=True这将告诉您行是否存在于两个/仅左侧/仅右侧，然后您可以在以下之后过滤合并的df：

In [22]:
merged = df1.merge(df2, indicator=True, how='outer')
merged[merged['_merge'] == 'right_only']

Out[22]:
  Buyer  Quantity      _merge
3  Carl         2  right_only
4  Mark         1  right_only

Answer 2

回答by A. Nadjar

you may find this as the best:

你可能会发现这是最好的：

df2[ ~df2.isin(df1)].dropna()

Answer 3

回答by Shovalt

diff = set(zip(df2.Buyer, df2.Quantity)) - set(zip(df1.Buyer, df1.Quantity))

This is the first solution that came to mind. You can then put the diff set back in a DF for presentation.

这是想到的第一个解决方案。然后，您可以将差异集放回 DF 中进行演示。

Answer 4

回答by Shayan Amani

@EdChum's answer is self-explained. But using not 'both'condition makes more sense and you do not need to care about the order of comparison, and this is what a real diffsupposed to be. For the sake of answering your question:

@EdChum 的回答是不言自明的。但是使用not 'both'条件更有意义，您不需要关心比较的顺序，这就是真正的差异应该是什么。为了回答你的问题：

merged = df1.merge(df2, indicator=True, how='outer')
merged.loc = [merged['_merge'] != 'both']

Answer 5

回答by Yasir

Try the following if you only care about adding the new Buyers to the other df:

如果您只关心将新买家添加到另一个 df，请尝试以下操作：

df_delta=df2[df2['Buyer'].apply(lambda x: x not in df1['Buyer'].values)]

Python Pandas：两个数据帧的差异

提问by Andy

回答by EdChum

回答by A. Nadjar

回答by Shovalt

回答by Shayan Amani

回答by Yasir

相关推荐

最近更新

标签

Python Pandas：两个数据帧的差异

提问by Andy

回答by EdChum

回答by A. Nadjar

回答by Shovalt

回答by Shayan Amani

回答by Yasir

相关推荐

Python 在 Docker 容器中安装 OpenCV

Python “进程已完成，退出代码为 1”是什么意思？

Python 按索引遍历数据帧

Python pandas.read_excel 参数“sheet_name”不起作用

相关推荐

最近更新

标签