Python Pandas:两个数据帧的差异

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/36891977/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-19 18:31:01  来源:igfitidea点击:

Pandas: Diff of two Dataframes

pythonpandasdataframediff

提问by Andy

I need to compare two dataframes of different size row-wise and print out non matching rows. Lets take the following two:

我需要逐行比较两个不同大小的数据帧并打印出不匹配的行。让我们采取以下两个:

df1 = DataFrame({
'Buyer': ['Carl', 'Carl', 'Carl'],
'Quantity': [18, 3, 5, ]})

df2 = DataFrame({
'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
'Quantity': [2, 1, 18, 5]})

What is the most efficient way to row-wise over df2 and print out rows not in df1 e.g:

在 df2 上逐行打印并打印出不在 df1 中的行的最有效方法是什么,例如:

Buyer     Quantity 
Carl         2
Mark         1

Important: I do not want to have row:

重要:我不想有行:

Buyer     Quantity 
Carl         3

included in the diff:

包含在差异中:

I have already tried: Comparing two dataframes of different length row by row and adding columns for each row with equal valueand Outputting difference in two Pandas dataframes side by side - highlighting the difference

我已经尝试过: 逐行比较两个不同长度的数据帧并为每行添加具有相等值的列并排输出两个 Pandas 数据帧中的差异 - 突出显示差异

But these do not match with my problem.

但这些与我的问题不符。

Thank you

谢谢

Andy

安迪

回答by EdChum

mergethe 2 dfs using method 'outer' and pass param indicator=Truethis will tell you whether the rows are present in both/left only/right only, you can then filter the merged df after:

merge使用方法'outer'并传递参数的2个dfsindicator=True这将告诉您行是否存在于两个/仅左侧/仅右侧,然后您可以在以下之后过滤合并的df:

In [22]:
merged = df1.merge(df2, indicator=True, how='outer')
merged[merged['_merge'] == 'right_only']

Out[22]:
  Buyer  Quantity      _merge
3  Carl         2  right_only
4  Mark         1  right_only

回答by A. Nadjar

you may find this as the best:

你可能会发现这是最好的:

df2[ ~df2.isin(df1)].dropna()

回答by Shovalt

diff = set(zip(df2.Buyer, df2.Quantity)) - set(zip(df1.Buyer, df1.Quantity))

This is the first solution that came to mind. You can then put the diff set back in a DF for presentation.

这是想到的第一个解决方案。然后,您可以将差异集放回 DF 中进行演示。

回答by Shayan Amani

@EdChum's answer is self-explained. But using not 'both'condition makes more sense and you do not need to care about the order of comparison, and this is what a real diffsupposed to be. For the sake of answering your question:

@EdChum 的回答是不言自明的。但是使用not 'both'条件更有意义,您不需要关心比较的顺序,这就是真正的差异应该是什么。为了回答你的问题:

merged = df1.merge(df2, indicator=True, how='outer')
merged.loc = [merged['_merge'] != 'both']

回答by Yasir

Try the following if you only care about adding the new Buyers to the other df:

如果您只关心将新买家添加到另一个 df,请尝试以下操作:

df_delta=df2[df2['Buyer'].apply(lambda x: x not in df1['Buyer'].values)]