pandas Python:如何比较两个数据框

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/34161091/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 00:21:20  来源:igfitidea点击:

Python : How to compare two data frames

pythonpandasdataframe

提问by AbtPst

I have two data frames:

我有两个数据框:

df1

A1    B1
1     a
2     s
3     d

and

df2

A1    B1
1     a
2     x
3     d

I want to compare df1 and df2 on column B1. The column A1 can be used to join. I want to know:

我想比较 B1 列上的 df1 和 df2。A1 列可用于连接。我想知道:

  1. Which rows are different in df1 and df2 with respect to column B1?
  2. If there is a mismatch in the values of column A1. For example whether df2 is missing some values that are there in df1 and vice versa. And if so, which ones?
  1. df1 和 df2 中哪些行相对于列 B1 不同?
  2. 如果 A1 列的值不匹配。例如 df2 是否缺少 df1 中存在的某些值,反之亦然。如果有,是哪些?

I tried using merge and join but that is not what I am looking for.

我尝试使用合并和连接,但这不是我要找的。

回答by Alexander

I've edited the raw data to illustrate the case of A1 keys in one dataframe but not the other.

我编辑了原始数据以说明一个数据帧中 A1 键的情况,而不是另一个。

When doing your merge, you want to specify an 'outer' merge so that you can see these items with an A1 key in one dataframe but not the other.

在进行合并时,您希望指定一个“外部”合并,以便您可以在一个数据框中使用 A1 键查看这些项目,但在另一个中看不到。

I've included the suffixes '_1' and '_2' to indicate the dataframe source (_1 = df1and _2 = df2) of column B1.

我已经包含了后缀 '_1' 和 '_2' 来指示column的数据帧源(_1 =df1和 _2 = df2B1

df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})

df3 = df1.merge(df2, how='outer', on='A1', suffixes=['_1', '_2'])
df3['check'] = df3.B1_1 == df3.B1_2

>>> df3
   A1 B1_1 B1_2  check
0   1    a    a   True
1   2    b    d  False
2   3    c    c   True
3   4    d  NaN  False
4   5  NaN    e  False

To check for missing A1 keys in df1and df2:

要检查缺失A1键df1df2

# A1 value missing in `df1`
>>> d3[df3.B1_1.isnull()]
   A1 B1_1 B1_2  check
4   5  NaN    e  False

# A1 value missing in `df2`
>>> df3[df3.B1_2.isnull()]
   A1 B1_1 B1_2  check
3   4    d  NaN  False

EDITThanks to @EdChum (the source of all Pandas knowledge...).

编辑感谢@EdChum(所有Pandas知识的来源......)。

df3 = df1.merge(df2, how='outer', on='A1', suffixes=['_1', '_2'], indicator=True)
df3['check'] = df3.B1_1 == df3.B1_2

>>> df3
   A1 B1_1 B1_2      _merge  check
0   1    a    a        both   True
1   2    b    d        both  False
2   3    c    c        both   True
3   4    d  NaN   left_only  False
4   5  NaN    e  right_only  False