pandas Python:如何比较两个数据框
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/34161091/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python : How to compare two data frames
提问by AbtPst
I have two data frames:
我有两个数据框:
df1
A1 B1
1 a
2 s
3 d
and
和
df2
A1 B1
1 a
2 x
3 d
I want to compare df1 and df2 on column B1. The column A1 can be used to join. I want to know:
我想比较 B1 列上的 df1 和 df2。A1 列可用于连接。我想知道:
- Which rows are different in df1 and df2 with respect to column B1?
- If there is a mismatch in the values of column A1. For example whether df2 is missing some values that are there in df1 and vice versa. And if so, which ones?
- df1 和 df2 中哪些行相对于列 B1 不同?
- 如果 A1 列的值不匹配。例如 df2 是否缺少 df1 中存在的某些值,反之亦然。如果有,是哪些?
I tried using merge and join but that is not what I am looking for.
我尝试使用合并和连接,但这不是我要找的。
回答by Alexander
I've edited the raw data to illustrate the case of A1 keys in one dataframe but not the other.
我编辑了原始数据以说明一个数据帧中 A1 键的情况,而不是另一个。
When doing your merge, you want to specify an 'outer' merge so that you can see these items with an A1 key in one dataframe but not the other.
在进行合并时,您希望指定一个“外部”合并,以便您可以在一个数据框中使用 A1 键查看这些项目,但在另一个中看不到。
I've included the suffixes '_1' and '_2' to indicate the dataframe source (_1 = df1
and _2 = df2
) of column B1
.
我已经包含了后缀 '_1' 和 '_2' 来指示column的数据帧源(_1 =df1
和 _2 = df2
)B1
。
df1 = pd.DataFrame({'A1': [1, 2, 3, 4], 'B1': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'A1': [1, 2, 3, 5], 'B1': ['a', 'd', 'c', 'e']})
df3 = df1.merge(df2, how='outer', on='A1', suffixes=['_1', '_2'])
df3['check'] = df3.B1_1 == df3.B1_2
>>> df3
A1 B1_1 B1_2 check
0 1 a a True
1 2 b d False
2 3 c c True
3 4 d NaN False
4 5 NaN e False
To check for missing A1 keys in df1
and df2
:
要检查缺失A1键df1
和df2
:
# A1 value missing in `df1`
>>> d3[df3.B1_1.isnull()]
A1 B1_1 B1_2 check
4 5 NaN e False
# A1 value missing in `df2`
>>> df3[df3.B1_2.isnull()]
A1 B1_1 B1_2 check
3 4 d NaN False
EDITThanks to @EdChum (the source of all Pandas knowledge...).
编辑感谢@EdChum(所有Pandas知识的来源......)。
df3 = df1.merge(df2, how='outer', on='A1', suffixes=['_1', '_2'], indicator=True)
df3['check'] = df3.B1_1 == df3.B1_2
>>> df3
A1 B1_1 B1_2 _merge check
0 1 a a both True
1 2 b d both False
2 3 c c both True
3 4 d NaN left_only False
4 5 NaN e right_only False