Pandas "Can only compare identically-labeled DataFrame objects" error

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not to this page). Original question: http://stackoverflow.com/questions/18548370/

Tags: python, pandas

Asked by user1804633

I'm using Pandas to compare the outputs of two files loaded into two data frames (uat, prod): ...

uat = uat[['Customer Number', 'Product']]
prod = prod[['Customer Number', 'Product']]
print(uat['Customer Number'] == prod['Customer Number'])
print(uat['Product'] == prod['Product'])
print(uat == prod)

The first two match exactly:
74357    True
74356    True
Name: Customer Number, dtype: bool
74357    True
74356    True
Name: Product, dtype: bool

For the third print, I get an error: Can only compare identically-labeled DataFrame objects. If the first two compared fine, what's wrong with the 3rd?

Thanks

Accepted answer by Andy Hayden

Here's a small example to demonstrate this (until pandas 0.19 the check applied only to DataFrames, not Series; from 0.19 on it applies to both):

In [1]: df1 = pd.DataFrame([[1, 2], [3, 4]])

In [2]: df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

In [3]: df1 == df2
Exception: Can only compare identically-labeled DataFrame objects
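
To illustrate the remark about Series, here is a minimal sketch of the same situation with two Series. It assumes a recent pandas release (0.19 or later), where the label check raises ValueError; the names s1 and s2 are made up for this example.

```python
import pandas as pd

# Same values, but the labels appear in a different order.
s1 = pd.Series([1, 3])
s2 = pd.Series([3, 1], index=[1, 0])

try:
    s1 == s2
    series_raised = False
except ValueError:  # assumed exception type in recent pandas versions
    series_raised = True

# After sorting both by label, the element-wise comparison goes through.
values_match = (s1.sort_index() == s2.sort_index()).all()
print(series_raised, values_match)
```

On older releases the unsorted Series comparison may simply succeed, which would explain why the first two prints in the question worked while the DataFrame comparison did not.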

One solution is to sort the index first (Note: some functions require sorted indexes):

In [4]: df2.sort_index(inplace=True)

In [5]: df1 == df2
Out[5]: 
      0     1
0  True  True
1  True  True
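
The session above can be sketched as a standalone script. This assumes a recent pandas version, where the failed comparison raises ValueError, and it uses the non-mutating form of sort_index rather than inplace=True.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
# Same rows as df1, but the index labels are in the order [1, 0].
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

try:
    df1 == df2
    raised = False
except ValueError as exc:  # assumed exception type in recent pandas versions
    raised = True
    print(exc)

# sort_index returns a new frame whose index is [0, 1], matching df1.
aligned = df2.sort_index()
result = df1 == aligned
print(result)
```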

Note: == is also sensitive to the order of columns, so you may have to use sort_index(axis=1):

In [11]: df1.sort_index().sort_index(axis=1) == df2.sort_index().sort_index(axis=1)
Out[11]: 
      0     1
0  True  True
1  True  True
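
A small sketch of the column-order case, using hypothetical frames left and right and a hypothetical helper sort_both_axes that are not part of the original answer:

```python
import pandas as pd

left = pd.DataFrame({"a": [1, 3], "b": [2, 4]})
# Same data, but both the row labels and the columns are in a different order.
right = pd.DataFrame({"b": [4, 2], "a": [3, 1]}, index=[1, 0])


def sort_both_axes(df):
    """Return a copy sorted by row labels and then by column labels."""
    return df.sort_index().sort_index(axis=1)


result = sort_both_axes(left) == sort_both_axes(right)
print(result)
```

Sorting both frames the same way, rather than only one of them, avoids relying on either frame already being in sorted order.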

Note: This can still raise an error if the index or columns aren't identically labelled even after sorting.
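
One way to guard against that case is to check the labels before comparing. The helper below, compare_if_same_labels, is a hypothetical name for this sketch and not a pandas function; it only relies on Index.equals and sort_index.

```python
import pandas as pd


def compare_if_same_labels(a, b):
    """Hypothetical helper: element-wise comparison, or None when labels differ."""
    a = a.sort_index().sort_index(axis=1)
    b = b.sort_index().sort_index(axis=1)
    if not (a.index.equals(b.index) and a.columns.equals(b.columns)):
        return None  # sorting cannot reconcile different label sets
    return a == b


df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])
df3 = pd.DataFrame([[1, 2], [3, 4]], index=[0, 2])  # label 2 has no match in df1

same_labels = compare_if_same_labels(df1, df2)
different_labels = compare_if_same_labels(df1, df3)
print(same_labels)
print(different_labels)
```

Returning None is just one design choice for this sketch; raising a more descriptive error would work equally well.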

Answer by CoreDump

You can also try dropping the index if it is not needed for the comparison:

print(df1.reset_index(drop=True) == df2.reset_index(drop=True))
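
One caveat worth sketching: after reset_index(drop=True) the rows are compared by position, so frames holding the same rows in a different order compare as unequal instead of raising. The example reuses df1 and df2 from the accepted answer; the variable names by_position and by_label are assumptions for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]])
df2 = pd.DataFrame([[3, 4], [1, 2]], index=[1, 0])

# Labels are discarded, so row 0 of df1 is compared with the first row of df2.
by_position = df1.reset_index(drop=True) == df2.reset_index(drop=True)
print(by_position)

# Sorting by label first keeps matching rows together before the labels are dropped.
by_label = df1.sort_index().reset_index(drop=True) == df2.sort_index().reset_index(drop=True)
print(by_label)
```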

I have used this same technique in a unit test like so:

# pandas.util.testing is deprecated; recent pandas versions expose this under pandas.testing
from pandas.testing import assert_frame_equal

assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))
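
For completeness, a self-contained sketch of such a test. The column names come from the question, while the data values and the test function name are invented for illustration; it assumes a pandas version that provides pandas.testing.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def test_frames_match_ignoring_index():
    # Hypothetical data: same content, different row labels.
    expected = pd.DataFrame({"Customer Number": [1, 2], "Product": ["A", "B"]})
    actual = expected.copy()
    actual.index = [74357, 74356]

    # Passes because both frames are relabelled 0..n-1 before the check.
    assert_frame_equal(actual.reset_index(drop=True), expected.reset_index(drop=True))
    return actual, expected


actual, expected = test_frames_match_ignoring_index()
```

Without the reset_index calls, assert_frame_equal would report the differing index as a failure.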