Contract of pandas.DataFrame.equals

Notice: this page is a Chinese-English parallel translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original URL: http://stackoverflow.com/questions/26552116/

Time: 2020-09-13 22:36:34  Source: igfitidea

contract of pandas.DataFrame.equals

python pandas

Asked by jwilner

I have a simple test case of a function which returns a df that can potentially contain NaN. I was testing if the output and expected output were equal.

>>> output
Out[1]: 
      r   t  ts  tt  ttct
0  2048  30   0  90     1
1  4096  90   1  30     1
2     0  70   2  65     1

[3 rows x 5 columns]
>>> expected
Out[2]: 
      r   t  ts  tt  ttct
0  2048  30   0  90     1
1  4096  90   1  30     1
2     0  70   2  65     1

[3 rows x 5 columns]
>>> output == expected
Out[3]: 
      r     t    ts    tt  ttct
0  True  True  True  True  True
1  True  True  True  True  True
2  True  True  True  True  True

However, I can't simply rely on the == operator because of NaNs. I was under the impression that the appropriate way to resolve this was to use the equals method. From the documentation:

pandas.DataFrame.equals
DataFrame.equals(other)
Determines if two NDFrame objects contain the same elements. NaNs in the same location are considered equal.

Nonetheless:

>>> expected.equals(output)
Out[4]: False

A little digging around reveals the difference in the frames:

>>> output._data
Out[5]: 
BlockManager
Items: Index([u'r', u't', u'ts', u'tt', u'ttct'], dtype='object')
Axis 1: Int64Index([0, 1, 2], dtype='int64')
FloatBlock: [r], 1 x 3, dtype: float64
IntBlock: [t, ts, tt, ttct], 4 x 3, dtype: int64
>>> expected._data
Out[6]: 
BlockManager
Items: Index([u'r', u't', u'ts', u'tt', u'ttct'], dtype='object')
Axis 1: Int64Index([0, 1, 2], dtype='int64')
IntBlock: [r, t, ts, tt, ttct], 5 x 3, dtype: int64

Force that output float block to int, or force the expected int block to float, and the test passes.
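
A minimal sketch reproducing the mismatch, assuming two small hypothetical frames (not the real test data) in which only the dtype of column r differs:

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question: identical values,
# but column "r" is float64 in `output` and int64 in `expected`.
expected = pd.DataFrame({"r": [2048, 4096, 0], "t": [30, 90, 70]})
output = expected.astype({"r": "float64"})

# Element-wise comparison only looks at values.
elementwise_equal = bool((output == expected).all().all())

# equals() also requires matching dtypes per column.
strict_equal = output.equals(expected)

# Casting `output` to the dtypes of `expected` removes the difference.
after_cast_equal = output.astype(expected.dtypes.to_dict()).equals(expected)

print(elementwise_equal, strict_equal, after_cast_equal)
```

Casting in a test is only reasonable if the dtype difference really is irrelevant to what is being tested.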

Obviously, there are different senses of equality, and the sort of test that DataFrame.equals performs could be useful in some cases. Nonetheless, the disparity between == and DataFrame.equals is frustrating to me and seems like an inconsistency. In pseudo-code, I would expect its behavior to match:

(self.index == other.index).all() \
and (self.columns == other.columns).all() \
and (self.fillna(SOME_MAGICAL_VALUE) == other.fillna(SOME_MAGICAL_VALUE)).all().all()

However, it doesn't. Am I wrong in my thinking, or is this an inconsistency in the Pandas API? Moreover, what IS the test I should be performing for my purposes, given the possible presence of NaN?

Answered by Jeff

.equals() does just what it says. It tests for exact equality among elements, positioning of NaNs (and NaTs), dtype equality, and index equality. Think of this as a df is df2 type of test, except that they don't actually have to be the same object; in other words, df.equals(df.copy()) is always True.
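
As a rough illustration of that contract, here is a small sketch on a made-up frame (the column names x and when are assumptions) holding both a NaN and a NaT:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN and one NaT.
df = pd.DataFrame(
    {
        "x": [1.0, np.nan],
        "when": pd.to_datetime(["2020-01-01", None]),
    }
)
clone = df.copy()

# equals() treats missing values in the same position as equal.
copy_equal = df.equals(clone)

# == follows IEEE semantics: NaN != NaN and NaT != NaT, so this is False.
elementwise_equal = bool((df == clone).all().all())

print(copy_equal, elementwise_equal)
```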

Your example fails because different dtypes are not equal (they may be equivalent though). So you can use com.array_equivalent for this, or (df == df2).all().all() if you don't have NaNs.

This is a replacement for np.array_equal, which is broken for NaN positional detection (and object dtypes).

It is mostly used internally. That said, if you would like an enhancement for equivalence (e.g. the elements are equivalent in the == sense and the NaN positions match), please open an issue on GitHub (and, even better, submit a PR!).
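
Until such an enhancement exists, the equivalence described above can be approximated with public API only. The helper below is a sketch, not part of pandas: frames_equivalent is a hypothetical name, and it assumes both frames carry the same labels in the same order.

```python
import numpy as np
import pandas as pd


def frames_equivalent(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: labels match, and each cell is == or missing in both."""
    if not (left.index.equals(right.index) and left.columns.equals(right.columns)):
        return False
    same_value = left.eq(right)                 # == semantics, ignores dtype
    both_missing = left.isna() & right.isna()   # NaN/NaT in the same position
    return bool((same_value | both_missing).all().all())


# Same values and NaN position; column "t" is int64 in `a`, float64 in `b`.
a = pd.DataFrame({"r": [2048.0, np.nan, 0.0], "t": [30, 90, 70]})
b = pd.DataFrame({"r": [2048.0, np.nan, 0.0], "t": [30.0, 90.0, 70.0]})

print(frames_equivalent(a, b))  # equivalent despite the dtype difference
print(a.equals(b))              # strict check fails on dtype
```

For test code, pd.testing.assert_frame_equal also has a check_dtype parameter that can be set to False, which may be a simpler route than a custom helper.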

Answered by ClementWalter

I used a workaround digging into the MagicMock instance:

assert mock_instance.call_count == 1
call_args = mock_instance.call_args[0]
call_kwargs = mock_instance.call_args[1]
pd.testing.assert_frame_equal(call_kwargs['dataframe'], pd.DataFrame())
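
A self-contained version of this workaround might look like the sketch below. The mock, the dataframe keyword, and the sample data are all assumptions made for illustration; the snippet above does not show how mock_instance was created or called.

```python
from unittest.mock import MagicMock

import numpy as np
import pandas as pd

# Hypothetical collaborator that is handed a DataFrame as a keyword argument.
mock_instance = MagicMock()
sent = pd.DataFrame({"r": [2048.0, np.nan], "t": [30, 90]})
mock_instance(dataframe=sent)

# Inspect the recorded call by hand instead of comparing arguments with ==.
assert mock_instance.call_count == 1
call_args, call_kwargs = mock_instance.call_args  # (positional, keyword)

# assert_frame_equal treats NaNs in the same position as equal.
expected = pd.DataFrame({"r": [2048.0, np.nan], "t": [30, 90]})
pd.testing.assert_frame_equal(call_kwargs["dataframe"], expected)
```

Pulling the frame out of call_args and comparing it with assert_frame_equal avoids relying on == between DataFrames, which returns a frame rather than a single boolean.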