Pandas dataframe values equality test

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/19928284/

Time: 2020-09-13 21:19:35  Source: igfitidea

Tags: python, pandas

Asked by MarkNS

Another Pandas question!

另一个Pandas问题!

I am writing some unit tests that check two data frames for equality. However, the test does not appear to look at the values of the data frames, only at their structure:

我正在编写一些单元测试来测试两个数据框的相等性,但是,该测试似乎没有查看数据框的值,只查看结构:

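import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)

df1 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

print(df1)
print(df2)
# Inside a unittest.TestCase method (assertItemsEqual in Python 2, assertCountEqual in Python 3)
self.assertCountEqual(df1, df2)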

-->True

-->真

Do I need to convert the data frames to another data structure before asserting equality?

在断言相等之前是否需要将数据帧转换为另一种数据结构?
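
Why the assertion passes, as a minimal sketch: assertItemsEqual (assertCountEqual in Python 3) turns both arguments into lists and compares their items, and iterating over a DataFrame yields its column labels, not its values. Both frames have the columns A to D, so the check succeeds whatever the random values are.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df1 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

# Iterating a DataFrame yields its column labels, not rows or cell values.
labels1 = list(df1)
labels2 = list(df2)
print(labels1)  # ['A', 'B', 'C', 'D']

# This is all that assertCountEqual / assertItemsEqual ends up comparing.
print(sorted(labels1) == sorted(labels2))  # True, even though the values differ

# A value-aware comparison disagrees (random data, so almost surely False).
print(df1.equals(df2))
```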

Answer by MarkNS

Ah, of course there is a solution for this already:

啊,当然已经有一个解决方案了:

# pandas.util.testing was deprecated in pandas 1.0; use the public pandas.testing module
from pandas.testing import assert_frame_equal
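
A minimal sketch of assert_frame_equal inside a unittest test case, assuming a pandas version that exposes it as pandas.testing.assert_frame_equal (older answers import it from pandas.util.testing, which was deprecated in pandas 1.0). The class and test names are hypothetical. Unlike the item-based assertion in the question, it compares index, columns, dtypes and values, and raises AssertionError on any mismatch.

```python
import unittest

import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal


class FrameEqualityTest(unittest.TestCase):
    # Hypothetical test case illustrating assert_frame_equal in a unit test.

    def setUp(self):
        dates = pd.date_range('20130101', periods=6)
        rng = np.random.default_rng(0)
        self.df1 = pd.DataFrame(rng.standard_normal((6, 4)),
                                index=dates, columns=list('ABCD'))

    def test_identical_frames_pass(self):
        # A copy has the same index, columns, dtypes and values: no error.
        assert_frame_equal(self.df1, self.df1.copy())

    def test_different_values_fail(self):
        df2 = self.df1.copy()
        df2.iloc[0, 0] += 1.0
        # Same structure, one different value: assert_frame_equal raises.
        with self.assertRaises(AssertionError):
            assert_frame_equal(self.df1, df2)
```

Saved as a test module, this runs with `python -m unittest`; both tests are expected to pass.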

Answer by Rustam Aliyev

While assert_frame_equal is useful in unit tests, I found df1.equals(df2) more useful during analysis: it returns True or False instead of raising an error, so one can go on to check which values are not equal.

虽然 assert_frame_equal 在单元测试中很有用,但我发现以下内容对分析很有用,因为人们可能想进一步检查哪些值不相等: df1.equals(df2)
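
A short sketch of that analysis workflow on made-up data: equals gives a single boolean, and an element-wise comparison then locates the cells that disagree. DataFrame.compare is a pandas 1.1+ method; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Two small frames that differ in exactly one cell (made-up example data).
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})
df2 = df1.copy()
df2.loc[1, "B"] = 50.0

# DataFrame.equals returns a single bool instead of raising.
print(df1.equals(df2))  # False

# To see which values differ, build a boolean mask of mismatching cells...
mismatch = df1.ne(df2)
rows, cols = np.where(mismatch.to_numpy())
diff_locations = list(zip(df1.index[rows], df1.columns[cols]))
print(diff_locations)

# ...or, with pandas >= 1.1, let DataFrame.compare show both values side by side.
print(df1.compare(df2))
```

The mask approach works on any pandas version; compare is more readable but drops rows and columns where everything matches.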

Answer by ankostis

NumPy's testing utilities also work:

numpy 的实用程序也可以工作:

import numpy.testing as npt

npt.assert_array_equal(df1, df2)
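
A small sketch with made-up data showing what the NumPy approach does and does not check: numpy.testing converts both frames to plain arrays, so only values and shape are compared, while index and column labels are ignored. assert_allclose is included as the tolerance-based alternative for floating-point data.

```python
import numpy.testing as npt
import pandas as pd

# Same values, different column labels (made-up example data).
df1 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["A", "B"])
df2 = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["X", "Y"])

# Only values and shape are compared, so differing labels do not fail.
npt.assert_array_equal(df1, df2)

# For floating-point results, a tolerance-based check is often safer.
df3 = df1 + 1e-12
npt.assert_allclose(df1, df3, rtol=1e-9)

# A genuine value difference raises AssertionError.
df4 = df1.copy()
df4.iloc[0, 0] = 99.0
try:
    npt.assert_array_equal(df1, df4)
    values_differ = False
except AssertionError:
    values_differ = True
print(values_differ)  # True
```

If labels matter for your test, assert_frame_equal is the stricter choice; the NumPy helpers fit when only the numbers count.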

Answer by Surya

In [62]: import numpy as np

In [63]: import pandas as pd

In [64]: np.random.seed(30)

In [65]: df_old = pd.DataFrame(np.random.randn(4,5))

In [66]: df_old
Out[66]: 
          0         1         2         3         4
0 -1.264053  1.527905 -0.970711  0.470560 -0.100697
1  0.303793 -1.725962  1.585095  0.134297 -1.106855
2  1.578226  0.107498 -0.764048 -0.775189  1.383847
3  0.760385 -0.285646  0.538367 -2.083897  0.937782

In [67]: np.random.seed(30)

In [68]: df_new = pd.DataFrame(np.random.randn(4,5))

In [69]: df_new
Out[69]: 
          0         1         2         3         4
0 -1.264053  1.527905 -0.970711  0.470560 -0.100697
1  0.303793 -1.725962  1.585095  0.134297 -1.106855
2  1.578226  0.107498 -0.764048 -0.775189  1.383847
3  0.760385 -0.285646  0.538367 -2.083897  0.937782

In [70]: df_old.equals(df_new)  # equality check; returns a single boolean, True or False
Out[70]: True
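
Two behaviours of DataFrame.equals that are easy to trip over, shown on small made-up frames (a sketch, not part of the original answer): NaNs in matching positions compare as equal, and the dtypes must match as well as the values.

```python
import numpy as np
import pandas as pd

# Made-up frames containing a NaN in the same position.
df_old = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})
df_new = df_old.copy()

# equals treats NaNs in the same location as equal...
print(df_old.equals(df_new))  # True

# ...whereas element-wise == follows IEEE rules (NaN != NaN).
print((df_old == df_new).all().all())  # False

# equals also requires matching dtypes: same numbers, int64 vs float64.
df_int = pd.DataFrame({"a": [1, 2]})
df_float = pd.DataFrame({"a": [1.0, 2.0]})
print(df_int.equals(df_float))  # False
```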