Pandas dataframe values equality test
Original question: http://stackoverflow.com/questions/19928284/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by MarkNS
Another Pandas question!
I am writing some unit tests that compare two data frames for equality. However, the assertion appears to look only at the structure of the data frames, not at their values:
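import numpy as np
import pandas as pd

dates = pd.date_range('20130101', periods=6)
df1 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
print(df1)
print(df2)
# Called inside a unittest.TestCase method; assertItemsEqual is the Python 2 name (assertCountEqual in Python 3)
self.assertItemsEqual(df1, df2)
--> True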
Do I need to convert the data frames to another data structure before asserting equality?
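A likely explanation for the passing assertion, sketched below: an item-based assertion compares the elements obtained by iterating over its arguments, and iterating over a DataFrame yields its column labels rather than its values. The small frames and the variable names `items1`, `items2`, `labels_match`, and `values_match` are illustrative, not part of the original question.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # fixed seed so the two draws are reproducible (and different)
dates = pd.date_range("20130101", periods=6)
df1 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Iterating over a DataFrame yields its column labels, not its rows or values.
items1 = list(df1)
items2 = list(df2)
print(items1)  # ['A', 'B', 'C', 'D']

# An item-based comparison therefore sees two identical collections of labels...
labels_match = sorted(items1) == sorted(items2)

# ...even though the underlying values differ.
values_match = df1.equals(df2)
print(labels_match, values_match)
```

So the test in the question would pass for any two frames that share the same column labels, whatever their contents.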
Answered by MarkNS
Ah, of course there is a solution for this already:
from pandas.testing import assert_frame_equal  # older pandas versions: pandas.util.testing
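As a minimal sketch of how this might look inside a unit test: the test class name `FrameEqualityTest` and the small example frames are hypothetical, and the import assumes a pandas version that exposes `pandas.testing` (older releases used `pandas.util.testing`). `assert_frame_equal` raises `AssertionError` when the frames differ and returns nothing when they match.

```python
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal


class FrameEqualityTest(unittest.TestCase):  # hypothetical test case name
    def test_identical_values_pass(self):
        df1 = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
        df2 = df1.copy()
        # No exception is raised when index, columns, dtypes and values match.
        assert_frame_equal(df1, df2)

    def test_different_values_fail(self):
        df1 = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
        df2 = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 5.0]})
        # Same structure, different values: the comparison must fail.
        with self.assertRaises(AssertionError):
            assert_frame_equal(df1, df2)


# Run the tests programmatically so the snippet also works outside a test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FrameEqualityTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```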
Answered by Rustam Aliyev
While assert_frame_equal is useful in unit tests, I found the following more convenient during analysis, where one might want to go on and check which values are not equal:
df1.equals(df2)
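One way to follow up on a `False` result and see which values differ is an element-wise comparison. This is only a sketch under assumptions: the example frames and the names `diff_mask` and `diff_cells` are made up for illustration, and both frames are assumed to share the same index and columns. Note that `equals` treats NaNs in the same position as equal, whereas a plain `!=` does not, so the mask below excludes cells where both frames hold NaN.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, 2.0, np.nan], "B": [4.0, 5.0, 6.0]})
df2 = pd.DataFrame({"A": [1.0, 2.5, np.nan], "B": [4.0, 5.0, 7.0]})

same = df1.equals(df2)  # overall verdict as a single boolean

# Element-wise mask of differences; cells where both frames are NaN
# are not counted as differences, matching the behaviour of equals().
diff_mask = (df1 != df2) & ~(df1.isna() & df2.isna())

# Translate the positions of differing cells into (row label, column label) pairs.
rows, cols = np.nonzero(diff_mask.to_numpy())
diff_cells = [(df1.index[r], df1.columns[c]) for r, c in zip(rows, cols)]
print(same, diff_cells)
```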
Answered by ankostis
Also numpy's utilities work:
import numpy.testing as npt
npt.assert_array_equal(df1, df2)
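A caveat worth keeping in mind, sketched here with made-up frames: the numpy helpers compare the frames as plain arrays, so index and column labels are not checked. For floating-point data, `numpy.testing.assert_allclose` accepts a tolerance; the tolerance values below are arbitrary choices for the example.

```python
import numpy as np
import numpy.testing as npt
import pandas as pd

values = np.arange(6, dtype=float).reshape(3, 2)
df1 = pd.DataFrame(values, columns=["A", "B"])
df2 = pd.DataFrame(values, columns=["X", "Y"])  # same values, different labels

# Passes: only the underlying values are compared, not the column labels.
npt.assert_array_equal(df1, df2)

# Values that differ only by a tiny amount pass a tolerance-based check...
df3 = df1 + 1e-12
npt.assert_allclose(df1, df3, rtol=0, atol=1e-9)

# ...but fail the exact comparison.
try:
    npt.assert_array_equal(df1, df3)
    raised = False
except AssertionError:
    raised = True
print(raised)
```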
Answered by Surya
In [62]: import numpy as np
In [63]: import pandas as pd
In [64]: np.random.seed(30)
In [65]: df_old = pd.DataFrame(np.random.randn(4,5))
In [66]: df_old
Out[66]:
          0         1         2         3         4
0 -1.264053  1.527905 -0.970711  0.470560 -0.100697
1  0.303793 -1.725962  1.585095  0.134297 -1.106855
2  1.578226  0.107498 -0.764048 -0.775189  1.383847
3  0.760385 -0.285646  0.538367 -2.083897  0.937782
In [67]: np.random.seed(30)
In [68]: df_new = pd.DataFrame(np.random.randn(4,5))
In [69]: df_new
Out[69]:
          0         1         2         3         4
0 -1.264053  1.527905 -0.970711  0.470560 -0.100697
1  0.303793 -1.725962  1.585095  0.134297 -1.106855
2  1.578226  0.107498 -0.764048 -0.775189  1.383847
3  0.760385 -0.285646  0.538367 -2.083897  0.937782
In [70]: df_old.equals(df_new)  # equality check; returns a boolean (True or False)
Out[70]: True

