Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/19119320/

Time: 2020-09-08 09:05:47  Source: igfitidea

How to check if two data frames are equal

database, r, dataset, compare, dataframe

Question by Waldir Leoncio

Say I have large datasets in R and I just want to know whether two of them are the same. I use this often when I'm experimenting with different algorithms to achieve the same result. For example, say we have the following datasets:

df1 <- data.frame(num = 1:5, let = letters[1:5])
df2 <- df1
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

So this is what I do to compare them:

table(x == y, useNA = 'ifany')

Which works great when the datasets have no NAs:

> table(df1 == df2, useNA = 'ifany')
TRUE 
  10 

But not so much when they have NAs:

> table(df3 == df4, useNA = 'ifany')
TRUE <NA> 
  11    1 

In the example, it's easy to dismiss the NA as not a problem, since we know that both data frames are equal. The problem is that NA == <anything> yields NA (even NA == NA), so whenever one of the datasets has an NA, it doesn't matter what the other one has in that same position: the result is always going to be NA.
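A minimal sketch of an NA-aware element-wise comparison, assuming both data frames have the same dimensions and column order (the `==` method for data frames refuses unequally sized inputs). The helper name `na_aware_equal` is hypothetical, not part of base R: it counts a position as equal when the values match or both are NA, and as unequal when exactly one side is NA.

```r
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# Any comparison involving NA is NA, including NA == NA.
print(NA == NA)  # [1] NA

# Hypothetical helper: element-wise equality where "both NA" counts as a match.
# Assumes x and y have the same dimensions (== on data frames requires it).
na_aware_equal <- function(x, y) {
  same <- (x == y) | (is.na(x) & is.na(y))  # NA | TRUE evaluates to TRUE
  same[is.na(same)] <- FALSE                # exactly one side NA: a mismatch
  same
}

eq <- na_aware_equal(df3, df4)
print(table(eq, useNA = "ifany"))  # no NA entries any more
print(all(eq))                     # TRUE

# Replacing the NA with a real value makes the comparison fail, as it should.
df5 <- df3
df5$num[6] <- 6
print(all(na_aware_equal(df3, df5)))  # FALSE
```

Unlike `table(x == y, useNA = 'ifany')`, the result contains no NA entries, so `all()` gives a single TRUE or FALSE.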

So using table() to compare datasets doesn't seem ideal to me. How can I better check whether two data frames are identical?

P.S.: Note that this is not a duplicate of "R - comparing several datasets", "Comparing 2 datasets in R", or "Compare datasets in R".

Answer by TheComeOnMan

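Look up all.equal(). It has some caveats (numeric values are compared with a small tolerance rather than exactly, and when the objects differ it returns a character vector describing the differences rather than FALSE), but it might work for you.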

all.equal(df3,df4)
# [1] TRUE
all.equal(df2,df1)
# [1] TRUE
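Two of those caveats can be illustrated with a short sketch (the data frames `a` and `b` are made up for the example): all.equal() tolerates tiny floating-point differences that identical() does not, and because it returns text instead of FALSE when objects differ, wrapping it in isTRUE() is the usual way to get a single logical for use in if().

```r
# 0.1 + 0.2 is not exactly 0.3 in floating point.
a <- data.frame(x = 0.3)
b <- data.frame(x = 0.1 + 0.2)

print(identical(a, b))   # FALSE: exact comparison
print(all.equal(a, b))   # TRUE: difference is within the default tolerance

df1 <- data.frame(num = 1:5, let = letters[1:5])
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])

res <- all.equal(df1, df3)
print(is.character(res))  # TRUE: differences are described as text, not FALSE
print(isTRUE(res))        # FALSE: a single logical, safe inside if()

if (!isTRUE(all.equal(df1, df3))) message("df1 and df3 differ")
```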

Answer by Waldir Leoncio

As Metrics pointed out, one could also use identical() to compare the datasets. The difference between this approach and that of Codoremifa is that identical() will just yield TRUE or FALSE, depending on whether the objects being compared are identical, whereas all.equal() will return either TRUE or a description of the differences between the objects. For instance, consider the following:

> identical(df1, df3)
[1] FALSE

> all.equal(df1, df3)
[1] "Attributes: < Component 2: Numeric: lengths (5, 6) differ >"                                
[2] "Component 1: Numeric: lengths (5, 6) differ"                                                
[3] "Component 2: Lengths: 5, 6"                                                                 
[4] "Component 2: Attributes: < Component 2: Lengths (5, 6) differ (string compare on first 5) >"
[5] "Component 2: Lengths (5, 6) differ (string compare on first 5)"   
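A short sketch of two properties of identical() that matter here (the data frames are made up for the example): it treats NAs in matching positions as equal, so it avoids the problem with ==, but it is strict about storage type, so an integer column and a double column holding the same values are not identical even though all.equal() accepts them.

```r
# Two independently built data frames with an NA in the same position.
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- data.frame(num = c(1:5, NA), let = letters[1:6])
print(identical(df3, df4))                 # TRUE: matching NAs compare as equal

int_df <- data.frame(x = 1:3)              # integer column
dbl_df <- data.frame(x = c(1, 2, 3))       # double column with the same values
print(identical(int_df, dbl_df))           # FALSE: storage types differ
print(isTRUE(all.equal(int_df, dbl_df)))   # TRUE: values are numerically equal
```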

Moreover, from what I've tested, identical() seems to run much faster than all.equal().
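One rough way to check the speed claim on your own data is to time both calls with system.time(). The sketch below uses a made-up data frame of one million rows (the size and seed are arbitrary assumptions), and the timings it prints will vary by machine and R version, so none are shown here. The second data frame is built separately rather than with `big2 <- big1`, so that the two arguments are not simply the same object in memory.

```r
set.seed(42)                        # arbitrary seed for reproducibility
n <- 1e6                            # arbitrary size
num <- runif(n)
let <- sample(letters, n, replace = TRUE)

big1 <- data.frame(num = num, let = let)
big2 <- data.frame(num = num + 0, let = let)  # num + 0 allocates a new numeric vector

t_identical <- system.time(res_identical <- identical(big1, big2))
t_all_equal <- system.time(res_all_equal <- isTRUE(all.equal(big1, big2)))

# Both calls should report equality; compare the elapsed times (in seconds).
print(c(identical = res_identical, all.equal = res_all_equal))
print(c(identical = t_identical[["elapsed"]], all.equal = t_all_equal[["elapsed"]]))
```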
