How to check whether two data frames are equal

Question

Asked by Waldir Leoncio

Say I have large datasets in R and I just want to know whether two of them are the same. I often need this when I'm experimenting with different algorithms that should produce the same result. For example, say we have the following datasets:

df1 <- data.frame(num = 1:5, let = letters[1:5])
df2 <- df1
df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

This is what I do to compare them:

table(x == y, useNA = 'ifany')

This works great when the datasets have no NAs:

> table(df1 == df2, useNA = 'ifany')
TRUE 
  10

But not so well when they have NAs:

> table(df3 == df4, useNA = 'ifany')
TRUE <NA> 
  11    1

In this example, it's easy to dismiss the NA as not a problem, since we know that both data frames are equal. The problem is that NA == <anything> yields NA, so whenever one of the datasets has an NA, it doesn't matter what the other one has in that same position: the result is always going to be NA.
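The NA behaviour described above can be reproduced directly. The sketch below also shows one possible way to keep the element-wise approach while counting two NAs in the same position as a match. Note that `na_safe_equal` is a hypothetical helper name made up for this illustration, not a base R function, and it assumes both data frames have the same dimensions.

```r
# Comparing NA with anything, including another NA, yields NA
NA == 1
NA == NA

df3 <- data.frame(num = c(1:5, NA), let = letters[1:6])
df4 <- df3

# Hypothetical helper (not part of base R): element-wise comparison that
# treats two NAs in the same position as equal and a one-sided NA as unequal.
na_safe_equal <- function(x, y) {
  eq <- x == y                      # logical matrix, NA where either side is NA
  both_na <- is.na(x) & is.na(y)    # positions where both sides are NA
  eq[both_na] <- TRUE
  eq[is.na(eq)] <- FALSE            # remaining NAs: only one side was NA
  eq
}

# TRUE only if every cell matches
all(na_safe_equal(df3, df4))
```

This only compares cell values; it says nothing about column types or attributes, which is one reason the dedicated functions in the answers are usually a better fit.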

So using table() to compare datasets doesn't seem ideal to me. How can I better check whether two data frames are identical?

P.S.: Note that this is not a duplicate of "R - comparing several datasets", "Comparing 2 datasets in R", or "Compare datasets in R".

Answer 1

Answered by TheComeOnMan

Look up all.equal. It has some caveats, but it might work for you.

all.equal(df3,df4)
# [1] TRUE
all.equal(df2,df1)
# [1] TRUE
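Two of the caveats are worth illustrating. As far as I know, all.equal() compares numeric values only up to a small tolerance, and when the objects differ it returns a character vector describing the differences rather than FALSE. A minimal sketch, using made-up data frames `x`, `y` and `z`:

```r
x <- data.frame(v = c(1, 2, 3))
y <- data.frame(v = c(1, 2, 3 + 1e-10))   # differs by a tiny amount
z <- data.frame(v = c(1, 2, 4))           # clearly different

# The tiny difference is below the default tolerance
all.equal(x, y)
# identical() is exact, so it notices the difference
identical(x, y)

# For differing objects the result is a description, not FALSE,
# so wrap the call in isTRUE() before using it in a condition.
res <- all.equal(x, z)
is.character(res)
if (!isTRUE(all.equal(x, z))) message("x and z differ")
```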

Answer 2

Answered by Waldir Leoncio

As Metrics pointed out, one could also use identical() to compare the datasets. The difference between this approach and that of Codoremifa is that identical() will just yield TRUE or FALSE, depending on whether the objects being compared are identical or not, whereas all.equal() will return either TRUE or hints about the differences between the objects. For instance, consider the following:

> identical(df1, df3)
[1] FALSE

> all.equal(df1, df3)
[1] "Attributes: < Component 2: Numeric: lengths (5, 6) differ >"                                
[2] "Component 1: Numeric: lengths (5, 6) differ"                                                
[3] "Component 2: Lengths: 5, 6"                                                                 
[4] "Component 2: Attributes: < Component 2: Lengths (5, 6) differ (string compare on first 5) >"
[5] "Component 2: Lengths (5, 6) differ (string compare on first 5)"
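Another practical difference is strictness: identical() also compares storage types, while all.equal() treats integer and double columns with the same values as equal. A small sketch with two made-up data frames, `df_int` and `df_dbl`:

```r
df_int <- data.frame(num = 1:5)               # integer column
df_dbl <- data.frame(num = c(1, 2, 3, 4, 5))  # double column, same values

# Storage types differ, so the objects are not identical
identical(df_int, df_dbl)

# The values are numerically equal, so all.equal() is satisfied
isTRUE(all.equal(df_int, df_dbl))
```

Which behaviour is preferable depends on whether a type change between two algorithms should count as a different result.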

Moreover, from what I've tested, identical() seems to run much faster than all.equal().
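A rough way to check the speed claim on your own machine is sketched below. The data, the size `n`, the repetition count and the helper `make_df` are arbitrary choices made for this illustration, and timings will vary between systems and R versions.

```r
# Hypothetical helper: builds the same data frame on every call, so the two
# objects are equal but do not share memory (a plain copy such as
# big2 <- big1 could let identical() take a shortcut).
make_df <- function(n = 1e5) {
  set.seed(1)
  data.frame(a = rnorm(n), b = sample(letters, n, replace = TRUE))
}

big1 <- make_df()
big2 <- make_df()

# Repeat each comparison a few times to get measurable timings
t_identical <- system.time(for (i in 1:20) identical(big1, big2))
t_all_equal <- system.time(for (i in 1:20) all.equal(big1, big2))

t_identical["elapsed"]
t_all_equal["elapsed"]
```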
