MySQL 如何验证两个表是否具有完全相同的数据?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2129717/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to verify if two tables have exactly the same data?
提问by SiLent SoNG
Basically, we have one table (original table
) and it is backed up into another table (backup table
); thus the two tables have exactly the same schema.
基本上,我们有一个表 ( original table
) 并且它被备份到另一个表 ( backup table
) 中;因此这两个表具有完全相同的模式。
In the beginning, both tables (original table
and backup table
) contains exactly the same set of data. After some time for some reason, I need to verify whether dataset in the original table
has changed or not.
一开始,两个表(original table
和backup table
)都包含完全相同的数据集。由于某种原因一段时间后,我需要验证数据集是否original table
已更改。
In order to do this, I have to compare the dataset in the original table
against the backup table
.
为此,我必须original table
将backup table
.
Let's say the original table
has the following schema:
假设original table
具有以下架构:
create table LemmasMapping (
lemma1 int,
lemma2 int,
index ix_lemma1 using btree (lemma1),
index ix_lemma2 using btree (lemma2)
)
How could I achieve the dataset comparison?
我怎样才能实现数据集比较?
Update: the table does not have a primary key. It simply stores mappings between two ids.
更新:该表没有主键。它只是存储两个 id 之间的映射。
采纳答案by Mike Thompson
I would write three queries.
我会写三个查询。
An inner join to pick up the rows where the primary key exists in both tables, but there is a difference in the value of one or more of the other columns. This would pick up changed rows in original.
A left outer join to pick up the rows that are in the original tables, but not in the backup table (i.e. a row in original has a primary key that does not exist in backup). This would return rows inserted into the original.
A right outer join to pick up the rows in backup which no longer exist in the original. This would return rows that have been deleted from the original.
一个内部联接,用于选取两个表中存在主键的行,但一个或多个其他列的值存在差异。这将在原始中获取更改的行。
左外连接选取原始表中但不在备份表中的行(即原始表中的行具有备份中不存在的主键)。这将返回插入到原始行中的行。
一个右外连接,用于在备份中提取原始数据中不再存在的行。这将返回已从原始行中删除的行。
You could union the three queries together to return a single result set. If you did this you would need to add a column to indicate what type of row it is (updated, inserted or deleted).
您可以将三个查询结合在一起以返回单个结果集。如果你这样做了,你需要添加一列来指示它是什么类型的行(更新、插入或删除)。
With a bit of effort, you might be able to do this in one query using a full outer join. Be careful with outer joins, as they behave differently in different SQL engines. Predicates put in the where clause, instead of the join clause can sometimes turn your outer join into an inner join.
稍加努力,您就可以使用完全外连接在一个查询中完成此操作。小心外连接,因为它们在不同的 SQL 引擎中表现不同。放在 where 子句而不是 join 子句中的谓词有时可以将您的外连接变成内连接。
回答by Josh Davis
You can just use CHECKSUM TABLEand compare the results. You can even alter the tableto enable live checksums so that they are continuously available.
您可以只使用CHECKSUM TABLE并比较结果。您甚至可以更改表以启用实时校验和,以便它们持续可用。
CHECKSUM TABLE original_table, backup_table;
It doesn't require the tables to have a primary key.
它不需要表具有主键。
回答by Gopal
SELECT * FROM Table1
UNION
SELECT * FROM Table2
If you get records greater than any of two tables, they don't have same data.
如果您获得的记录大于两个表中的任何一个,则它们没有相同的数据。
回答by Ulghar
Try the following to compare two tables:
尝试以下操作来比较两个表:
SELECT 'different' FROM DUAL WHERE EXISTS(
SELECT * FROM (
SELECT /*DISTINCT*/ +1 AS chk,a.c1,a.c2,a.c3 FROM a
UNION ALL
SELECT /*DISTINCT*/ +1 AS chk,b.c1,b.c2,b.c3 FROM b
) c
GROUP BY c1,c2,c3
HAVING SUM(chk)<>2
)
UNION SELECT 'equal' FROM DUAL
LIMIT 1;
回答by Kyle Butt
select count(*)
from lemmas as original_table
full join backup_table using (lemma_id)
where backup_table.lemma_id is null
or original_table.lemma_id is null
or original_table.lemma != backup_table.lemma
The full join / check for null should cover additions or deletions as well as changes.
完全连接/检查空值应涵盖添加或删除以及更改。
- backup.id is null = addition
- original.id is null = deletion
- neither null = change
- backup.id 为空 = 添加
- original.id 为空 = 删除
- 既不为空 = 改变
回答by indra
For the lazier or more SQL-averse developer working with MS SQL Server, I would recommend SQL Delta (www.sqldelta.com) for this and any other database-diff type work. It has a great GUI, is quick and accurate and can diff all database objects, generate and run the necessary change scripts, synchronise entire databases. Its the next best thing to a DBA ;-)
对于使用 MS SQL Server 的懒惰或更不喜欢 SQL 的开发人员,我会推荐 SQL Delta (www.sqldelta.com) 来完成这项工作和任何其他数据库差异类型的工作。它有一个很棒的 GUI,快速准确,可以区分所有数据库对象,生成和运行必要的更改脚本,同步整个数据库。它是 DBA 的次佳选择 ;-)
I think there is a similar tool available from RedGate called SQL Compare. I believe someeditions of the latest version of Visual Studio (2010) also include a very similar tool.
我认为 RedGate 提供了一个类似的工具,称为 SQL 比较。我相信最新版本的 Visual Studio (2010) 的某些版本也包含一个非常相似的工具。
回答by Hank
Please try the following method for determining if two tables are exactly the same, when there is no primary key of any kind and there are no duplicate rows within a table, using the below logic:
请尝试以下方法来确定两个表是否完全相同,当没有任何类型的主键并且表中没有重复的行时,使用以下逻辑:
Step 1 - Test for Duplicate Rows on TABLEA
步骤 1 - 测试 TABLEA 上的重复行
If SELECT DISTINCT * FROM TABLEA
如果 SELECT DISTINCT * FROM TABLEA
has the same row count as
具有相同的行数
SELECT * FROM TABLEA
从 TABLEA 中选择 *
then go to the next step, otherwise you can't use this method...
然后进行下一步,否则无法使用此方法...
Step 2 - Test for Duplicate Rows on TABLEB
步骤 2 - 测试 TABLEB 上的重复行
If SELECT DISTINCT * FROM TABLEB
如果 SELECT DISTINCT * FROM TABLEB
has the same row count as
具有相同的行数
SELECT * FROM TABLEB
从 TABLEB 中选择 *
then go to the next step, else you can't use this method...
然后进入下一步,否则你不能使用这个方法......
Step 3 - INNER JOIN TABLEA to TABLEB on every column
第 3 步 - 在每一列上将 TABLEA 内部连接到 TABLEB
If the row count of the below query has the same row count as the row counts from Steps 1 and 2, then the tables are the same:
如果以下查询的行数与步骤 1 和 2 中的行数相同,则表相同:
SELECT
*
FROM
TABLEA
INNER JOIN TABLEA ON
TABLEA.column1 = TABLEB.column1
AND TABLEA.column2 = TABLEB.column2
AND TABLEA.column3 = TABLEB.column3
--etc...for every column
Note that this method doesn't necessarily test for different data types, and probably won't work on non-joinable data types (like VARBINARY)
请注意,此方法不一定测试不同的数据类型,并且可能不适用于不可连接的数据类型(如 VARBINARY)
Feedback welcome!
欢迎反馈!
回答by pkhabya
1: First get count for both the tables C1 and C2. C1 and C2 should be equal. C1 and C2 can be obtained from the following query
1:首先计算表 C1 和 C2 的计数。C1 和 C2 应该相等。C1 和 C2 可以从以下查询中获得
select count(*) from table1
if C1 and C2 are not equal, then the tables are not identical.
如果 C1 和 C2 不相等,则表不相同。
2: Find distinct count for both the tables DC1 and DC2. DC1 and DC2 should be equal. Number of distinct records can be found using the following query:
2:为表 DC1 和 DC2 找到不同的计数。DC1 和 DC2 应该相等。可以使用以下查询找到不同记录的数量:
select count(*) from (select distinct * from table1)
if DC1 and DC2 are not equal, the tables are not identical.
如果 DC1 和 DC2 不相等,则表不相同。
3: Now get the number of records obtained by performing a union on the 2 tables. Let it be U. Use the following query to get the number of records in a union of 2 tables:
3:现在获取通过对 2 个表执行联合获得的记录数。设为 U。 使用以下查询获取 2 个表的联合中的记录数:
SELECT count (*)
FROM
(SELECT *
FROM table1
UNION
SELECT *
FROM table2)
You can say that the data in the 2 tables is identical if distinct count for the 2 tables is equal to the number of records obtained by performing union of the 2 tables. ie DC1 = U and DC2 = U
如果2个表的distinct count等于2个表合并得到的记录数,可以说2个表中的数据是相同的。即 DC1 = U 和 DC2 = U