MySQL 使用唯一索引删除重复项

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/36647058/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 21:43:42  来源:igfitidea点击:

Removing duplicates with unique index

mysqlduplicatesunique-index

提问by user3649739

I inserted between two tables fields A,B,C,D, believing I had created a Unique Index on A,B,C,D to prevent duplicates. However I somehow simply made a normal index on those. So duplicates got inserted. It is 20 million record table.

我在两个表字段 A、B、C、D 之间插入,相信我在 A、B、C、D 上创建了一个唯一索引以防止重复。然而,我以某种方式简单地对这些做了一个正常的索引。因此插入了重复项。它是2000万条记录表。

If I change my existing index from normal to unique or simply a add a new unique index for A,B,C,D will the duplicates be removed or will adding fail since unique records exist? I'd test it yet it is 30 mil records and I neither wish to mess the table up or duplicate it.

如果我将现有索引从正常更改为唯一索引,或者只是为 A、B、C、D 添加一个新的唯一索引,重复项会被删除还是因为存在唯一记录而添加失败?我会测试它,但它有 3000 万条记录,我不想弄乱表格或复制它。

回答by Paul Spiegel

If you have duplicates in your table and you use

如果您的表中有重复项并且您使用

ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);

the query will fail with Error 1062 (duplicate key).

查询将因错误 1062(重复键)而失败。

But if you use IGNORE

但是如果你使用 IGNORE

-- (only works before MySQL 5.7.4)
ALTER IGNORE TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);

the duplicates will be removed. But the documentation doesn't specify which row will be kept:

重复项将被删除。但文档没有指定将保留哪一行:

  • IGNOREis a MySQL extension to standard SQL. It controls how ALTER TABLEworks if there are duplicates on unique keys in the new table or if warnings occur when strict mode is enabled. If IGNOREis not specified, the copy is aborted and rolled back if duplicate-key errors occur. If IGNOREis specified, only one row is used of rows with duplicates on a unique key. The other conflicting rows are deleted. Incorrect values are truncated to the closest matching acceptable value.

    As of MySQL 5.7.4, the IGNORE clause for ALTER TABLE is removed and its use produces an error.
  • IGNORE是标准 SQL 的 MySQL 扩展。ALTER TABLE如果新表中的唯一键存在重复项,或者在启用严格模式时出现警告,它会控制如何工作。如果IGNORE未指定,则在发生重复键错误时中止并回滚副本。如果IGNORE指定,则只有一行用于在唯一键上具有重复项的行。其他冲突行被删除。不正确的值被截断为最接近匹配的可接受值。

    从 MySQL 5.7.4 开始,ALTER TABLE 的 IGNORE 子句被删除,它的使用会产生错误。

(ALTER TABLE Syntax)

( ALTER TABLE 语法)

If your version is 5.7.4 or greater - you can:

如果您的版本是 5.7.4 或更高版本 - 您可以:

  • Copy the data into a temporary table (it doesn't technically need to be temporary).
  • Truncate the original table.
  • Create the UNIQUE INDEX.
  • And copy the data back with INSERT IGNORE(which is still available).
  • 将数据复制到临时表中(技术上不需要是临时表)。
  • 截断原始表。
  • 创建唯一索引。
  • 并将数据复制回INSERT IGNORE(仍然可用)。
CREATE TABLE tmp_data SELECT * FROM mytable;
TRUNCATE TABLE mytable;
ALTER TABLE mytable ADD UNIQUE INDEX myindex (A, B, C, D);
INSERT IGNORE INTO mytable SELECT * from tmp_data;
DROP TABLE tmp_data;

If you use the IGNOREmodifier, errors that occur while executing the INSERTstatement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUEindex or PRIMARY KEYvalue in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors generate warnings instead.

如果使用IGNORE修饰符,INSERT则会忽略执行语句时发生的错误 。例如,如果没有IGNORE,复制表中现有UNIQUE索引或PRIMARY KEY值的行会导致重复键错误并且语句被中止。使用 IGNORE,该行将被丢弃并且不会发生错误。忽略的错误会生成警告。

(INSERT Syntax)

(插入语法)

Also see: INSERT ... SELECT Syntaxand Comparison of the IGNORE Keyword and Strict SQL Mode

另请参阅:INSERT ... SELECT 语法IGNORE 关键字和 Strict SQL 模式的比较

回答by verhie

if you think there will be duplicates, adding the unique index will fail. first check what duplicates there are:

如果您认为会有重复,则添加唯一索引将失败。首先检查有哪些重复项:

select * from
(select a,b,c,d,count(*) as n from table_name group by a,b,c,d) x
where x.n > 1

This may be a expensive query on 20M rows, but will get you all duplicate keys that will prevent you from adding the primary index. You could split this up into smaller chunks if you do a where in the subquery: where a='some_value'

这可能是对 20M 行的昂贵查询,但会为您提供所有重复的键,这将阻止您添加主索引。如果在子查询中执行 where ,则可以将其拆分为更小的块:where a='some_value'

For the records retrieved, you will have to change something to make the rows unique. If that is done (query returns 0 rows) you should be safe to add the primary index.

对于检索到的记录,您必须进行一些更改以使行唯一。如果这样做(查询返回 0 行),您应该可以安全地添加主索引。

回答by Oriol Vilaseca

Instead of IGNORE you can use ON DUPLICATE KEY UPDATE, which will give you control over which values should prevail.

您可以使用 ON DUPLICATE KEY UPDATE 代替 IGNORE,这将使您能够控制哪些值应该占上风。

回答by Sarath Chandra

To answer your question- adding a UNIQUEconstraint on a column that has duplicate values will throw an error.

要回答您的问题 -UNIQUE在具有重复值的列上添加约束将引发错误。

For example, you can try the following script:

例如,您可以尝试以下脚本:

CREATE TABLE `USER` (
  `USER_ID` INT NOT NULL,
  `USERNAME` VARCHAR(45) NOT NULL,
  `NAME` VARCHAR(45) NULL,
  PRIMARY KEY (`USER_ID`));

INSERT INTO USER VALUES(1,'apple', 'woz'),(2,'apple', 'jobs'),
(3,'google', 'sergey'),(4,'google', 'larry');

ALTER TABLE `USER` 
ADD UNIQUE INDEX `USERNAME_UNIQUE` (`USERNAME` ASC);
/*
Operation failed: There was an error while applying the SQL script to the database.
ERROR 1062: Duplicate entry 'apple' for key 'USERNAME_UNIQUE'
*/