Remove duplicates using only a MySQL query?
Original question: http://stackoverflow.com/questions/3383898/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Jim
I have a table with the following columns:
URL_ID
URL_ADDR
URL_Time
I want to remove duplicates on the URL_ADDR column using a MySQL query.
Is it possible to do such a thing without using any programming?
Answered by Daniel Vassallo
Consider the following test case:
CREATE TABLE mytb (url_id int, url_addr varchar(100));
INSERT INTO mytb VALUES (1, 'www.google.com');
INSERT INTO mytb VALUES (2, 'www.microsoft.com');
INSERT INTO mytb VALUES (3, 'www.apple.com');
INSERT INTO mytb VALUES (4, 'www.google.com');
INSERT INTO mytb VALUES (5, 'www.cnn.com');
INSERT INTO mytb VALUES (6, 'www.apple.com');
Our test table now contains:
SELECT * FROM mytb;
+--------+-------------------+
| url_id | url_addr          |
+--------+-------------------+
|      1 | www.google.com    |
|      2 | www.microsoft.com |
|      3 | www.apple.com     |
|      4 | www.google.com    |
|      5 | www.cnn.com       |
|      6 | www.apple.com     |
+--------+-------------------+
6 rows in set (0.00 sec)
Then we can use the multiple-table DELETE syntax as follows:
DELETE t2
FROM mytb t1
JOIN mytb t2 ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id);
... which will delete duplicate entries, leaving only the first URL for each address, based on url_id:
SELECT * FROM mytb;
+--------+-------------------+
| url_id | url_addr          |
+--------+-------------------+
|      1 | www.google.com    |
|      2 | www.microsoft.com |
|      3 | www.apple.com     |
|      5 | www.cnn.com       |
+--------+-------------------+
4 rows in set (0.00 sec)
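To experiment with the keep-the-lowest-ID rule without a MySQL server, here is a minimal sketch that assumes Python's built-in sqlite3 module as a stand-in. SQLite does not support MySQL's multiple-table DELETE syntax, so the sketch swaps in a correlated EXISTS subquery that expresses the same condition (a row is deleted when another row with the same url_addr has a smaller url_id). It is an illustration of the logic, not the MySQL statement itself.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytb (url_id INTEGER, url_addr TEXT)")
conn.executemany(
    "INSERT INTO mytb VALUES (?, ?)",
    [
        (1, "www.google.com"),
        (2, "www.microsoft.com"),
        (3, "www.apple.com"),
        (4, "www.google.com"),
        (5, "www.cnn.com"),
        (6, "www.apple.com"),
    ],
)

# Delete every row for which an earlier row (smaller url_id) has the same
# url_addr. This mirrors the join condition t2.url_id > t1.url_id.
conn.execute(
    """
    DELETE FROM mytb
    WHERE EXISTS (
        SELECT 1
        FROM mytb AS t1
        WHERE t1.url_addr = mytb.url_addr
          AND t1.url_id < mytb.url_id
    )
    """
)

remaining = conn.execute(
    "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
).fetchall()
for row in remaining:
    print(row)
```

The rows that survive are the ones with url_id 1, 2, 3 and 5, matching the result table above.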
UPDATE - Further to new comments above:
If the duplicate URLs do not all share the same format, you may want to apply the REPLACE() function to remove the www. or http:// parts. For example:
DELETE t2
FROM mytb t1
JOIN mytb t2 ON (REPLACE(t2.url_addr, 'www.', '') =
REPLACE(t1.url_addr, 'www.', '') AND
t2.url_id > t1.url_id);
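The statement above strips only www.; a nested REPLACE() can strip http:// as well. A minimal sketch of that idea, again assuming Python's sqlite3 module as a stand-in for MySQL and a correlated EXISTS in place of the multiple-table DELETE. The sample rows and the helper name `normalized` are hypothetical. Note that REPLACE() removes the substring wherever it occurs, not only at the start of the address.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytb (url_id INTEGER, url_addr TEXT)")
conn.executemany(
    "INSERT INTO mytb VALUES (?, ?)",
    [
        (1, "www.google.com"),
        (2, "http://www.google.com"),
        (3, "google.com"),
        (4, "www.apple.com"),
    ],
)


def normalized(column: str) -> str:
    """Hypothetical helper: SQL expression stripping 'http://' then 'www.'."""
    return f"REPLACE(REPLACE({column}, 'http://', ''), 'www.', '')"


earlier_norm = normalized("t1.url_addr")
current_norm = normalized("mytb.url_addr")

# Delete a row when an earlier row normalizes to the same address.
conn.execute(
    f"""
    DELETE FROM mytb
    WHERE EXISTS (
        SELECT 1
        FROM mytb AS t1
        WHERE {earlier_norm} = {current_norm}
          AND t1.url_id < mytb.url_id
    )
    """
)

remaining = conn.execute(
    "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
).fetchall()
for row in remaining:
    print(row)
```

Only the first spelling of each address is kept, here rows 1 and 4.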
Answered by Box
You may want to try the method mentioned at http://labs.creativecommons.org/2010/01/12/removing-duplicate-rows-in-mysql/.
ALTER IGNORE TABLE your_table ADD UNIQUE INDEX `tmp_index` (URL_ADDR);
Answered by Martin Smith
This will leave the rows with the highest URL_ID for a particular URL_ADDR:
/* It sounds like you would need to GROUP BY a calculated form,
   e.g. using REPLACE to strip out "www."; see Daniel's answer. */
DELETE FROM your_table
WHERE URL_ID NOT IN
    (SELECT ID FROM
        (SELECT MAX(URL_ID) AS ID
         FROM your_table
         WHERE URL_ID IS NOT NULL
         GROUP BY URL_ADDR) X);
(The derived table 'X' is there to avoid the error "You can't specify target table 'tablename' for update in FROM clause".)
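The keep-the-highest-ID variant can be tried on the test data from Daniel Vassallo's answer. This is a minimal sketch that assumes Python's sqlite3 module as a stand-in for MySQL. SQLite would accept the statement without the derived table X; it is kept here only so the statement has the same shape as the MySQL form.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytb (url_id INTEGER, url_addr TEXT)")
conn.executemany(
    "INSERT INTO mytb VALUES (?, ?)",
    [
        (1, "www.google.com"),
        (2, "www.microsoft.com"),
        (3, "www.apple.com"),
        (4, "www.google.com"),
        (5, "www.cnn.com"),
        (6, "www.apple.com"),
    ],
)

# Keep only the row with the highest url_id in each url_addr group.
conn.execute(
    """
    DELETE FROM mytb
    WHERE url_id NOT IN (
        SELECT id FROM (
            SELECT MAX(url_id) AS id
            FROM mytb
            WHERE url_id IS NOT NULL
            GROUP BY url_addr
        ) AS x
    )
    """
)

remaining = conn.execute(
    "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
).fetchall()
for row in remaining:
    print(row)
```

Compared with the lowest-ID approach, the surviving rows here are 2, 4, 5 and 6: the later copy of each duplicated address is the one that stays.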
Answered by Vilx-
Well, you could always:
- create a temporary table;
- INSERT INTO ... SELECT DISTINCT into the temp table from the original table;
- clear the original table;
- INSERT INTO ... SELECT into the original table from the temp table;
- drop the temp table.
It's clumsy and awkward, and requires several queries (not to mention privileges), but it will do the trick if you don't find another solution.
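The temporary-table steps above could look roughly like the following sketch, which assumes Python's sqlite3 module as a stand-in for MySQL and a hypothetical temp table name, mytb_dedup. One deliberate substitution: SELECT DISTINCT over whole rows would not collapse rows that differ only in url_id, so the sketch uses GROUP BY url_addr with MIN(url_id) to pick one row per address.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytb (url_id INTEGER, url_addr TEXT)")
conn.executemany(
    "INSERT INTO mytb VALUES (?, ?)",
    [
        (1, "www.google.com"),
        (2, "www.microsoft.com"),
        (3, "www.apple.com"),
        (4, "www.google.com"),
        (5, "www.cnn.com"),
        (6, "www.apple.com"),
    ],
)

conn.executescript(
    """
    -- 1. Create a temporary table holding one row per url_addr.
    CREATE TEMPORARY TABLE mytb_dedup AS
        SELECT MIN(url_id) AS url_id, url_addr
        FROM mytb
        GROUP BY url_addr;

    -- 2. Clear the original table.
    DELETE FROM mytb;

    -- 3. Copy the de-duplicated rows back.
    INSERT INTO mytb (url_id, url_addr)
        SELECT url_id, url_addr FROM mytb_dedup;

    -- 4. Drop the temporary table.
    DROP TABLE mytb_dedup;
    """
)

remaining = conn.execute(
    "SELECT url_id, url_addr FROM mytb ORDER BY url_id"
).fetchall()
for row in remaining:
    print(row)
```

On a live table it would be worth running these steps inside a transaction or on a copy first, since the original table is empty between steps 2 and 3.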
Answered by Doug
You can GROUP BY URL_ADDR, which will effectively give you only distinct values in the URL_ADDR field.
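-- With ONLY_FULL_GROUP_BY enabled, URL_ID and URL_Time must be
-- wrapped in an aggregate such as MIN().
select
    URL_ID,
    URL_ADDR,
    URL_Time
from
    some_table
group by
    URL_ADDR;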
Enjoy!
Answered by Olariu Romeo Vicentiu
Daniel Vassallo, how would this work for multiple columns?
DELETE t2
FROM directory1 t1
JOIN directory1 t2 ON
    (t2.page = t1.page AND
     t2.parentTopic = t1.parentTopic AND
     t2.title = t1.title AND
     t2.description = t1.description AND
     t2.linktype = t1.linktype AND
     t2.priority = t1.priority AND
     t2.linkID > t1.linkID);
Maybe like this?
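A multi-column duplicate check of this kind can be tried out as follows. This is a minimal sketch assuming Python's sqlite3 module as a stand-in for MySQL, a correlated EXISTS in place of the multiple-table DELETE, and a hypothetical cut-down directory1 table with only the linkID, page and title columns. Each column that defines a duplicate is compared with its own equality test, and the tests are combined with AND.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL, and the
# table has only three of the columns, to keep the example short.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE directory1 (linkID INTEGER, page TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO directory1 VALUES (?, ?, ?)",
    [
        (1, "home", "Intro"),
        (2, "home", "Intro"),   # duplicate of linkID 1
        (3, "home", "Other"),   # same page, different title
        (4, "about", "Intro"),  # same title, different page
        (5, "home", "Intro"),   # duplicate of linkID 1
    ],
)

# A row is a duplicate only if ALL compared columns match an earlier row.
conn.execute(
    """
    DELETE FROM directory1
    WHERE EXISTS (
        SELECT 1
        FROM directory1 AS t1
        WHERE t1.page = directory1.page
          AND t1.title = directory1.title
          AND t1.linkID < directory1.linkID
    )
    """
)

remaining = conn.execute(
    "SELECT linkID, page, title FROM directory1 ORDER BY linkID"
).fetchall()
for row in remaining:
    print(row)
```

Rows 2 and 5 are removed; rows 3 and 4 stay because they match row 1 on only one of the two columns. Plain equality never matches NULL values, so rows with NULL in a compared column would not be treated as duplicates by this sketch.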
Answered by Tahbaza
This will work provided that your URL_ID column is unique.
DELETE FROM url WHERE URL_ID IN (
SELECT URL_ID
FROM url a INNER JOIN (
SELECT URL_ADDR, MAX(URL_ID) MaxURLId
FROM url
GROUP BY URL_ADDR
HAVING COUNT(*) > 1) b ON a.URL_ID <> b.MaxURLId AND a.URL_ADDR = b.URL_ADDR
)