Remove duplicates using only a MySQL query?

Question

Asked by Jim

I have a table with the following columns:

URL_ID    
URL_ADDR    
URL_Time

I want to remove duplicates on the URL_ADDR column using a MySQL query.

Is it possible to do such a thing without using any programming?

Answer 1

Answered by Daniel Vassallo

Consider the following test case:

CREATE TABLE mytb (url_id int, url_addr varchar(100));

INSERT INTO mytb VALUES (1, 'www.google.com');
INSERT INTO mytb VALUES (2, 'www.microsoft.com');
INSERT INTO mytb VALUES (3, 'www.apple.com');
INSERT INTO mytb VALUES (4, 'www.google.com');
INSERT INTO mytb VALUES (5, 'www.cnn.com');
INSERT INTO mytb VALUES (6, 'www.apple.com');

Our test table now contains:

SELECT * FROM mytb;
+--------+-------------------+
| url_id | url_addr          |
+--------+-------------------+
|      1 | www.google.com    |
|      2 | www.microsoft.com |
|      3 | www.apple.com     |
|      4 | www.google.com    |
|      5 | www.cnn.com       |
|      6 | www.apple.com     |
+--------+-------------------+
6 rows in set (0.00 sec)

Then we can use the multiple-table DELETE syntax as follows:

DELETE t2
FROM   mytb t1
JOIN   mytb t2 ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id);

... which will delete the duplicate entries, keeping for each url_addr only the row with the lowest url_id:

SELECT * FROM mytb;
+--------+-------------------+
| url_id | url_addr          |
+--------+-------------------+
|      1 | www.google.com    |
|      2 | www.microsoft.com |
|      3 | www.apple.com     |
|      5 | www.cnn.com       |
+--------+-------------------+
4 rows in set (0.00 sec)
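
Before running a DELETE like the one above, it can help to preview which rows it would remove. The following is a minimal sketch, assuming the mytb test table defined earlier; it reuses the same join condition in a SELECT, with DISTINCT added because a row that has several lower-numbered duplicates would otherwise be listed more than once.

```sql
-- Preview only: lists the rows the multi-table DELETE would remove.
-- Assumes the mytb test table from this answer.
SELECT DISTINCT t2.url_id, t2.url_addr
FROM   mytb t1
JOIN   mytb t2 ON (t2.url_addr = t1.url_addr AND t2.url_id > t1.url_id);
```

Run against the original six test rows, this lists url_id 4 and 6, the two rows missing from the result above.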

UPDATE - Further to the new comments above:

If the duplicate URLs do not all have the same format, you may want to apply the REPLACE() function to remove the www. or http:// parts. For example:

DELETE t2
FROM   mytb t1
JOIN   mytb t2 ON (REPLACE(t2.url_addr, 'www.', '') = 
                   REPLACE(t1.url_addr, 'www.', '') AND 
                   t2.url_id > t1.url_id);
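
The example above only strips www. from the address. To also ignore an http:// prefix, the REPLACE() calls can be nested. This is a sketch under the same assumptions (the mytb test table); note that REPLACE() removes every occurrence of the substring, not just a leading one, so an address that contains www. or http:// elsewhere would be altered too.

```sql
-- Sketch: strip both "http://" and "www." before comparing addresses.
-- REPLACE() removes all occurrences, not only a leading prefix.
DELETE t2
FROM   mytb t1
JOIN   mytb t2 ON (REPLACE(REPLACE(t2.url_addr, 'http://', ''), 'www.', '') =
                   REPLACE(REPLACE(t1.url_addr, 'http://', ''), 'www.', '') AND
                   t2.url_id > t1.url_id);
```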

Answer 2

Answered by Box

You may want to try the method described at http://labs.creativecommons.org/2010/01/12/removing-duplicate-rows-in-mysql/:

ALTER IGNORE TABLE your_table ADD UNIQUE INDEX `tmp_index` (URL_ADDR);
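
Note that the IGNORE clause of ALTER TABLE was removed in MySQL 5.7.4, so the statement above only works on older servers. On newer versions a similar effect can probably be had by copying the rows into a table that already has the unique index. The sketch below assumes the mytb test table from Answer 1; mytb_dedup and mytb_old are hypothetical names chosen for illustration.

```sql
-- Sketch only: mytb_dedup and mytb_old are hypothetical table names.
CREATE TABLE mytb_dedup LIKE mytb;
ALTER TABLE mytb_dedup ADD UNIQUE INDEX tmp_index (url_addr);

-- INSERT IGNORE skips rows that would violate the unique index,
-- so only the first row inserted for each url_addr is kept.
INSERT IGNORE INTO mytb_dedup
    SELECT * FROM mytb ORDER BY url_id;

-- Swap the tables; the original data stays available as mytb_old.
RENAME TABLE mytb TO mytb_old, mytb_dedup TO mytb;
```

Keeping mytb_old around until the result has been checked makes the operation easy to undo.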

Answer 3

Answered by Martin Smith

This will keep the row with the highest URL_ID for each URL_ADDR:

DELETE FROM your_table   /* your_table is a placeholder; TABLE itself is a reserved word */
WHERE URL_ID NOT IN 
    (SELECT ID FROM 
       (SELECT MAX(URL_ID) AS ID 
        FROM your_table 
        WHERE URL_ID IS NOT NULL
        GROUP BY URL_ADDR ) X);  /* Sounds like you would need to GROUP BY a 
                                    calculated form, e.g. using REPLACE to 
                                    strip out www; see Daniel's answer */

(The derived table 'X' is there to avoid the error "You can't specify target table 'tablename' for update in FROM clause".)

Answer 4

Answered by Vilx-

Well, you could always:

create a temporary table;
INSERT INTO ... SELECT DISTINCT into the temp table from the original table;
clear the original table;
INSERT INTO ... SELECT into the original table from the temp table;
drop the temp table.

It's clumsy and awkward, and requires several queries (not to mention privileges), but it will do the trick if you don't find another solution.
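
The steps above can be sketched roughly as follows, assuming the mytb test table from Answer 1 (tmp_mytb is a hypothetical name). One deliberate departure: a plain SELECT DISTINCT over all columns would not collapse rows that share a url_addr but differ in url_id, so this sketch uses GROUP BY with MIN(url_id) instead.

```sql
-- Sketch only: tmp_mytb is a hypothetical temporary table name.
-- 1. Copy one row per url_addr (the lowest url_id) into a temp table.
CREATE TEMPORARY TABLE tmp_mytb AS
    SELECT MIN(url_id) AS url_id, url_addr
    FROM   mytb
    GROUP  BY url_addr;

-- 2. Clear the original table.
DELETE FROM mytb;

-- 3. Copy the de-duplicated rows back.
INSERT INTO mytb (url_id, url_addr)
    SELECT url_id, url_addr FROM tmp_mytb;

-- 4. Drop the temp table.
DROP TEMPORARY TABLE tmp_mytb;
```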

Answer 5

Answered by Doug

You can GROUP BY the URL_ADDR column, which will effectively give you only distinct values in the URL_ADDR field.

select 
 URL_ID,
 URL_ADDR,
 URL_Time
from
 some_table
group by
 URL_ADDR

Enjoy!
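
Two caveats about the GROUP BY query above: it only selects the distinct rows and does not delete anything, and since MySQL 5.7.5 the default sql_mode includes ONLY_FULL_GROUP_BY, which rejects non-aggregated columns such as URL_ID and URL_Time in this query. A variant that should run under that mode wraps those columns in aggregates. This is a sketch, with some_table being the placeholder name used above; MIN(URL_ID) and MIN(URL_Time) are not guaranteed to come from the same row.

```sql
-- Sketch: one output row per URL_ADDR, valid under ONLY_FULL_GROUP_BY.
-- The two MIN() values may come from different rows of the group.
SELECT MIN(URL_ID)   AS URL_ID,
       URL_ADDR,
       MIN(URL_Time) AS URL_Time
FROM   some_table
GROUP  BY URL_ADDR;
```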

Answer 6

Answered by Olariu Romeo Vicentiu

Daniel Vassallo, how would this work for multiple columns?

DELETE t2 FROM directory1 t1 JOIN directory1 t2 ON (t2.page = t1.page AND t2.parentTopic = t1.parentTopic AND t2.title = t1.title AND t2.description = t1.description AND t2.linktype = t1.linktype AND t2.priority = t1.priority AND t2.linkID > t1.linkID);

maybe like this?
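
One thing to watch with a multi-column comparison: the = operator never matches when either side is NULL, so rows that are duplicates except for a NULL in one of the columns would be left in place. If that matters, MySQL's NULL-safe equality operator <=> can be used instead. A sketch, assuming the directory1 columns named above:

```sql
-- Sketch: <=> is MySQL's NULL-safe equality operator (NULL <=> NULL is true).
DELETE t2
FROM   directory1 t1
JOIN   directory1 t2 ON (t2.page        <=> t1.page AND
                         t2.parentTopic <=> t1.parentTopic AND
                         t2.title       <=> t1.title AND
                         t2.description <=> t1.description AND
                         t2.linktype    <=> t1.linktype AND
                         t2.priority    <=> t1.priority AND
                         t2.linkID > t1.linkID);
```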

Answer 7

Answered by Tahbaza

This will work provided that your URL_ID column is unique.

DELETE FROM url WHERE URL_ID IN (
SELECT URL_ID
FROM url a INNER JOIN (
    SELECT URL_ADDR, MAX(URL_ID) MaxURLId 
    FROM url
    GROUP BY URL_ADDR
    HAVING COUNT(*) > 1) b ON a.URL_ID <> b.MaxURLId AND a.URL_ADDR = b.URL_ADDR
)
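
Because the subquery above reads directly from url, the same table being deleted from, MySQL may reject it with the "You can't specify target table ... for update in FROM clause" error quoted in Answer 3. A possible rewrite keeps the same logic but uses the multiple-table DELETE syntax from Answer 1, joining against the grouped derived table. This is a sketch, assuming the url table and column names used above:

```sql
-- Sketch: same idea expressed as a multi-table DELETE.
-- Keeps the row with the highest URL_ID for each duplicated URL_ADDR.
DELETE a
FROM   url a
INNER JOIN (
    SELECT URL_ADDR, MAX(URL_ID) AS MaxURLId
    FROM   url
    GROUP  BY URL_ADDR
    HAVING COUNT(*) > 1) b
       ON a.URL_ADDR = b.URL_ADDR AND a.URL_ID <> b.MaxURLId;
```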
