Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/578867/

Date: 2020-08-31 12:50:04  Source: igfitidea

SQL query: Delete all records from the table except latest N?

Tags: sql, mysql

Asked by serg

Is it possible to build a single MySQL query (without variables) that removes all records from a table except the latest N (sorted by id desc)?

Something like this, only it doesn't work :)

delete from table order by id ASC limit ((select count(*) from table ) - N)

Thanks.

Answer by Alex Barrett

You cannot delete the records that way; the main issue is that you cannot use a subquery to specify the value of a LIMIT clause.

This works (tested in MySQL 5.0.67):

DELETE FROM `table`
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT id
    FROM `table`
    ORDER BY id DESC
    LIMIT 42 -- keep this many records
  ) foo
);

The intermediate subquery is required. Without it we'd run into two errors:

  1. SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
  2. SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.

Fortunately, using an intermediate subquery allows us to bypass both of these limitations.

Nicole has pointed out that this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.

Answer by Nicole

I know I'm resurrecting quite an old question, but I recently ran into this issue and needed something that scales well to large numbers of rows. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.

The solutions that actually worked were Alex Barrett's double sub-query/NOT IN method (similar to Bill Karwin's) and Quassnoi's LEFT JOIN method.

Unfortunately, both of the above methods create very large intermediate temporary tables, and performance degrades quickly as the number of records not being deleted gets large.

What I settled on uses Alex Barrett's double sub-query (thanks!) but with <= instead of NOT IN:

DELETE FROM `test_sandbox`
  WHERE id <= (
    SELECT id
    FROM (
      SELECT id
      FROM `test_sandbox`
      ORDER BY id DESC
      LIMIT 1 OFFSET 42 -- keep this many records
    ) foo
  )

It uses OFFSET to skip the N newest records and get the id of the next one, then deletes that record and all earlier records.

Since ordering is already an assumption of this problem (ORDER BY id DESC), <= is a perfect fit.

It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.
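
As a rough way to check the row-selection logic, the sketch below runs the same <=/OFFSET statement against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (it does not share MySQL's 1093/1235 restrictions, so this says nothing about them), and the helper names `keep_latest` and `remaining_ids` and the ten sample rows are made up for illustration.

```python
import sqlite3

def keep_latest(conn, n):
    """Delete every row of test_sandbox except the n rows with the highest id."""
    conn.execute(
        """
        DELETE FROM test_sandbox
        WHERE id <= (
            SELECT id FROM (
                SELECT id
                FROM test_sandbox
                ORDER BY id DESC
                LIMIT 1 OFFSET ?   -- skip the n rows we want to keep
            ) foo
        )
        """,
        (n,),
    )

def remaining_ids(conn):
    """Return the ids still present, oldest first."""
    return [row[0] for row in conn.execute("SELECT id FROM test_sandbox ORDER BY id")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_sandbox (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO test_sandbox (id) VALUES (?)", [(i,) for i in range(1, 11)])

keep_latest(conn, 3)
print(remaining_ids(conn))   # [8, 9, 10]

# Fewer rows than n: the subquery yields no row, the comparison is NULL,
# and nothing is deleted.
keep_latest(conn, 5)
print(remaining_ids(conn))   # [8, 9, 10]
```

The second call shows a convenient side effect of the comparison form: when the table holds fewer than N rows, the statement is a no-op rather than an error.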

Test case

I tested the three working methods and the new method above in two test cases.

Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).

+-----------+------------------------+----------------------+
|           | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN    |         3.2542 seconds |       0.1629 seconds |
| NOT IN v2 |         4.5863 seconds |       0.1650 seconds |
| <=,OFFSET |         0.0204 seconds |       0.1076 seconds |
+-----------+------------------------+----------------------+

What's interesting is that the <= method sees better performance across the board, and actually gets faster the more you keep, instead of slower.

Answer by Bill Karwin

Unfortunately for all the answers given by other folks, you can't DELETE and SELECT from a given table in the same query.

DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);

ERROR 1093 (HY000): You can't specify target table 'mytable' for update 
in FROM clause

Nor does MySQL support LIMIT in a subquery of IN/ALL/ANY/SOME. These are limitations of MySQL.

DELETE FROM mytable WHERE id NOT IN 
  (SELECT id FROM mytable ORDER BY id DESC LIMIT 1);

ERROR 1235 (42000): This version of MySQL doesn't yet support 
'LIMIT & IN/ALL/ANY/SOME subquery'

The best answer I can come up with is to do this in two stages:

SELECT id FROM mytable ORDER BY id DESC LIMIT n; 

Collect the ids and make them into a comma-separated string:

DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );

(Normally, interpolating a comma-separated list into an SQL statement introduces some risk of SQL injection, but in this case the values do not come from an untrusted source; they are known to be id values from the database itself.)

Note: Though this doesn't get the job done in a single query, sometimes a simpler, get-it-done solution is the most effective.
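
A minimal sketch of the two-stage approach from application code, assuming Python's sqlite3 module and an in-memory SQLite table as a stand-in for MySQL. The function name `delete_all_but_latest` is hypothetical, and the sketch binds one placeholder per id rather than building the comma-separated string by hand, which is a substitution on my part and not part of the answer.

```python
import sqlite3

def delete_all_but_latest(conn, n):
    """Two-stage delete: read the ids to keep, then delete everything else."""
    # Stage 1: collect the ids of the n newest rows.
    keep = [row[0] for row in conn.execute(
        "SELECT id FROM mytable ORDER BY id DESC LIMIT ?", (n,))]
    if not keep:
        # Nothing to keep (empty table or n == 0): delete everything.
        conn.execute("DELETE FROM mytable")
        return
    # Stage 2: one placeholder per id instead of pasting values into the SQL.
    placeholders = ", ".join("?" for _ in keep)
    conn.execute(f"DELETE FROM mytable WHERE id NOT IN ({placeholders})", keep)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO mytable (id) VALUES (?)", [(i,) for i in range(1, 7)])

delete_all_but_latest(conn, 2)
survivors = [row[0] for row in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(survivors)   # [5, 6]
```

One thing to keep in mind with any two-stage variant: a row inserted between the SELECT and the DELETE is not in the keep list, so it would be deleted unless both statements run inside one transaction with suitable locking.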

Answer by Quassnoi

DELETE  i1.*
FROM    items i1
LEFT JOIN
        (
        SELECT  id
        FROM    items ii
        ORDER BY
                id DESC
        LIMIT 20
        ) i2
ON      i1.id = i2.id
WHERE   i2.id IS NULL

Answer by Justin Wignall

If your id is incremental, then use something like:

delete from table where id < (select max(id) from table)-N
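
The id arithmetic only counts rows correctly when the ids are contiguous. The sketch below, which uses Python's sqlite3 module with an in-memory SQLite table as a stand-in, shows what survives with and without gaps. The table name `items` and the helper `delete_by_id_arithmetic` are made up for illustration. In MySQL itself, this direct form would probably also hit error 1093 quoted in the earlier answers unless the MAX(id) subquery is wrapped in a derived table.

```python
import sqlite3

def delete_by_id_arithmetic(ids, n):
    """Run the MAX(id) - n delete on a throwaway table and return the surviving ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in ids])
    conn.execute("DELETE FROM items WHERE id < (SELECT MAX(id) FROM items) - ?", (n,))
    return [row[0] for row in conn.execute("SELECT id FROM items ORDER BY id")]

# Contiguous ids: rows with id >= MAX(id) - 3 survive, which is 4 rows, not 3.
print(delete_by_id_arithmetic(range(1, 11), 3))            # [7, 8, 9, 10]

# Ids with gaps: only one row is left even though n is 3.
print(delete_by_id_arithmetic([1, 2, 3, 10, 20, 30], 3))   # [30]
```

So this form is best treated as an approximate trim for tables where the exact number of surviving rows does not matter.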

Answer by Paolo

To delete all the records except the last N, you may use the query below.

It's a single script made of several statements, so it's not really a single query in the sense the original question intended.

Also, you need a variable and a prepared statement (built inside the query) due to a bug in MySQL.

Hope it may be useful anyway...

nnn is the number of rows to keep and theTable is the table you're working on.

I'm assuming you have an auto-incrementing column named id.

SELECT @ROWS_TO_DELETE := COUNT(*) - nnn FROM `theTable`;
SELECT @ROWS_TO_DELETE := IF(@ROWS_TO_DELETE<0,0,@ROWS_TO_DELETE);
PREPARE STMT FROM "DELETE FROM `theTable` ORDER BY `id` ASC LIMIT ?";
EXECUTE STMT USING @ROWS_TO_DELETE;

The good thing about this approach is performance: I tested the query on a local DB with about 13,000 records, keeping the last 1,000. It runs in 0.08 seconds.

The script from the accepted answer...

DELETE FROM `table`
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT id
    FROM `table`
    ORDER BY id DESC
    LIMIT 42 -- keep this many records
  ) foo
);

...takes 0.55 seconds, about 7 times longer.

Test environment: MySQL 5.5.25 on a late 2011 i7 MacBook Pro with SSD

Answer by Dave Swersky

DELETE FROM table WHERE ID NOT IN
(SELECT MAX(ID) ID FROM table)

Answer by Nishant Nair

Try the query below:

DELETE FROM tablename WHERE id < (SELECT * FROM (SELECT (MAX(id)-10) FROM tablename ) AS a)

The inner subquery returns MAX(id) - 10, and the outer query deletes every record whose id is below that threshold, leaving only the newest records (exactly how many depends on gaps in the id sequence).

Answer by Nivesh Saharan

If you also need to restrict the delete by some other column, here is a solution:

DELETE
FROM articles
WHERE id IN
    (SELECT id
     FROM
       (SELECT id
        FROM articles
        WHERE user_id = :userId
        ORDER BY created_at DESC LIMIT 500, 10000000) abc)
  AND user_id = :userId
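
To see the per-user variant in action, here is a small sketch using Python's sqlite3 module with an in-memory SQLite table standing in for MySQL. The sample rows, the helper name `trim_user_articles`, and the `keep` parameter (in place of the hard-coded 500) are assumptions for illustration. The large LIMIT value plays the same "everything past the offset" role as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, user_id INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        (1, 1, "2020-01-01"), (2, 1, "2020-01-02"),
        (3, 1, "2020-01-03"), (4, 1, "2020-01-04"),
        (5, 2, "2020-01-01"), (6, 2, "2020-01-02"),
    ],
)

def trim_user_articles(conn, user_id, keep):
    """Delete all but the `keep` newest articles of one user; other users are untouched."""
    conn.execute(
        """
        DELETE FROM articles
        WHERE id IN (
            SELECT id FROM (
                SELECT id
                FROM articles
                WHERE user_id = :user_id
                ORDER BY created_at DESC
                LIMIT 10000000 OFFSET :keep   -- everything after the newest rows
            ) abc
        )
        AND user_id = :user_id
        """,
        {"user_id": user_id, "keep": keep},
    )

trim_user_articles(conn, user_id=1, keep=2)
left = [row[0] for row in conn.execute("SELECT id FROM articles ORDER BY id")]
print(left)   # [3, 4, 5, 6]
```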

Answer by Ken Palmer

Just wanted to throw this into the mix for anyone using Microsoft SQL Server instead of MySQL. The LIMIT keyword isn't supported by MSSQL, so you'll need to use an alternative. This code worked in SQL Server 2008 and is based on this SO post: https://stackoverflow.com/a/1104447/993856

-- Keep the last 10 most recent passwords for this user.
DECLARE @UserID int; SET @UserID = 1004
DECLARE @ThresholdID int -- Position of 10th password.
SELECT  @ThresholdID = UserPasswordHistoryID FROM
        (
            SELECT ROW_NUMBER()
            OVER (ORDER BY UserPasswordHistoryID DESC) AS RowNum, UserPasswordHistoryID
            FROM UserPasswordHistory
            WHERE UserID = @UserID
        ) sub
WHERE   (RowNum = 10) -- Keep this many records.

DELETE  UserPasswordHistory
WHERE   (UserID = @UserID)
        AND (UserPasswordHistoryID < @ThresholdID)

Admittedly, this is not elegant. If you're able to optimize this for Microsoft SQL, please share your solution. Thanks!
