SQL 删除除最新记录以外的所有记录?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/7238983/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 11:55:25  来源:igfitidea点击:

Delete all records except the most recent one?

sqloracle

提问by sim

I have two DB tables in a one-to-many relationship. The data looks like this:

我有两个处于一对多关系的数据库表。数据如下所示:

select * from student, application

Resultset:

结果集:

+-----------+---------------+---------------------+
| StudentID | ApplicationID | ApplicationDateTime |
+-----------+---------------+---------------------+
| 1         | 20001         | 12 April 2011       |
| 1         | 20002         | 15 May 2011         |
| 2         | 20003         | 02 Feb 2011         |
| 2         | 20004         | 13 March 2011       |
| 2         | 20005         | 05 June 2011        |
+-----------+---------------+---------------------+

I want to delete all applications except for the most recent one. In other words, each student must only have one application linked to it. Using the above example, the data should look like this:

我想删除除最近的应用程序之外的所有应用程序。换句话说,每个学生只能有一个与其相关联的应用程序。使用上面的示例,数据应如下所示:

+-----------+---------------+---------------------+
| StudentID | ApplicationID | ApplicationDateTime |
+-----------+---------------+---------------------+
| 1         | 20002         | 15 May 2011         |
| 2         | 20005         | 05 June 2011        |
+-----------+---------------+---------------------+

How would I go about constructing my DELETE statement to filter out the correct records?

我将如何构建我的 DELETE 语句来过滤掉正确的记录?

回答by a_horse_with_no_name

DELETE FROM student
WHERE ApplicationDateTime <> (SELECT max(ApplicationDateTime) 
                              FROM student s2
                              WHERE s2.StudentID  = student.StudentID)


Given the long discussion in the comments, please note the following:

鉴于评论中的长时间讨论,请注意以下几点:

The above statement willwork on any database that properly implements statement level read consistency regardless of any changes to the table while the statement is running.

上面的语句任何数据库上工作,妥善implements语句级别语句运行时读一致性,无论任何变化表。

Databases where I definitely know that this works correctly even withconcurrent modifications to the table: Oracle (the one which this question is about), Postgres, SAP HANA, Firebird (and most probably MySQL using InnoDB). Because they all guarantee a consistent view of the data at the point in time when the statement started. Changing the <>to <will not change anything for them (including Oracle which this question is about)

我肯定知道即使对表进行并发修改也能正常工作的数据库:Oracle(这个问题涉及的那个)、Postgres、SAP HANA、Firebird(很可能还有使用 InnoDB 的 MySQL)。因为它们都保证在语句开始时的数据视图一致。改变<>to<不会改变他们的任何东西(包括这个问题所涉及的甲骨文)

For the above mentioned databases, the statement is notsubject to the isolation level because phantom reads or non-repeatable reads can only happen between multiplestatements - not within a singlestatement.

对于上述数据库,语句受到隔离级别,因为幻读或不可重复读只能之间发生多次陈述-而不是内的单个语句。

For database that do not implement MVCC properly and rely on locking to manage concurrency (thus blocking concurrent write access) this might actually yield wrong results if the table is updated concurrently. For those the workaround using <is probably needed.

对于没有正确实现 MVCC 并依靠锁定来管理并发(从而阻止并发写入访问)的数据库,如果并发更新表,这实际上可能会产生错误的结果。对于那些<可能需要使用的解决方法。

回答by Alex Poole

You can use row_number()(or rank()or dense_rank(), or even just the rownumpseudocolumn) to apply an order to the records, and then use that order to decide which to discard. In this case, ordering by applicationdatetime descgives the application with the most recent date for each student the rank of 1:

您可以使用row_number()(or rank()or dense_rank(), 甚至只是rownum伪列) 将顺序应用于记录,然后使用该顺序来决定丢弃哪些。在这种情况下,排序依据applicationdatetime desc为每个学生的最近日期的应用程序提供 1 的排名:

select studentid, applicationid from (
    select studentid, applicationid,
        row_number() over (partition by studentid
            order by applicationdatetime desc) as rn
    from application
)
where rn = 1;

 STUDENTID APPLICATIONID
---------- -------------
         1         20002
         2         20005

You can then delete anything with a rank higher than 1, which will preseve the records you care about:

然后您可以删除任何排名高于 1 的内容,这将保留您关心的记录:

delete from application
where (studentid, applicationid) in (
    select studentid, applicationid from (
        select studentid, applicationid,
            row_number() over (partition by studentid
                order by applicationdatetime desc) as rn
        from application
    )
    where rn > 1
);

3 rows deleted.

回答by Tvitmsvleli



At first you can do so

起初你可以这样做

DELETE FROM [student]
           or [application]
WHERE (studentid, applicationid) NOT IN (SELECT StudentID
                                               ,MAX(ApplicationID)
                                         FROM student
                                             ,application
group by StudentID);

but there is another solution to, you can create the backup table, after delete all records in your tables and after insert your data (what you want) with max values select in your tables.

但是还有另一种解决方案,您可以在删除表中的所有记录并在表中选择最大值插入数据(您想要的)之后创建备份表。