SQL: How to efficiently delete rows from a PostgreSQL 8.1 table?

Question

Asked by Jin Kim

I'm working on a PostgreSQL 8.1 SQL script which needs to delete a large number of rows from a table.

Let's say the table I need to delete from is Employees (~260K rows). It has a primary key named id.

The rows I need to delete from this table are stored in a separate temporary table called EmployeesToDelete (~10K records) with a foreign key reference to Employees.id called employee_id.

Is there an efficient way to do this?

At first, I thought of the following:

DELETE
FROM    Employees
WHERE   id IN
        (
        SELECT  employee_id
        FROM    EmployeesToDelete
        )

But I heard that using the "IN" clause and subqueries can be inefficient, especially with larger tables.

I've looked at the PostgreSQL 8.1 documentation, and there's mention of DELETE FROM ... USING but it doesn't have examples so I'm not sure how to use it.

I'm wondering if the following works and is more efficient?

DELETE
FROM    Employees
USING   Employees e
INNER JOIN
        EmployeesToDelete ed
ON      e.id = ed.employee_id

Your comments are greatly appreciated.

Edit: I ran EXPLAIN ANALYZE and the weird thing is that the first DELETE ran pretty quickly (within seconds), while the second DELETE took so long (over 20 min) I eventually cancelled it.

Adding an index to the temp table helped the performance quite a bit.
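The question does not show the index that was added. A minimal sketch of what it might look like, assuming the index is on the employee_id column (the index name here is hypothetical):

```sql
-- Assumption: the index covers employee_id, the column used to match
-- rows in Employees. The index name is made up for illustration.
CREATE INDEX idx_employees_to_delete_employee_id
    ON EmployeesToDelete (employee_id);
```

Whether the planner actually uses such an index depends on the plan it chooses; the hash join plan shown later in the question scans both tables sequentially.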

Here's a query plan of the first DELETE for anyone interested:

 Hash Join  (cost=184.64..7854.69 rows=256482 width=6) (actual time=54.089..660.788 rows=27295 loops=1)
   Hash Cond: ("outer".id = "inner".employee_id)
   ->  Seq Scan on Employees  (cost=0.00..3822.82 rows=256482 width=10) (actual time=15.218..351.978 rows=256482 loops=1)
   ->  Hash  (cost=184.14..184.14 rows=200 width=4) (actual time=38.807..38.807 rows=10731 loops=1)
         ->  HashAggregate  (cost=182.14..184.14 rows=200 width=4) (actual time=19.801..28.773 rows=10731 loops=1)
               ->  Seq Scan on EmployeesToDelete  (cost=0.00..155.31 rows=10731 width=4) (actual time=0.005..9.062 rows=10731 loops=1)

 Total runtime: 935.316 ms
(7 rows)

At this point, I'll stick with the first DELETE unless I can find a better way of writing it.

Answer 1

Answered by bortzmeyer

Don't guess, measure. Try the various methods and see which one takes the least time to execute. Also, use EXPLAIN to see what PostgreSQL will do and where you can optimize. Very few PostgreSQL users are able to correctly guess the fastest query...
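One way to measure a DELETE without losing data is sketched below. EXPLAIN ANALYZE really executes the statement it is given, so this sketch wraps it in a transaction that is rolled back afterwards. The statement is the first DELETE from the question.

```sql
BEGIN;

-- EXPLAIN ANALYZE runs the DELETE for real and reports actual timings.
EXPLAIN ANALYZE
DELETE
FROM    Employees
WHERE   id IN
        (
        SELECT  employee_id
        FROM    EmployeesToDelete
        );

-- Undo the deletion so the next candidate query sees the same data.
ROLLBACK;
```

Repeating the same pattern for each candidate query gives timings that can be compared directly.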

Answer 2

Answered by Quassnoi

I'm wondering if the following works and is more efficient?

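    -- PostgreSQL 8.1 does not accept an alias on the DELETE target
    -- (added in 8.2), so the column is qualified with the table name.
    DELETE
    FROM    Employees
    USING   EmployeesToDelete ed
    WHERE   Employees.id = ed.employee_id;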
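For contrast, the join-in-USING form from the question lists Employees a second time as e. The PostgreSQL documentation warns against repeating the target table in USING unless a self-join is intended. In that statement nothing ties the target table to e, so every row of Employees is paired with every row of the join, which would explain the run time of over 20 minutes. The sketch below is an illustration, not part of the original answer. It asks the planner for the plan of the equivalent SELECT without executing it:

```sql
-- Employees and Employees e are two independent references to the table.
-- The only join condition links e to ed, so Employees is cross-joined
-- with the result of (e JOIN ed).
EXPLAIN
SELECT  count(*)
FROM    Employees,
        Employees e
INNER JOIN
        EmployeesToDelete ed
ON      e.id = ed.employee_id;
```

Plain EXPLAIN is used here on purpose: actually running the count would have to walk the full cross product.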

This depends entirely on your index selectivity.

PostgreSQL tends to employ MERGE IN JOIN for IN predicates, which has a stable execution time.

It's not affected by how many rows satisfy this condition, provided that you already have an ordered resultset.

An ordered resultset requires either a sort operation or an index. Full index traversal is very inefficient in PostgreSQL compared to SEQ SCAN.

The JOIN predicate, on the other hand, may benefit from using NESTED LOOPS if your index is very selective, and from using HASH JOIN if it's not selective.

PostgreSQL should select the right one by estimating the row count.
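Row count estimates are only as good as the table statistics. In the plan posted in the question, the HashAggregate over EmployeesToDelete is estimated at rows=200 while the actual count is 10731, which suggests the temp table had no statistics. Temporary tables are not analyzed automatically, so a script can do it explicitly after filling the table. A minimal sketch:

```sql
-- Collect statistics on the temp table so the planner can estimate
-- how many distinct employee_id values it holds.
ANALYZE EmployeesToDelete;
```

Running EXPLAIN again afterwards shows whether the estimates, and possibly the chosen join method, have changed.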

Since you have 30k rows against 260K rows, I expect HASH JOIN to be more efficient, and you should try to build a plan on a DELETE ... USING query.

To make sure, please post the execution plans for both queries.

Answer 3

Answered by matt b

I'm not sure about the DELETE FROM ... USING syntax, but generally, a subquery should logically be the same thing as an INNER JOIN anyway. The database query optimizer should be capable (and this is just a guess) of executing the same query plan for both.
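A third logically equivalent way to write the statement is a correlated EXISTS subquery, sketched here with the table and column names from the question. Whether the planner handles it the same way as the IN or USING forms varies between versions, so it is worth comparing the plans with EXPLAIN rather than assuming.

```sql
-- Delete every employee that has at least one matching row in the
-- temp table. The subquery refers back to the row being considered.
DELETE
FROM    Employees
WHERE   EXISTS
        (
        SELECT  1
        FROM    EmployeesToDelete ed
        WHERE   ed.employee_id = Employees.id
        );
```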

Answer 4

Answered by Sophie Alpert

Why can't you delete the rows in the first place instead of adding them to the EmployeesToDelete table?

Or if you need to undo, just add a "deleted" flag to Employees, so you can reverse the deletion, or make it permanent, all in one table?
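A minimal sketch of the flag approach, assuming a boolean column named deleted (the column name is hypothetical) and the same temp table as in the question:

```sql
-- Assumption: "deleted" is a made-up column name for the flag.
ALTER TABLE Employees ADD COLUMN deleted boolean NOT NULL DEFAULT false;

-- Mark the rows instead of deleting them.
UPDATE  Employees
SET     deleted = true
FROM    EmployeesToDelete ed
WHERE   Employees.id = ed.employee_id;

-- To undo: clear the flag on the marked rows.
UPDATE  Employees
SET     deleted = false
WHERE   deleted;

-- To make it permanent: remove the rows that are still marked.
DELETE
FROM    Employees
WHERE   deleted;
```

With this design, every query that reads Employees has to filter on the flag, which is the trade-off for being able to undo.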
