原文地址: http://stackoverflow.com/questions/777880/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to efficiently delete rows from a Postgresql 8.1 table?
Question by Jin Kim
I'm working on a PostgreSQL 8.1 SQL script which needs to delete a large number of rows from a table.
Let's say the table I need to delete from is Employees (~260K rows). It has a primary key named id.
The rows I need to delete from this table are stored in a separate temporary table called EmployeesToDelete (~10K records), whose employee_id column is a foreign key reference to Employees.id.
Is there an efficient way to do this?
At first, I thought of the following:
DELETE
FROM Employees
WHERE id IN
  (
    SELECT employee_id
    FROM EmployeesToDelete
  );
But I heard that using the "IN" clause and subqueries can be inefficient, especially with larger tables.
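To make the first DELETE concrete, here is a minimal sketch that runs it against a tiny in-memory copy of the two tables. SQLite, via Python's standard sqlite3 module, stands in for PostgreSQL 8.1 only so that the example runs without a server. The helpers make_db, delete_with_in and remaining_ids and the sample ids are hypothetical. The sketch also shows that a duplicated employee_id in the temp table is harmless, because IN only asks whether a match exists.

```python
import sqlite3


def make_db(employee_ids, ids_to_delete):
    """Build an in-memory toy version of the two tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY)")
    # employee_id holds Employees.id values; no FK constraint in this toy.
    conn.execute("CREATE TABLE EmployeesToDelete (employee_id INTEGER)")
    conn.executemany("INSERT INTO Employees (id) VALUES (?)",
                     [(i,) for i in employee_ids])
    conn.executemany("INSERT INTO EmployeesToDelete (employee_id) VALUES (?)",
                     [(i,) for i in ids_to_delete])
    conn.commit()
    return conn


def delete_with_in(conn):
    """The question's first DELETE; returns the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM Employees "
        "WHERE id IN (SELECT employee_id FROM EmployeesToDelete)"
    )
    conn.commit()
    return cur.rowcount


def remaining_ids(conn):
    return [r[0] for r in conn.execute("SELECT id FROM Employees ORDER BY id")]


conn = make_db(range(1, 11), [2, 4, 4, 9])  # 4 is listed twice on purpose
print(delete_with_in(conn))  # 3
print(remaining_ids(conn))   # [1, 3, 5, 6, 7, 8, 10]
```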
I've looked at the PostgreSQL 8.1 documentation, and there's a mention of DELETE FROM ... USING, but it doesn't have examples, so I'm not sure how to use it.
I'm wondering if the following works and is more efficient?
DELETE
FROM Employees
USING Employees e
  INNER JOIN
  EmployeesToDelete ed
  ON e.id = ed.employee_id;
Your comments are greatly appreciated.
Edit: I ran EXPLAIN ANALYZE, and the weird thing is that the first DELETE ran pretty quickly (within seconds), while the second DELETE took so long (over 20 min) that I eventually cancelled it.
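A likely reason the second DELETE ran for over 20 minutes is that its USING list names Employees a second time (as e), and nothing in the statement links the rows being deleted to e. The PostgreSQL DELETE documentation warns against repeating the target table in the USING list unless a self-join is intended. Without a condition such as WHERE Employees.id = e.id, every target row is paired with every row of the e/ed join. That probably leaves the planner facing a cross product of roughly 260K × 10K row pairs, and had it finished, the statement would have deleted every employee.

SQLite has no DELETE ... USING, so the sketch below reproduces the same logic with EXISTS subqueries. This is a swapped-in formulation, not the question's syntax. The table contents and helper names are hypothetical.

```python
import sqlite3


def make_db(employee_ids, ids_to_delete):
    """Hypothetical toy copies of Employees and EmployeesToDelete."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE EmployeesToDelete (employee_id INTEGER)")
    conn.executemany("INSERT INTO Employees VALUES (?)",
                     [(i,) for i in employee_ids])
    conn.executemany("INSERT INTO EmployeesToDelete VALUES (?)",
                     [(i,) for i in ids_to_delete])
    return conn


def remaining_ids(conn):
    return [r[0] for r in conn.execute("SELECT id FROM Employees ORDER BY id")]


# Mirrors the second DELETE: the subquery never refers to the row being
# deleted, so it is true for every Employees row as soon as the e/ed join
# returns anything.
UNCORRELATED = (
    "DELETE FROM Employees WHERE EXISTS ("
    " SELECT 1 FROM Employees e"
    " JOIN EmployeesToDelete ed ON e.id = ed.employee_id)"
)

# Ties the check to the row being deleted, like WHERE Employees.id = ...
CORRELATED = (
    "DELETE FROM Employees WHERE EXISTS ("
    " SELECT 1 FROM EmployeesToDelete ed"
    " WHERE ed.employee_id = Employees.id)"
)

for sql in (UNCORRELATED, CORRELATED):
    conn = make_db(range(1, 6), [2, 4])
    conn.execute(sql)
    print(remaining_ids(conn))
# prints [] and then [1, 3, 5]
```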
Adding an index to the temp table helped the performance quite a bit.
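The edit doesn't say which index was added. A plausible reading is an index on EmployeesToDelete.employee_id, the column the join uses, and the sketch below assumes that. It also runs ANALYZE on the temp table, because the PostgreSQL documentation notes that autovacuum cannot process temporary tables, so their statistics exist only if you ANALYZE them in your own session. In the plan below, the HashAggregate is estimated at rows=200 but actually returns 10731 rows, which is consistent with the planner working without statistics. Both statements are also valid SQLite, which is used here only so the example runs. The index name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EmployeesToDelete (employee_id INTEGER)")
conn.executemany("INSERT INTO EmployeesToDelete VALUES (?)",
                 [(i,) for i in range(1, 1001)])

# The two statements one might run right after filling the temp table.
# The index name is hypothetical.
conn.execute("CREATE INDEX employeestodelete_employee_id_idx "
             "ON EmployeesToDelete (employee_id)")
conn.execute("ANALYZE EmployeesToDelete")  # collect planner statistics
conn.commit()


def has_index(conn, name):
    """True if an index with this name exists in the SQLite catalog."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
        (name,)).fetchone()
    return row is not None
```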
Here's a query plan of the first DELETE for anyone interested:
 Hash Join  (cost=184.64..7854.69 rows=256482 width=6) (actual time=54.089..660.788 rows=27295 loops=1)
   Hash Cond: ("outer".id = "inner".employee_id)
   ->  Seq Scan on Employees  (cost=0.00..3822.82 rows=256482 width=10) (actual time=15.218..351.978 rows=256482 loops=1)
   ->  Hash  (cost=184.14..184.14 rows=200 width=4) (actual time=38.807..38.807 rows=10731 loops=1)
         ->  HashAggregate  (cost=182.14..184.14 rows=200 width=4) (actual time=19.801..28.773 rows=10731 loops=1)
               ->  Seq Scan on EmployeesToDelete  (cost=0.00..155.31 rows=10731 width=4) (actual time=0.005..9.062 rows=10731 loops=1)
 Total runtime: 935.316 ms
(7 rows)
At this point, I'll stick with the first DELETE unless I can find a better way of writing it.
Answer by bortzmeyer
Don't guess, measure. Try the various methods and see which one executes fastest. Also, use EXPLAIN to see what PostgreSQL will do and where you can optimize. Very few PostgreSQL users can correctly guess the fastest query...
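One caution when measuring DELETEs: in PostgreSQL, EXPLAIN ANALYZE actually executes the statement, so timing a DELETE that way removes the rows unless it is wrapped in BEGIN; ... ROLLBACK;. Below is a minimal sketch of the same wrap, time and roll back idea, using SQLite as a runnable stand-in. The table sizes, the CANDIDATES dict and the correlated EXISTS variant are assumptions for illustration, and SQLite timings say nothing about how PostgreSQL 8.1 would plan these queries.

```python
import sqlite3
import time


def make_db(n_employees, ids_to_delete):
    """Hypothetical toy tables; autocommit mode so we control transactions."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE EmployeesToDelete (employee_id INTEGER)")
    conn.execute("BEGIN")
    conn.executemany("INSERT INTO Employees VALUES (?)",
                     [(i,) for i in range(1, n_employees + 1)])
    conn.executemany("INSERT INTO EmployeesToDelete VALUES (?)",
                     [(i,) for i in ids_to_delete])
    conn.execute("COMMIT")
    return conn


# Candidate statements to compare (assumed, for illustration).
CANDIDATES = {
    "in_subquery": "DELETE FROM Employees WHERE id IN "
                   "(SELECT employee_id FROM EmployeesToDelete)",
    "correlated_exists": "DELETE FROM Employees WHERE EXISTS "
                         "(SELECT 1 FROM EmployeesToDelete ed "
                         "WHERE ed.employee_id = Employees.id)",
}


def measure(conn, sql):
    """Time one DELETE inside a transaction, then roll it back."""
    conn.execute("BEGIN")
    start = time.perf_counter()
    deleted = conn.execute(sql).rowcount
    elapsed = time.perf_counter() - start
    conn.execute("ROLLBACK")  # restore the rows for the next candidate
    return deleted, elapsed


conn = make_db(26_000, range(1, 26_000, 26))
for name, sql in CANDIDATES.items():
    deleted, elapsed = measure(conn, sql)
    print(f"{name}: {deleted} rows in {elapsed * 1000:.1f} ms")
```

Because each run is rolled back, every candidate sees the same data, so the reported row counts should match. If they don't, the queries are not equivalent.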
Answer by Quassnoi
I'm wondering if the following works and is more efficient?
-- No alias on the DELETE target: aliasing the target table needs PostgreSQL 8.2+
DELETE
FROM Employees
USING EmployeesToDelete ed
WHERE Employees.id = ed.employee_id;
This depends entirely on your index selectivity.
PostgreSQL tends to use a MERGE IN JOIN for IN predicates, which has a stable execution time.
It's not affected by how many rows satisfy this condition, provided that you already have an ordered resultset.
An ordered resultset requires either a sort operation or an index. A full index traversal is very inefficient in PostgreSQL compared to a SEQ SCAN.
The JOIN predicate, on the other hand, may benefit from NESTED LOOPS if your index is very selective, and from a HASH JOIN if it is unselective.
PostgreSQL should select the right one by estimating the row count.
Since you have 30k rows against 260K rows, I expect a HASH JOIN to be more efficient, and you should try looking at the plan for a DELETE ... USING query.
To be sure, please post the execution plans for both queries.
Answer by matt b
I'm not sure about the DELETE FROM ... USING syntax, but in general a subquery should be logically equivalent to an INNER JOIN anyway. The database query optimizer should be able (and this is just a guess) to use the same query plan for both.
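This point can be checked directly. The IN subquery and the join pick out the same employees, but the join returns one row per match, so a duplicated employee_id shows up twice. A DELETE still removes each target row only once, so both forms delete the same rows. The HashAggregate node in the question's plan appears to be PostgreSQL removing such duplicates so that it can run the IN as an ordinary hash join.

The following is a minimal sketch with SQLite standing in for PostgreSQL. The data, including the dangling id 99, is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE EmployeesToDelete (employee_id INTEGER)")
conn.executemany("INSERT INTO Employees VALUES (?)",
                 [(i,) for i in range(1, 11)])
# Duplicate and dangling ids on purpose: 4 appears twice, 99 has no employee.
conn.executemany("INSERT INTO EmployeesToDelete VALUES (?)",
                 [(2,), (4,), (4,), (99,)])

via_subquery = [r[0] for r in conn.execute(
    "SELECT id FROM Employees "
    "WHERE id IN (SELECT employee_id FROM EmployeesToDelete) ORDER BY id")]

via_join = [r[0] for r in conn.execute(
    "SELECT e.id FROM Employees e "
    "JOIN EmployeesToDelete ed ON e.id = ed.employee_id ORDER BY e.id")]

print(via_subquery)                           # [2, 4]
print(via_join)                               # [2, 4, 4]
print(sorted(set(via_join)) == via_subquery)  # True
```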
Answer by Sophie Alpert
Why can't you delete the rows in the first place instead of adding them to the EmployeesToDelete table?
Or, if you need to be able to undo, just add a "deleted" flag to Employees, so you can reverse the deletion or make it permanent, all in one table?
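The following is a minimal sketch of the soft-delete idea, with SQLite as a runnable stand-in. The deleted column name comes from the answer. SQLite has no separate boolean type, so the flag is stored as 0/1; in PostgreSQL, a boolean NOT NULL DEFAULT false column would be the natural choice. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "deleted" flag from the answer; SQLite stores booleans as 0/1 integers.
conn.execute("CREATE TABLE Employees ("
             " id INTEGER PRIMARY KEY,"
             " deleted INTEGER NOT NULL DEFAULT 0)")
conn.execute("CREATE TABLE EmployeesToDelete (employee_id INTEGER)")
conn.executemany("INSERT INTO Employees (id) VALUES (?)",
                 [(i,) for i in range(1, 6)])
conn.executemany("INSERT INTO EmployeesToDelete VALUES (?)", [(2,), (4,)])


def active_ids(conn):
    """Ids of employees that are not flagged as deleted."""
    return [r[0] for r in conn.execute(
        "SELECT id FROM Employees WHERE deleted = 0 ORDER BY id")]


def soft_delete(conn):
    """Flag the listed employees instead of removing them (reversible)."""
    conn.execute("UPDATE Employees SET deleted = 1 "
                 "WHERE id IN (SELECT employee_id FROM EmployeesToDelete)")


def undo(conn):
    """Reverse every pending soft delete."""
    conn.execute("UPDATE Employees SET deleted = 0 WHERE deleted = 1")


def purge(conn):
    """Make the soft deletes permanent."""
    conn.execute("DELETE FROM Employees WHERE deleted = 1")
```

For example, soft_delete(conn) leaves active_ids(conn) as [1, 3, 5], and undo(conn) brings back all five ids. Calling soft_delete and then purge leaves three rows in the table. The trade-off is that every query reading Employees must now filter on deleted = 0.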