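Disclaimer: this page is a Chinese-English bilingual translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow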
原文地址: http://stackoverflow.com/questions/17813492/
Postgres NOT IN performance
Question by wutzebaer
Any ideas how to speed up this query?
Input
EXPLAIN SELECT entityid FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
WHERE
l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
AND
(entityid NOT IN
(1377776,1377792,1377793,1377794,1377795,1377796... 50000 ids)
)
Output
Nested Loop  (cost=0.00..1452373.79 rows=3865 width=8)
  ->  Nested Loop  (cost=0.00..8.58 rows=1 width=8)
        Join Filter: (l1.level2_level2id = l2.level2id)
        ->  Seq Scan on level2entity l2  (cost=0.00..3.17 rows=1 width=8)
              Filter: ((userid)::text = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'::text)
        ->  Seq Scan on level1entity l1  (cost=0.00..4.07 rows=107 width=16)
  ->  Index Scan using fk_fk18edb1cfb2a41235_idx on entity e  (cost=0.00..1452086.09 rows=22329 width=16)
        Index Cond: (level1_level1id = l1.level1id)
OK, here is a simplified version; the joins aren't the bottleneck:
SELECT entityid FROM
(SELECT entityid FROM entity e LIMIT 5000) a
WHERE
(entityid NOT IN
(1377776,1377792,1377793,1377794,1377795, ... 50000 ids)
)
The problem is to find the entities whose entityid is none of these ids.
EXPLAIN
Subquery Scan on a  (cost=0.00..312667.76 rows=1 width=8)
  Filter: (e.entityid <> ALL ('{1377776,1377792,1377793,1377794, ... 50000 ids}'::bigint[]))
  ->  Limit  (cost=0.00..111.51 rows=5000 width=8)
        ->  Seq Scan on entity e  (cost=0.00..29015.26 rows=1301026 width=8)
Answer by Craig Ringer
A huge IN list is very inefficient. Ideally PostgreSQL would recognize it, turn it into a relation and perform an anti-join against that relation, but at this point the query planner doesn't know how to do that. Detecting this case would also add planning time to every query that uses NOT IN sensibly, so the check would have to be very cheap. See this earlier, much more detailed answer on the topic.
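To make the cost concrete, here is a small Python sketch, modelled on the `<> ALL (...)` filter in the question's EXPLAIN and not on PostgreSQL's actual executor code: if the list is scanned once per row, every row that survives is compared against the whole list, so the work grows as rows × list length. The function name `not_in_linear` is hypothetical, and the sizes are scaled down from the question's 5000 rows and 50000 ids so it runs quickly.

```python
def not_in_linear(row_ids, excluded):
    """Keep ids that differ from every element of `excluded`, scanning the
    list once per row, the way a plain per-row `<> ALL (array)` check would."""
    kept = []
    comparisons = 0
    for row_id in row_ids:
        for ex in excluded:
            comparisons += 1
            if row_id == ex:
                break            # matched: this row is excluded
        else:
            kept.append(row_id)  # no match after scanning the whole list
    return kept, comparisons


row_ids = list(range(200))              # scaled-down stand-in for the LIMIT 5000 rows
excluded = list(range(10_000, 15_000))  # 5000 ids, none of which match any row
kept, comparisons = not_in_linear(row_ids, excluded)
print(len(kept), comparisons)  # 200 1000000
```

At the question's full sizes, with no row matching, the same scan would cost 5000 × 50000 = 250 million comparisons.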
As David Aldridge wrote, this is best solved by turning it into an anti-join. I'd write it as a join over a VALUES list simply because PostgreSQL is extremely fast at parsing VALUES lists into relations, but the effect is the same:
SELECT entityid
FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
-- The exclusion list as a relation; add the remaining ids the same way
LEFT OUTER JOIN (
VALUES
(1377776),(1377792),(1377793),(1377794),(1377795),(1377796)
) ex(ex_entityid) ON (entityid = ex_entityid)
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
-- Keep only rows with no partner in the exclusion list (the anti-join)
AND ex_entityid IS NULL;
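One way to check that the rewrite returns the same rows as NOT IN is to run both forms side by side. This sketch assumes Python's bundled sqlite3 as a stand-in for PostgreSQL and uses made-up rows, so it checks results, not speed. SQLite does not accept the `ex(ex_entityid)` column list on a derived table, so here the VALUES list is named in a CTE instead. It also shows the one case where the two forms differ: a NULL in the list makes NOT IN return no rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (entityid INTEGER PRIMARY KEY)")
# Made-up sample rows: 30 consecutive ids around the question's values.
conn.executemany("INSERT INTO entity VALUES (?)",
                 [(i,) for i in range(1377770, 1377800)])

not_in = conn.execute("""
    SELECT entityid FROM entity
    WHERE entityid NOT IN (1377776, 1377792, 1377793, 1377794, 1377795, 1377796)
    ORDER BY entityid
""").fetchall()

# Anti-join: LEFT JOIN the exclusion list, keep rows with no partner.
anti_join = conn.execute("""
    WITH ex(ex_entityid) AS (
        VALUES (1377776), (1377792), (1377793), (1377794), (1377795), (1377796)
    )
    SELECT entityid
    FROM entity
    LEFT OUTER JOIN ex ON entityid = ex_entityid
    WHERE ex_entityid IS NULL
    ORDER BY entityid
""").fetchall()

# Caveat: with a NULL in the list, "x NOT IN (..., NULL)" is never true,
# so NOT IN keeps no rows, while a NULL in the join list would never match.
with_null = conn.execute(
    "SELECT count(*) FROM entity WHERE entityid NOT IN (1377776, NULL)"
).fetchone()[0]

print(not_in == anti_join, len(anti_join), with_null)  # True 24 0
```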
For a sufficiently large set of values you might even be better off creating a temporary table, COPYing the values into it, creating a PRIMARY KEY on it, and joining on that.
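The temporary-table approach can be sketched as follows, again assuming SQLite via Python's sqlite3 as a stand-in for PostgreSQL. SQLite has no COPY, so `executemany()` plays that role here. The `excluded` table name and the sample ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (entityid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO entity VALUES (?)", [(i,) for i in range(1, 101)])

excluded_ids = range(2, 101, 2)  # hypothetical exclude list: all even ids

# 1. Temporary table with a PRIMARY KEY, so the join can use its index.
conn.execute("CREATE TEMP TABLE excluded (entityid INTEGER PRIMARY KEY)")
# 2. Bulk-load the ids (COPY would do this in PostgreSQL).
conn.executemany("INSERT INTO excluded VALUES (?)", [(i,) for i in excluded_ids])
# 3. Anti-join against the temporary table.
remaining = [row[0] for row in conn.execute("""
    SELECT e.entityid
    FROM entity e
    LEFT JOIN excluded x ON x.entityid = e.entityid
    WHERE x.entityid IS NULL
    ORDER BY e.entityid
""")]

print(len(remaining), remaining[:3])  # 50 [1, 3, 5]
```

In PostgreSQL you would typically also run ANALYZE on the temporary table after loading it, because autovacuum does not process temporary tables and the planner would otherwise have no statistics for it.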
More possibilities explored here:
Answer by David Aldridge
You might get a better result if you can rewrite the query to use a hash anti-join.
Something like:
with exclude_list as (
-- split the comma-separated string into one bigint row per id
select unnest(string_to_array('1377776,1377792,1377793,1377794,1377795, ...',','))::bigint as entityid)
select e.entityid
from entity e
left join exclude_list x on e.entityid = x.entityid
-- rows with no match in exclude_list are the ones to keep
where x.entityid is null;
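The idea behind a hash anti-join can be sketched in a few lines of Python: build a hash table from the exclude list once, then probe it with each entity id, which costs constant expected time per row instead of a scan of the whole list. `parse_id_list` mimics the `unnest(string_to_array(...))` step; both function names are hypothetical, and this illustrates the strategy rather than PostgreSQL's implementation.

```python
def parse_id_list(text):
    """Split '1,2,3' into integers, like unnest(string_to_array(text, ','))."""
    return [int(part) for part in text.split(",") if part.strip()]


def hash_anti_join(entity_ids, exclude_text):
    """Return the entity ids that have no match in the exclude list."""
    exclude_hash = set(parse_id_list(exclude_text))  # build phase: hash the exclude list
    return [eid for eid in entity_ids if eid not in exclude_hash]  # probe phase


print(hash_anti_join([1377776, 1377777, 1377792, 1377800],
                     "1377776,1377792,1377793"))  # [1377777, 1377800]
```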
Answer by wutzebaer
OK, my solution was:
- select all entities
- LEFT JOIN, on entityid, the entities that do have one of the ids (the IN without the NOT is faster)
- select all rows where the joined side is NULL
as explained in
http://blog.hagander.net/archives/66-Speeding-up-NOT-IN.html
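The three steps in this answer can be written as a single query. This sketch follows the steps as described here rather than reproducing the linked post's exact SQL; SQLite via Python's sqlite3 stands in for PostgreSQL, the rows are made up, and the alias `hit` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entity (entityid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO entity VALUES (?)", [(i,) for i in range(1, 11)])

rows = conn.execute("""
    SELECT e.entityid                      -- step 1: all entities
    FROM entity e
    LEFT JOIN (                            -- step 2: entities that DO have one of the ids
        SELECT entityid FROM entity WHERE entityid IN (2, 3, 5, 7)
    ) hit ON hit.entityid = e.entityid
    WHERE hit.entityid IS NULL             -- step 3: keep rows with no match
    ORDER BY e.entityid
""").fetchall()

result = [r[0] for r in rows]
print(result)  # [1, 4, 6, 8, 9, 10]
```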
Answer by Louis Ricci
Since your WHERE clause checks for a specific userid (l2.userid = ...), every result row requires a matching level2entity record, so you should turn your "LEFT JOIN level2entity" into an "INNER JOIN level2entity":
INNER JOIN level2entity l2 ON l2.level2id = l1.level2_level2id AND l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
Hopefully this will filter down your entities so that your NOT IN has less work to do.
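Why the two joins are interchangeable here can be checked with a small sketch: rows with no level2entity match get NULL in l2.userid, and `NULL = '...'` is never true, so the WHERE clause already discards them. SQLite via Python's sqlite3 stands in for PostgreSQL; the schema is trimmed to the columns in the question and the data (including the `'user-a'` userid) is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE level2entity (level2id INTEGER PRIMARY KEY, userid TEXT);
    CREATE TABLE level1entity (level1id INTEGER PRIMARY KEY, level2_level2id INTEGER);
    INSERT INTO level2entity VALUES (1, 'user-a'), (2, 'user-b');
    INSERT INTO level1entity VALUES (10, 1), (11, 2), (12, NULL);
""")

# LEFT JOIN plus a WHERE test on the right-hand table.
left_version = conn.execute("""
    SELECT l1.level1id FROM level1entity l1
    LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
    WHERE l2.userid = 'user-a'
    ORDER BY l1.level1id
""").fetchall()

# INNER JOIN with the userid test moved into the ON clause.
inner_version = conn.execute("""
    SELECT l1.level1id FROM level1entity l1
    INNER JOIN level2entity l2
        ON l2.level2id = l1.level2_level2id AND l2.userid = 'user-a'
    ORDER BY l1.level1id
""").fetchall()

print(left_version, inner_version)  # [(10,)] [(10,)]
```

The question's EXPLAIN already shows plain "Nested Loop" nodes rather than "Nested Loop Left Join", which suggests PostgreSQL's planner had made this reduction on its own; writing INNER JOIN mainly makes the intent explicit.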