Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/17813492/

Time: 2020-10-21 01:02:00  Source: igfitidea

Postgres NOT IN performance

performance postgresql postgresql-performance

Asked by wutzebaer

Any ideas how to speed up this query?

Input

EXPLAIN SELECT entityid FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
AND (entityid NOT IN
(1377776,1377792,1377793,1377794,1377795,1377796... 50000 ids)
)

Output

Nested Loop  (cost=0.00..1452373.79 rows=3865 width=8)
  ->  Nested Loop  (cost=0.00..8.58 rows=1 width=8)
        Join Filter: (l1.level2_level2id = l2.level2id)
        ->  Seq Scan on level2entity l2  (cost=0.00..3.17 rows=1 width=8)
              Filter: ((userid)::text = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'::text)
        ->  Seq Scan on level1entity l1  (cost=0.00..4.07 rows=107 width=16)
  ->  Index Scan using fk_fk18edb1cfb2a41235_idx on entity e  (cost=0.00..1452086.09 rows=22329 width=16)
        Index Cond: (level1_level1id = l1.level1id)

OK, here is a simplified version; the joins aren't the bottleneck:

SELECT entityid FROM
(SELECT entityid FROM entity e LIMIT 5000) a
WHERE
(entityid NOT IN
(1377776,1377792,1377793,1377794,1377795, ... 50000 ids)
)

The problem is to find the entities that don't have any of these ids.

EXPLAIN

Subquery Scan on a  (cost=0.00..312667.76 rows=1 width=8)
  Filter: (e.entityid <> ALL ('{1377776,1377792,1377793,1377794, ... 50000 ids}'::bigint[]))
  ->  Limit  (cost=0.00..111.51 rows=5000 width=8)
        ->  Seq Scan on entity e  (cost=0.00..29015.26 rows=1301026 width=8)

Answered by Craig Ringer

A huge IN list is very inefficient. PostgreSQL should ideally identify it and turn it into a relation that it does an anti-join on, but at this point the query planner doesn't know how to do that. The planning time required to identify this case would be paid by every query that uses NOT IN sensibly, so it would have to be a very low cost check. See this earlier much more detailed answer on the topic.

As David Aldridge wrote, this is best solved by turning it into an anti-join. I'd write it as a join over a VALUES list simply because PostgreSQL is extremely fast at parsing VALUES lists into relations, but the effect is the same:

SELECT entityid 
FROM entity e
LEFT JOIN level1entity l1 ON l1.level1id = e.level1_level1id
LEFT JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
LEFT OUTER JOIN (
    VALUES
    (1377776),(1377792),(1377793),(1377794),(1377795),(1377796)
) ex(ex_entityid) ON (entityid = ex_entityid)
WHERE l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f' 
AND ex_entityid IS NULL; 

For a sufficiently large set of values you might even be better off creating a temporary table, COPYing the values into it, creating a PRIMARY KEY on it, and joining on that.
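
The temporary-table approach could look roughly like the sketch below. The table name exclude_ids and its column ex_entityid are made up for illustration, the three ids stand in for the full list, and the COPY ... FROM STDIN form assumes the script is run through psql. The ANALYZE step is an addition not mentioned above; autovacuum does not gather statistics on temporary tables, so running it by hand gives the planner row counts to work with.

```sql
-- Hypothetical temp table holding the ids to exclude.
CREATE TEMPORARY TABLE exclude_ids (ex_entityid bigint NOT NULL);

-- Bulk-load the ids (psql reads the rows up to the "\." terminator).
COPY exclude_ids (ex_entityid) FROM STDIN;
1377776
1377792
1377793
\.

-- Add the key after loading, then collect statistics for the planner.
ALTER TABLE exclude_ids ADD PRIMARY KEY (ex_entityid);
ANALYZE exclude_ids;

-- Anti-join: keep only entities with no match in the exclusion table.
SELECT e.entityid
FROM entity e
LEFT JOIN exclude_ids ex ON ex.ex_entityid = e.entityid
WHERE ex.ex_entityid IS NULL;
```

Whether this beats the VALUES form depends on the size of the list and on the data, so it is worth comparing both with EXPLAIN ANALYZE.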

More possibilities explored here:

https://stackoverflow.com/a/17038097/398670

Answered by David Aldridge

You might get a better result if you can rewrite the query to use a hash anti-join.

Something like:

with exclude_list as (
  select unnest(string_to_array('1377776,1377792,1377793,1377794,1377795, ...',','))::bigint as entity_id)
select entity.entityid
from   entity left join exclude_list on entity.entityid = exclude_list.entity_id
where  exclude_list.entity_id is null;
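
To check whether the planner actually picks an anti-join, the rewritten query can be prefixed with EXPLAIN. The sketch below uses a NOT EXISTS form, which PostgreSQL can also plan as an anti-join. The three-element array stands in for the full id list and the alias ex(entity_id) is made up for illustration. Which join strategy shows up in the plan depends on the PostgreSQL version and table statistics, so no output is shown here.

```sql
-- Show the plan without running the query; look for an "Anti Join" node.
EXPLAIN
SELECT e.entityid
FROM entity e
WHERE NOT EXISTS (
    SELECT 1
    FROM unnest(ARRAY[1377776, 1377792, 1377793]::bigint[]) AS ex(entity_id)
    WHERE ex.entity_id = e.entityid
);
```

Unlike NOT IN, NOT EXISTS and the LEFT JOIN ... IS NULL form are not affected by NULLs in the exclusion list, which should not matter here if entityid is a non-null key.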

Answered by wutzebaer

OK, my solution was:

  • select all entities
  • left join all entities which have one of the ids (without the NOT it is faster) on the entityid
  • select all rows where the joined select is NULL
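
The three steps above can be sketched as a single query. This is one reading of the description, not necessarily the exact query that was used; the alias matched is made up, and the id list is shortened for illustration.

```sql
SELECT e.entityid
FROM entity e                                        -- 1. all entities
LEFT JOIN (
    SELECT entityid
    FROM entity
    WHERE entityid IN (1377776, 1377792, 1377793)    -- 2. plain IN, no NOT
) matched ON matched.entityid = e.entityid
WHERE matched.entityid IS NULL;                      -- 3. keep the unmatched rows
```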

as explained in

http://blog.hagander.net/archives/66-Speeding-up-NOT-IN.html

Answered by Louis Ricci

Your WHERE clause checks for a specific userid ("l2.userid = ..."), so you already require a level2entity record. You should therefore turn your "LEFT JOIN level2entity" into an "INNER JOIN level2entity":

INNER JOIN level2entity l2 ON l2.level2id = l1.level2_level2id AND l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'

This will, hopefully, filter down your entities so your NOT IN will have less work to do.
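
Put together, the full query might look like the sketch below. It also writes the level1entity join as an INNER JOIN, which goes beyond the suggestion above but follows from the same reasoning: a row with no level1entity match cannot have a level2entity match either, so the userid filter would discard it anyway. The id list is shortened for illustration. The planner can often make this outer-to-inner simplification on its own, so the gain may be small.

```sql
SELECT e.entityid
FROM entity e
INNER JOIN level1entity l1 ON l1.level1id = e.level1_level1id
INNER JOIN level2entity l2 ON l2.level2id = l1.level2_level2id
                          AND l2.userid = 'a987c246-65e5-48f6-9d2d-a7bcb6284c8f'
WHERE e.entityid NOT IN (1377776, 1377792, 1377793);  -- list shortened
```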
