Original question: http://stackoverflow.com/questions/1468302/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Table join order in postgres
Question by Jay
Is there a way for me to force a specific join order in Postgres?
I've got a query that looks like this. I've eliminated a bunch of stuff that was in the real query, but this simplification demonstrates the issue. What's left shouldn't be too cryptic: Using a role/task security system, I'm trying to determine whether a given user has privileges to perform a given task.
select task.taskid
from userlogin
join userrole using (userloginid)
join roletask using (roleid)
join task using (taskid)
where loginname='foobar'
and taskfunction='plugh'
But I realized that the program already knows the user's userloginid, so it seemed the query could be made more efficient by skipping the lookup on userlogin and filling in the userloginid directly, like this:
select task.taskid
from userrole
join roletask using (roleid)
join task using (taskid)
where userloginid=42
and taskfunction='plugh'
When I did that (eliminating a table from the query and hard-coding the value retrieved from that table instead), the estimated cost in the explain plan went up, from 140.82 to 192.82! In the original query, Postgres read userlogin, then userrole, then roletask, then task. But in the new query, it decided to read roletask first (joining it to task) and only then join to userrole, even though this required a full sequential scan of roletask.
Full explain plans are:
Version 1:
Hash Join (cost=12.79..140.82 rows=1 width=8)
  Hash Cond: (roletask.taskid = task.taskid)
  -> Nested Loop (cost=4.51..129.73 rows=748 width=8)
        -> Nested Loop (cost=4.51..101.09 rows=12 width=8)
              -> Index Scan using idx_userlogin_loginname on userlogin (cost=0.00..8.27 rows=1 width=8)
                    Index Cond: ((loginname)::text = 'foobar'::text)
              -> Bitmap Heap Scan on userrole (cost=4.51..92.41 rows=33 width=16)
                    Recheck Cond: (userrole.userloginid = userlogin.userloginid)
                    -> Bitmap Index Scan on idx_userrole_login (cost=0.00..4.50 rows=33 width=0)
                          Index Cond: (userrole.userloginid = userlogin.userloginid)
        -> Index Scan using idx_roletask_role on roletask (cost=0.00..1.50 rows=71 width=16)
              Index Cond: (roletask.roleid = userrole.roleid)
  -> Hash (cost=8.27..8.27 rows=1 width=8)
        -> Index Scan using idx_task_taskfunction on task (cost=0.00..8.27 rows=1 width=8)
              Index Cond: ((taskfunction)::text = 'plugh'::text)
Version 2:
Hash Join (cost=96.58..192.82 rows=4 width=8)
  Hash Cond: (roletask.roleid = userrole.roleid)
  -> Hash Join (cost=8.28..104.10 rows=9 width=16)
        Hash Cond: (roletask.taskid = task.taskid)
        -> Seq Scan on roletask (cost=0.00..78.35 rows=4635 width=16)
        -> Hash (cost=8.27..8.27 rows=1 width=8)
              -> Index Scan using idx_task_taskfunction on task (cost=0.00..8.27 rows=1 width=8)
                    Index Cond: ((taskfunction)::text = 'plugh'::text)
  -> Hash (cost=87.92..87.92 rows=31 width=8)
        -> Bitmap Heap Scan on userrole (cost=4.49..87.92 rows=31 width=8)
              Recheck Cond: (userloginid = 42)
              -> Bitmap Index Scan on idx_userrole_login (cost=0.00..4.49 rows=31 width=0)
                    Index Cond: (userloginid = 42)
(Yes, I know that in both cases the costs are low and the difference doesn't look like it would matter. But this is after I eliminated a bunch of additional work from the query to simplify what I have to post. The real query still isn't outrageous, but I'm more interested in the principle.)
Accepted answer by Bill Karwin
This page in the documentation describes how to prevent the PostgreSQL optimizer from reordering joined tables, allowing you to control the order of joins yourself:
http://www.postgresql.org/docs/current/interactive/explicit-joins.html
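A minimal sketch of the technique that page describes, assuming a PostgreSQL session and the table and column names from the question: setting join_collapse_limit to 1 stops the planner from reordering explicit JOIN clauses, so the joins happen in the order they are written. The scratch tables and their column types are hypothetical stand-ins so the sketch runs on its own; skip those lines when working against the real schema.

```sql
-- Hypothetical scratch tables (column names from the question, types assumed).
-- Skip these three statements when running against the real schema.
CREATE TEMP TABLE IF NOT EXISTS userrole (userloginid int, roleid int);
CREATE TEMP TABLE IF NOT EXISTS roletask (roleid int, taskid int);
CREATE TEMP TABLE IF NOT EXISTS task (taskid int, taskfunction text);

BEGIN;
-- With join_collapse_limit = 1 the planner keeps explicit JOINs in written
-- order: userrole is joined to roletask first, then that result to task.
-- SET LOCAL reverts automatically at COMMIT or ROLLBACK.
SET LOCAL join_collapse_limit = 1;

EXPLAIN
SELECT task.taskid
FROM userrole
JOIN roletask USING (roleid)
JOIN task USING (taskid)
WHERE userloginid = 42
  AND taskfunction = 'plugh';

COMMIT;
```

The setting is scoped to one transaction here because a plain session-level SET would pin the join order of every later query in the session as well. The planner still chooses the join method for each step; only the order of the joins is fixed.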
Answer by Ants Aasma
Are you sure your table statistics are up to date? When PostgreSQL's cost-based optimizer fails on something this trivial, it's a pretty good sign that something is seriously wrong with the table statistics. It's better to fix the root cause than to work around it by overriding the built-in optimizer, because the problem will inevitably pop up somewhere else as well.
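One way to check how fresh the statistics are is the pg_stat_user_tables view. A minimal sketch, assuming the table names from the question; it returns no rows for tables that do not exist in the current database.

```sql
-- When were planner statistics last gathered for the tables in the query?
-- last_analyze is set by a manual ANALYZE, last_autoanalyze by autovacuum;
-- n_mod_since_analyze counts rows changed since the last analyze.
SELECT relname,
       n_live_tup,
       n_mod_since_analyze,
       last_analyze,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('userrole', 'roletask', 'task');
```

A large n_mod_since_analyze, or NULL in both timestamp columns, suggests the planner is working from outdated estimates.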
Run ANALYZE on the affected tables and see whether PostgreSQL picks a different plan. If it still chooses something silly, it would be really interesting to see the query plans. The optimizer not doing the right thing is usually considered a bug.
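A minimal sketch of that check, assuming the table names from the question; the scratch tables are hypothetical stand-ins so it runs on its own. EXPLAIN ANALYZE actually executes the query, so each plan node shows its actual row count next to the estimate, and a large gap between the two points at poor statistics.

```sql
-- Hypothetical scratch tables (column types assumed); skip on the real schema.
CREATE TEMP TABLE IF NOT EXISTS userrole (userloginid int, roleid int);
CREATE TEMP TABLE IF NOT EXISTS roletask (roleid int, taskid int);
CREATE TEMP TABLE IF NOT EXISTS task (taskid int, taskfunction text);

-- Refresh planner statistics for each table in the query.
ANALYZE userrole;
ANALYZE roletask;
ANALYZE task;

-- Re-plan and run the query, comparing estimated rows with actual rows.
EXPLAIN ANALYZE
SELECT task.taskid
FROM userrole
JOIN roletask USING (roleid)
JOIN task USING (taskid)
WHERE userloginid = 42
  AND taskfunction = 'plugh';
```

Separate ANALYZE statements are used because listing several tables in one ANALYZE command is only accepted by newer PostgreSQL releases.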