SQL 两列性能问题的左外连接
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/444820/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Left outer join on two columns performance issue
提问by Jason Baker
I'm using a SQL query that is similar to the following form:
我正在使用类似于以下形式的 SQL 查询:
SELECT col1, col2
FROM table1
LEFT OUTER JOIN table2
ON table1.person_uid = table2.person_uid
AND table1.period = table2.period
And it's either way too slow or something's deadlocking because it takes at least 4 minutes to return. If I were to change it to this:
它要么太慢,要么陷入僵局,因为它至少需要 4 分钟才能返回。如果我要把它改成这样:
SELECT col1, col2
FROM table1
LEFT OUTER JOIN table2
ON table1.person_uid = table2.person_uid
WHERE table1.period = table2.period
then it works fine (albeit not returning the right number of columns). Is there any way to speed this up?
然后它工作正常(尽管没有返回正确的列数)。有没有办法加快这个速度?
UPDATE: It does the same thing if I switch the last two lines of the latter query:
更新:如果我切换后一个查询的最后两行,它会做同样的事情:
SELECT col1, col2
FROM table1
LEFT OUTER JOIN table2
ON table1.period = table2.period
WHERE table1.person_uid = table2.person_uid
UPDATE 2:These are actually views that I'm joining. Unfortunately, they're on a database I don't have control over, so I can't (easily) make any changes to the indexing. I am inclined to agree that this is an indexing issue though. I'll wait a little while before accepting an answer in case there's some magical way to tune this query that I don't know about. Otherwise, I'll accept one of the current answers and try to figure out another way to do what I want to do. Thanks for everybody's help so far.
更新 2:这些实际上是我加入的观点。不幸的是,它们位于我无法控制的数据库上,因此我无法(轻松)对索引进行任何更改。我倾向于同意这是一个索引问题。我会等一会儿再接受答案,以防万一有一些我不知道的神奇方法来调整这个查询。否则,我将接受当前的答案之一,并尝试找出另一种方法来做我想做的事情。到目前为止,感谢大家的帮助。
回答by cletus
Bear in mind that statements 2 and 3 are different to the first one.
请记住,陈述 2 和 3 与第一个不同。
How? Well, you're doing a left outer join and your WHERE clause isn't taking that into account (like the ON clause does). At a minimum, try:
如何?好吧,你正在做一个左外连接,你的 WHERE 子句没有考虑到这一点(就像 ON 子句一样)。至少,请尝试:
SELECT col1, col2
FROM table1, table2
WHERE table1.person_uid = table2.person_uid (+)
AND table1.period = table2.period (+)
and see if you get the same performance issue.
并查看您是否遇到相同的性能问题。
What indexes do you have on these tables? Is this relationship defined by a foreign key constraint?
你在这些表上有什么索引?这种关系是否由外键约束定义?
What you probably need is a composite index on both person_uid and period (on both tables).
您可能需要的是 person_uid 和 period(在两个表上)的复合索引。
回答by HLGEM
I think you need to understand why the last two are not the same query as the first one. If you do a left join and then add a where clause referncing a field in the table on the right side of the join (the one which may not always have a record to match the first table), then you have effectively changed the join to an inner join. There is one exception to this and that is if you reference something like
我认为您需要了解为什么最后两个查询与第一个查询不同。如果您进行左连接,然后在连接右侧的表中添加一个引用字段的 where 子句(可能并不总是有与第一个表匹配的记录),那么您已经有效地将连接更改为内部连接。对此有一个例外,那就是如果您引用类似
SELECT col1, col2
FROM table1
LEFT OUTER JOIN table2
ON table1.person_uid = table2.person_uid
WHERE table2.person_uid is null
In this case you asking for the record which don't have a record in the second table. But other than this special case, you are changing the left join to an inner join if you refence a field in table2 in the where clause.
在这种情况下,您要求在第二个表中没有记录的记录。但除了这种特殊情况,如果您在 where 子句中引用 table2 中的字段,您会将左连接更改为内连接。
If your query is not fast enough, I would look at your indexing.
如果您的查询速度不够快,我会查看您的索引。
回答by Dave Costa
Anything anyone tells you based on the information you provided is a guess.
任何人根据您提供的信息告诉您的任何内容都是猜测。
Look at the execution plan for the query. If you don't see a reason for the slowness in the plan, the post the plan here.
查看查询的执行计划。如果您没有看到计划缓慢的原因,请在此处发布计划。
http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/ex_plan.htm#PFGRF009
http://download.oracle.com/docs/cd/B28359_01/server.111/b28274/ex_plan.htm#PFGRF009
回答by Andrew Rollings
Do you have covering indexes on person_uid
and period
for both tables?
你对覆盖索引person_uid
和period
两个表?
If not, add them and try again.
如果没有,请添加它们并重试。
Take a look at the execution plan and see what the query is actually doing.
查看执行计划并查看查询实际执行的操作。
Also: What are the datatypes of the fields? Are they the same in both tables? An implicit cast can really slow things down.
另外:字段的数据类型是什么?它们在两个表中是否相同?隐式强制转换真的可以减慢速度。
回答by Paul Tomblin
Do these tables have indexes on the columns you're joining? Install Oracle's free SQLDeveloper product and use it to do an "explain" on that query and see if it's doing sequential scans of both tables.
这些表在您加入的列上有索引吗?安装 Oracle 的免费 SQLDeveloper 产品并使用它对该查询进行“解释”,看看它是否正在对两个表进行顺序扫描。
回答by MatBailie
In a left join you'd be scanning table1 for each unique combination of (person_uid,period) then searching table2 for all corresponding records there. If table2 doesn't have an appropriate index, this can involve scanning the whole of that table too.
在左连接中,您将扫描 table1 以获取 (person_uid,period) 的每个唯一组合,然后在 table2 中搜索所有相应的记录。如果 table2 没有合适的索引,这也可能涉及扫描整个表。
My best guess, without seeing an execution plan, is that the first query (the only one which seems to be correct) is having to table scan table2 as well as table1.
我最好的猜测是,在没有看到执行计划的情况下,第一个查询(唯一一个似乎正确的查询)必须对 table2 和 table1 进行表扫描。
As you say that you can't change the indexes, you need to change the query. As far as I can tell, there is only one realistic alternative...
正如您所说,您无法更改索引,您需要更改查询。据我所知,只有一种现实的选择......
SELECT
col1, col2
FROM
table2
FULL OUTER JOIN
table1
ON table1.person_uid = table2.person_uid
AND table1.period = table2.period
WHERE
table1.person_uid IS NOT NULL
The hope here is that you scan table2 for each unique combination of (person_uid, period), but make use of indexes on table1. (As opposed to scanning table1 and making use of indexes on table2, which what I expected from your query.)
这里希望您扫描 table2 以获取 (person_uid, period) 的每个唯一组合,但使用 table1 上的索引。(与扫描 table1 并使用 table2 上的索引相反,这是我从您的查询中期望的。)
If table1 doesn't have appropriate indexes, however, you'll be very unlikely to see any performance improvement at all...
但是,如果 table1 没有适当的索引,则您根本看不到任何性能改进......
Dems.
民主党。
回答by HLGEM
In one of the updates the OP states that he is actually querying views not tables. In this case, the performance could well be increased by directly querying the tables he needs especially if the views are complex and join to many other tables that do not contain information he needs or they are views that call views.
在其中一项更新中,OP 声明他实际上是在查询视图而不是表。在这种情况下,可以通过直接查询他需要的表来提高性能,特别是如果视图很复杂并且连接到许多其他不包含他需要的信息的表或者它们是调用视图的视图。
回答by BobC
ANSI join syntax provides a very clear distinction between JOIN conditions and FILTER predicates; this is very important when writing outer joins. Using the emp/dept tables, look at the results from the following two outer joins
ANSI 连接语法在 JOIN 条件和 FILTER 谓词之间提供了非常明确的区别;这在编写外连接时非常重要。使用 emp/dept 表,查看以下两个外连接的结果
Q1
第一季度
SELECT dname, d.deptno, e.ename, e.mgr, d.loc
FROM dept d
LEFT OUTER JOIN emp e
on d.deptno = e.deptno
and loc in ('NEW YORK','BOSTON' )
;
DNAME DEPTNO ENAME MGR LOC
-------------- ---------- ---------- ---------- -------------
ACCOUNTING 10 CLARK 7839 NEW YORK
ACCOUNTING 10 KING NEW YORK
ACCOUNTING 10 MILLER 7782 NEW YORK
RESEARCH 20 DALLAS
SALES 30 CHICAGO
OPERATIONS 40 BOSTON
====
====
Q2
SELECT dname, d.deptno, e.ename, e.mgr, d.loc
FROM dept d
LEFT OUTER JOIN emp e
on d.deptno = e.deptno
where loc in ('NEW YORK','BOSTON' )
;
DNAME DEPTNO ENAME MGR LOC
-------------- ---------- ---------- ---------- -------------
ACCOUNTING 10 CLARK 7839 NEW YORK
ACCOUNTING 10 KING NEW YORK
ACCOUNTING 10 MILLER 7782 NEW YORK
OPERATIONS 40 BOSTON
The first example, Q1 shows is an example of "joining on a constant". Essentially, the filter condition is applied prior to performing the outer join. So you eliminate rows, which are subsequently added back as part of the outer join. It's not necessarily wrong, but is that the query that you really asked for? Often it is the results shown in Q2 that are required, where the filter is applied after the (outer) join.
Q1 显示的第一个示例是“加入常量”的示例。本质上,过滤条件是在执行外连接之前应用的。因此,您消除了行,这些行随后作为外连接的一部分重新添加。这不一定是错误的,但这是您真正要求的查询吗?通常需要在 Q2 中显示的结果,其中在(外部)连接之后应用过滤器。
There is also a performance implication too, for large data sets. In many cases, joining on a constant has to be resolved internally by the optimizer by creating a lateral view, which can usually only be optimized via a nested loop join rather than a hash join
对于大型数据集,也有性能影响。在许多情况下,连接常量必须由优化器通过创建横向视图在内部解决,这通常只能通过嵌套循环连接而不是散列连接进行优化
For developers who are familiar with the Oracle outer join syntax, the query would probably have been written as
对于熟悉 Oracle 外连接语法的开发人员,查询可能会写成
SELECT dname, d.deptno, e.ename, e.mgr, d.loc
FROM dept d
,emp e
where d.deptno = e.deptno(+)
and loc in ('NEW YORK','BOSTON' )
This query is semantically equivalent as Q2 above.
此查询在语义上与上面的 Q2 等效。
So in summary, it's extremely important the that you understand the different between the JOIN clause and the WHERE clause when writing ANSI outer joins.
所以总而言之,在编写 ANSI 外连接时,了解 JOIN 子句和 WHERE 子句之间的区别非常重要。