SQL Server 中的 INNER JOIN 与 LEFT JOIN 性能
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2726657/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
INNER JOIN vs LEFT JOIN performance in SQL Server
提问by Anonymous
I've created SQL command that uses INNER JOIN on 9 tables, anyway this command takes a very long time (more than five minutes). So my folk suggested me to change INNER JOIN to LEFT JOIN because the performance of LEFT JOIN is better, despite what I know. After I changed it, the speed of query got significantly improved.
我已经创建了在 9 个表上使用 INNER JOIN 的 SQL 命令,无论如何这个命令需要很长时间(超过五分钟)。所以我的朋友建议我将 INNER JOIN 改为 LEFT JOIN 因为 LEFT JOIN 的性能更好,尽管我知道。改了之后,查询速度明显提高了。
I would like to know why LEFT JOIN is faster than INNER JOIN?
我想知道为什么 LEFT JOIN 比 INNER JOIN 快?
My SQL command look like below:
SELECT * FROM A INNER JOIN B ON ... INNER JOIN C ON ... INNER JOIN D
and so on
我的 SQL 命令如下所示:
SELECT * FROM A INNER JOIN B ON ... INNER JOIN C ON ... INNER JOIN D
等等
Update:This is brief of my schema.
更新:这是我的架构简介。
FROM sidisaleshdrmly a -- NOT HAVE PK AND FK
INNER JOIN sidisalesdetmly b -- THIS TABLE ALSO HAVE NO PK AND FK
ON a.CompanyCd = b.CompanyCd
AND a.SPRNo = b.SPRNo
AND a.SuffixNo = b.SuffixNo
AND a.dnno = b.dnno
INNER JOIN exFSlipDet h -- PK = CompanyCd, FSlipNo, FSlipSuffix, FSlipLine
ON a.CompanyCd = h.CompanyCd
AND a.sprno = h.AcctSPRNo
INNER JOIN exFSlipHdr c -- PK = CompanyCd, FSlipNo, FSlipSuffix
ON c.CompanyCd = h.CompanyCd
AND c.FSlipNo = h.FSlipNo
AND c.FSlipSuffix = h.FSlipSuffix
INNER JOIN coMappingExpParty d -- NO PK AND FK
ON c.CompanyCd = d.CompanyCd
AND c.CountryCd = d.CountryCd
INNER JOIN coProduct e -- PK = CompanyCd, ProductSalesCd
ON b.CompanyCd = e.CompanyCd
AND b.ProductSalesCd = e.ProductSalesCd
LEFT JOIN coUOM i -- PK = UOMId
ON h.UOMId = i.UOMId
INNER JOIN coProductOldInformation j -- PK = CompanyCd, BFStatus, SpecCd
ON a.CompanyCd = j.CompanyCd
AND b.BFStatus = j.BFStatus
AND b.ProductSalesCd = j.ProductSalesCd
INNER JOIN coProductGroup1 g1 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup1Cd
ON e.ProductGroup1Cd = g1.ProductGroup1Cd
INNER JOIN coProductGroup2 g2 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup2Cd
ON e.ProductGroup1Cd = g2.ProductGroup1Cd
回答by Aaronaught
A LEFT JOIN
is absolutely not faster than an INNER JOIN
. In fact, it's slower; by definition, an outer join (LEFT JOIN
or RIGHT JOIN
) has to do all the work of an INNER JOIN
plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.
ALEFT JOIN
绝对不比 an 快INNER JOIN
。事实上,它更慢;根据定义,外连接(LEFT JOIN
或RIGHT JOIN
)必须完成所有的工作,INNER JOIN
加上空扩展结果的额外工作。预计还会返回更多行,从而进一步增加总执行时间,因为结果集的大小更大。
(And even if a LEFT JOIN
werefaster in specificsituations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN
, so you cannot simply go replacing all instances of one with the other!)
(而且即使LEFT JOIN
是在更快的特定情况下,由于一些难以想象的因素汇合,它不是功能上等同于INNER JOIN
,所以你不能简单地去更换一个与其他的所有实例!)
Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.
很可能您的性能问题出在其他地方,例如没有正确索引候选键或外键。9 张桌子需要加入很多,所以几乎可以在任何地方放慢速度。如果您发布架构,我们可能会提供更多详细信息。
Edit:
编辑:
Reflecting further on this, I could think of one circumstance under which a LEFT JOIN
might be faster than an INNER JOIN
, and that is when:
进一步思考这一点,我可以想到一种情况,在这种情况下 aLEFT JOIN
可能比 an 快INNER JOIN
,那就是:
- Some of the tables are verysmall (say, under 10 rows);
- The tables do not have sufficient indexes to cover the query.
- 一些表是非常小的(比如说,在10行);
- 表没有足够的索引来覆盖查询。
Consider this example:
考虑这个例子:
CREATE TABLE #Test1
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')
CREATE TABLE #Test2
(
ID int NOT NULL PRIMARY KEY,
Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')
SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
ON t2.Name = t1.Name
SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
ON t2.Name = t1.Name
DROP TABLE #Test1
DROP TABLE #Test2
If you run this and view the execution plan, you'll see that the INNER JOIN
query does indeed cost more than the LEFT JOIN
, because it satisfies the two criteria above. It's because SQL Server wants to do a hash match for the INNER JOIN
, but does nested loops for the LEFT JOIN
; the former is normallymuch faster, but since the number of rows is so tiny andthere's no index to use, the hashing operation turns out to be the most expensive part of the query.
如果您运行它并查看执行计划,您会看到INNER JOIN
查询确实比 花费更多LEFT JOIN
,因为它满足上述两个条件。这是因为 SQL Server 想要对 进行哈希匹配INNER JOIN
,但对 进行嵌套循环LEFT JOIN
;前者通常要快得多,但由于行的数量是如此的渺小和没有索引的使用,散列运算结果是查询的最昂贵的部分。
You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it's O(N) vs. O(1) for the hashtable.
您可以通过使用您最喜欢的编程语言编写程序来对具有 5 个元素的列表与具有 5 个元素的哈希表执行大量查找,从而看到相同的效果。由于大小,哈希表版本实际上更慢。但是将它增加到 50 个元素,或 5000 个元素,列表版本会变慢,因为它是哈希表的 O(N) 与 O(1)。
But change this query to be on the ID
column instead of Name
and you'll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN
version is able to replace one of the clustered index scans with a seek - meaning that this will literally be an order of magnitudefaster with a large number of rows.
但是将此查询更改为ID
列上而不是列上Name
,您将看到一个非常不同的故事。在这种情况下,它为两个查询嵌套循环,但INNER JOIN
版本能够取代聚簇索引扫描的一个与寻求-这意味着这简直就一个数量级有大量行的速度更快。
So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server mightsometimes choose a worse execution plan for an INNER JOIN
than a LEFT JOIN
.
所以结论或多或少是我上面几段提到的;这几乎可以肯定是索引或索引覆盖问题,可能与一个或多个非常小的表相结合。在这些情况下,SQL Server有时可能会选择INNER JOIN
比LEFT JOIN
.
回答by dbenham
There is one important scenario that can lead to an outer join being faster than an inner join that has not been discussed yet.
有一个重要的场景会导致外连接比内连接更快,这一点尚未讨论。
When using an outer join, the optimizer is always free to drop the outer joined table from the execution plan if the join columns are the PK of the outer table, and none of the columns are selected from the outer table. For example SELECT A.* FROM A LEFT OUTER JOIN B ON A.KEY=B.KEY
and B.KEY is the PK for B. Both Oracle (I believe I was using release 10) and Sql Server (I used 2008 R2) prune table B from the execution plan.
使用外部连接时,如果连接列是外部表的 PK,并且没有从外部表中选择任何列,则优化器始终可以自由地从执行计划中删除外部连接表。例如SELECT A.* FROM A LEFT OUTER JOIN B ON A.KEY=B.KEY
,B.KEY 是 B 的 PK。Oracle(我相信我使用的是版本 10)和 Sql Server(我使用的是 2008 R2)从执行计划中修剪表 B。
The same is not necessarily true for an inner join: SELECT A.* FROM A INNER JOIN B ON A.KEY=B.KEY
may or may not require B in the execution plan depending on what constraints exist.
内连接不一定是这样:SELECT A.* FROM A INNER JOIN B ON A.KEY=B.KEY
根据存在的约束,在执行计划中可能需要也可能不需要 B。
If A.KEY is a nullable foreign key referencing B.KEY, then the optimizer cannot drop B from the plan because it must confirm that a B row exists for every A row.
如果 A.KEY 是引用 B.KEY 的可空外键,则优化器不能从计划中删除 B,因为它必须确认每个 A 行都存在 B 行。
If A.KEY is a mandatory foreign key referencing B.KEY, then the optimizer is free to drop B from the plan because the constraints guarantee the existence of the row. But just because the optimizer can drop the table from the plan, doesn't mean it will. SQL Server 2008 R2 does NOT drop B from the plan. Oracle 10 DOES drop B from the plan. It is easy to see how the outer join will out-perform the inner join on SQL Server in this case.
如果 A.KEY 是引用 B.KEY 的强制外键,那么优化器可以自由地从计划中删除 B,因为约束保证了该行的存在。但仅仅因为优化器可以从计划中删除表,并不意味着它会。SQL Server 2008 R2 不会从计划中删除 B。Oracle 10 确实从计划中删除了 B。在这种情况下,很容易看出外连接在 SQL Server 上的性能如何优于内连接。
This is a trivial example, and not practical for a stand-alone query. Why join to a table if you don't need to?
这是一个微不足道的示例,对于独立查询并不实用。如果不需要,为什么要加入表?
But this could be a very important design consideration when designing views. Frequently a "do-everything" view is built that joins everything a user might need related to a central table. (Especially if there are naive users doing ad-hoc queries that do not understand the relational model) The view may include all the relevent columns from many tables. But the end users might only access columns from a subset of the tables within the view. If the tables are joined with outer joins, then the optimizer can (and does) drop the un-needed tables from the plan.
但这在设计视图时可能是一个非常重要的设计考虑因素。通常会构建一个“全能”视图,将用户可能需要的所有内容与中央表相关联。(特别是如果有不了解关系模型的天真的用户进行临时查询时)该视图可能包括许多表中的所有相关列。但是最终用户可能只能访问视图中表子集的列。如果表与外部连接连接,那么优化器可以(并且确实)从计划中删除不需要的表。
It is critical to make sure that the view using outer joins gives the correct results. As Aaronaught has said - you cannot blindly substitute OUTER JOIN for INNER JOIN and expect the same results. But there are times when it can be useful for performance reasons when using views.
确保使用外连接的视图给出正确的结果至关重要。正如 Aaronaught 所说 - 您不能盲目地将 OUTER JOIN 替换为 INNER JOIN 并期望得到相同的结果。但有时它在使用视图时出于性能原因可能很有用。
One last note - I haven't tested the impact on performance in light of the above, but in theory it seems you should be able to safely replace an INNER JOIN with an OUTER JOIN if you also add the condition <FOREIGN_KEY> IS NOT NULL to the where clause.
最后一个注意事项 - 我没有根据上述情况测试对性能的影响,但从理论上讲,如果您还添加条件 <FOREIGN_KEY> IS NOT NULL,您似乎应该能够安全地将 INNER JOIN 替换为 OUTER JOIN到 where 子句。
回答by Kvasi
If everything works as it should it shouldn't, BUT we all know everything doesn't work the way it should especially when it comes to the query optimizer, query plan caching and statistics.
如果一切正常运行,但我们都知道一切都无法正常运行,尤其是在查询优化器、查询计划缓存和统计方面。
First I would suggest rebuilding index and statistics, then clearing the query plan cache just to make sure that's not screwing things up. However I've experienced problems even when that's done.
首先,我建议重建索引和统计信息,然后清除查询计划缓存,以确保不会把事情搞砸。但是,即使完成了,我也遇到了问题。
I've experienced some cases where a left join has been faster than a inner join.
我经历过一些左连接比内连接快的情况。
The underlying reason is this: If you have two tables and you join on a column with an index (on both tables). The inner join will produce the same result no matter if you loop over the entries in the index on table one and match with index on table two as if you would do the reverse: Loop over entries in the index on table two and match with index in table one. The problem is when you have misleading statistics, the query optimizer will use the statistics of the index to find the table with least matching entries (based on your other criteria). If you have two tables with 1 million in each, in table one you have 10 rows matching and in table two you have 100000 rows matching. The best way would be to do an index scan on table one and matching 10 times in table two. The reverse would be an index scan that loops over 100000 rows and tries to match 100000 times and only 10 succeed. So if the statistics isn't correct the optimizer might choose the wrong table and index to loop over.
根本原因是:如果您有两个表,并且您在一个带有索引的列上(在两个表上)加入。无论是否循环遍历表 1 上的索引中的条目并与表 2 上的索引匹配,内连接都会产生相同的结果,就好像您将执行相反的操作一样:循环遍历表 2 上的索引中的条目并与索引匹配在表一中。问题是当您有误导性的统计信息时,查询优化器将使用索引的统计信息来查找匹配条目最少的表(基于您的其他条件)。如果您有两个表,每个表中有 100 万,则在表一中,您有 10 行匹配,而在表二中,您有 100000 行匹配。最好的方法是对表一进行索引扫描并在表二中匹配 10 次。相反,索引扫描会循环超过 100000 行并尝试匹配 100000 次,但只有 10 次成功。因此,如果统计信息不正确,优化器可能会选择错误的表和索引进行循环。
If the optimizer chooses to optimize the left join in the order it is written it will perform better than the inner join.
如果优化器选择按照写入顺序优化左连接,它将比内连接执行得更好。
BUT, the optimizer may also optimize a left join sub-optimally as a left semi join. To make it choose the one you want you can use the force order hint.
但是,优化器也可以将左连接优化为左半连接。为了让它选择你想要的,你可以使用强制命令提示。
回答by Francisco Pires
Try both queries (the one with inner and left join) with OPTION (FORCE ORDER)
at the end and post the results. OPTION (FORCE ORDER)
is a query hint that forces the optimizer to build the execution plan with the join order you provided in the query.
在最后尝试两个查询(具有内连接和左连接的查询)OPTION (FORCE ORDER)
并发布结果。OPTION (FORCE ORDER)
是一个查询提示,它强制优化器使用您在查询中提供的连接顺序构建执行计划。
If INNER JOIN
starts performing as fast as LEFT JOIN
, it's because:
如果INNER JOIN
开始执行速度与 一样快LEFT JOIN
,那是因为:
- In a query composed entirely by
INNER JOIN
s, the join order doesn't matter. This gives freedom for the query optimizer to order the joins as it sees fit, so the problem might rely on the optimizer. - With
LEFT JOIN
, that's not the case because changing the join order will alter the results of the query. This means the engine must follow the join order you provided on the query, which might be better than the optimized one.
- 在完全由
INNER JOIN
s组成的查询中,连接顺序无关紧要。这使查询优化器可以自由地按照它认为合适的方式对连接进行排序,因此问题可能依赖于优化器。 - 对于
LEFT JOIN
,情况并非如此,因为更改连接顺序会改变查询的结果。这意味着引擎必须遵循您在查询中提供的连接顺序,这可能比优化的更好。
Don't know if this answers your question but I was once in a project that featured highly complex queries making calculations, which completely messed up the optimizer. We had cases where a FORCE ORDER
would reduce the execution time of a query from 5 minutes to 10 seconds.
不知道这是否能回答您的问题,但我曾经参与过一个以高度复杂的查询进行计算的项目,这完全弄乱了优化器。我们遇到过这样的情况,aFORCE ORDER
会将查询的执行时间从 5 分钟减少到 10 秒。
回答by J.O.
Have done a number of comparisons between left outer and inner joins and have not been able to find a consisten difference. There are many variables. Am working on a reporting database with thousands of tables many with a large number of fields, many changes over time (vendor versions and local workflow) . It is not possible to create all of the combinations of covering indexes to meet the needs of such a wide variety of queries and handle historical data. Have seen inner queries kill server performance because two large (millions to tens of millions of rows) tables are inner joined both pulling a large number of fields and no covering index exists.
已经在左外连接和内连接之间进行了许多比较,但未能找到一致的差异。有很多变数。我正在处理一个包含数千个表的报告数据库,其中许多表包含大量字段,随着时间的推移发生了许多变化(供应商版本和本地工作流程)。创建覆盖索引的所有组合来满足如此广泛的查询和处理历史数据的需求是不可能的。已经看到内部查询会杀死服务器性能,因为两个大(数百万到数千万行)表是内部连接的,它们都拉出大量字段并且不存在覆盖索引。
The biggest issue though, doesn't seem to appeaer in the discussions above. Maybe your database is well designed with triggers and well designed transaction processing to ensure good data. Mine frequently has NULL values where they aren't expected. Yes the table definitions could enforce no-Nulls but that isn't an option in my environment.
不过,最大的问题似乎并未出现在上述讨论中。也许您的数据库设计良好,带有触发器和设计良好的事务处理以确保良好的数据。我的经常有不期望的 NULL 值。是的,表定义可以强制执行无空值,但这在我的环境中不是一个选项。
So the question is... do you design your query only for speed, a higher priority for transaction processing that runs the same code thousands of times a minute. Or do you go for accuracy that a left outer join will provide. Remember that inner joins must find matches on both sides, so an unexpected NULL will not only remove data from the two tables but possibly entire rows of information. And it happens so nicely, no error messages.
所以问题是......你设计你的查询只是为了速度,为每分钟运行数千次相同代码的事务处理提供更高的优先级。或者您是否追求左外连接将提供的准确性。请记住,内部联接必须在两侧找到匹配项,因此意外的 NULL 不仅会从两个表中删除数据,而且可能会删除整行信息。它发生得非常好,没有错误消息。
You can be very fast as getting 90% of the needed data and not discover the inner joins have silently removed information. Sometimes inner joins can be faster, but I don't believe anyone making that assumption unless they have reviewed the execution plan. Speed is important, but accuracy is more important.
您可以非常快地获取 90% 的所需数据,而不会发现内部联接已悄悄删除信息。有时内部联接可以更快,但我不相信任何人会做出这种假设,除非他们已经了执行计划。速度很重要,但准确性更重要。
回答by eddiegroves
Your performance problems are more likely to be because of the number of joins you are doing and whether the columns you are joining on have indexes or not.
您的性能问题更有可能是因为您正在执行的联接数量以及您所联接的列是否具有索引。
Worst case you could easily be doing 9 whole table scans for each join.
最坏的情况是,您可以轻松地为每个连接进行 9 次全表扫描。
回答by MarredCheese
Outer joins can offer superior performance when used in views.
外连接在视图中使用时可以提供卓越的性能。
Say you have a query that involves a view, and that view is comprised of 10 tables joined together. Say your query only happens to use columns from 3 out of those 10 tables.
假设您有一个涉及视图的查询,该视图由 10 个连接在一起的表组成。假设您的查询仅使用这 10 个表中的 3 个中的列。
If those 10 tables had been inner-joinedtogether, then the query optimizer would have to join them all even though your query itself doesn't need 7 out of 10 of the tables. That's because the inner joins themselves might filter down the data, making them essential to compute.
如果这 10 个表已被内部连接在一起,那么即使您的查询本身不需要 10 个表中的 7 个,查询优化器也必须将它们全部连接起来。那是因为内部联接本身可能会过滤掉数据,使它们成为计算的必要条件。
If those 10 tables had been outer-joinedtogether instead, then the query optimizer would only actually join the ones that were necessary: 3 out of 10 of them in this case. That's because the joins themselves are no longer filtering the data, and thus unused joins can be skipped.
如果这 10 个表被外部联接在一起,那么查询优化器实际上只会联接那些必要的表:在这种情况下,其中 10 个表中的 3 个。这是因为连接本身不再过滤数据,因此可以跳过未使用的连接。
Source: http://www.sqlservercentral.com/blogs/sql_coach/2010/07/29/poor-little-misunderstood-views/
来源:http: //www.sqlservercentral.com/blogs/sql_coach/2010/07/29/poor-little-misunderstood-views/
回答by Buzzzzzzz
I found something interesting in SQL server when checking if inner joins are faster than left joins.
在检查内部联接是否比左联接更快时,我在 SQL Server 中发现了一些有趣的东西。
If you dont include the items of the left joined table, in the select statement, the left join will be faster than the same query with inner join.
如果不包括左联接表的项目,在select语句中,左联接将比使用内联接的相同查询更快。
If you do include the left joined table in the select statement, the inner join with the same query was equal or faster than the left join.
如果在 select 语句中包含左联接表,则具有相同查询的内联接等于或快于左联接。
回答by Jiulin Teng
From my comparisons, I find that they have the exact same execution plan. There're three scenarios:
从我的比较中,我发现它们具有完全相同的执行计划。有以下三种情况:
If and when they return the same results, they have the same speed. However, we must keep in mind that they are not the same queries, and that LEFT JOIN will possibly return more results (when some ON conditions aren't met) --- this is why it's usually slower.
When the main table (first non-const one in the execution plan) has a restrictive condition (WHERE id = ?) and the corresponding ON condition is on a NULL value, the "right" table is not joined --- this is when LEFT JOIN is faster.
As discussed in Point 1, usually INNER JOIN is more restrictive and returns fewer results and is therefore faster.
如果当它们返回相同的结果时,它们具有相同的速度。但是,我们必须记住,它们不是相同的查询,并且 LEFT JOIN 可能会返回更多结果(当不满足某些 ON 条件时)——这就是它通常较慢的原因。
当主表(执行计划中的第一个非常量表)具有限制条件(WHERE id = ?)并且相应的 ON 条件为 NULL 值时,“右”表未连接 --- 这是当LEFT JOIN 更快。
如第 1 点所述,通常 INNER JOIN 限制性更强,返回的结果更少,因此速度更快。
Both use (the same) indices.
两者都使用(相同的)索引。