performance: subselect vs outer join
Original question: http://stackoverflow.com/questions/47433/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
subselect vs outer join
Asked by shsteimer
Consider the following 2 queries:
select tblA.a, tblA.b, tblA.c, tblA.d
from tblA
where tblA.a not in (select tblB.a from tblB);

select tblA.a, tblA.b, tblA.c, tblA.d
from tblA left outer join tblB
on tblA.a = tblB.a where tblB.a is null;
Which will perform better? My assumption is that in general the join will be better except in cases where the subselect returns a very small result set.
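For anyone who wants to try the two queries side by side, here is a minimal sketch using Python's built-in sqlite3 module. The choice of SQLite, the column types, and the sample rows are all assumptions made for illustration; the question does not say which database or data is involved.

```python
import sqlite3

# Assumed setup: an in-memory SQLite database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (a INTEGER, b TEXT, c TEXT, d TEXT);
    CREATE TABLE tblB (a INTEGER);
    INSERT INTO tblA VALUES (1, 'b1', 'c1', 'd1'), (2, 'b2', 'c2', 'd2'), (3, 'b3', 'c3', 'd3');
    INSERT INTO tblB VALUES (2);
""")

# Query 1 from the question: NOT IN with a subselect.
not_in_sql = """
    SELECT tblA.a, tblA.b, tblA.c, tblA.d
    FROM tblA
    WHERE tblA.a NOT IN (SELECT tblB.a FROM tblB)
"""

# Query 2 from the question: left outer join keeping only unmatched rows.
anti_join_sql = """
    SELECT tblA.a, tblA.b, tblA.c, tblA.d
    FROM tblA LEFT OUTER JOIN tblB
    ON tblA.a = tblB.a WHERE tblB.a IS NULL
"""

# Sort so the comparison does not depend on row order.
not_in_rows = sorted(conn.execute(not_in_sql).fetchall())
anti_join_rows = sorted(conn.execute(anti_join_sql).fetchall())

print(not_in_rows)
print(anti_join_rows)
```

On this sample data both queries return the rows of tblA whose a value (1 and 3) has no match in tblB.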
Answer by Tom
RDBMSs "rewrite" queries to optimize them, so it depends on the system you're using, and I would guess they end up giving the same performance on most "good" databases.
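One way to see what a given engine does with each form is to ask it for its plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; SQLite is an assumption here (the answer does not name an engine), and other systems have their own syntax and their own output. The plan text varies between SQLite versions, so no particular output is claimed.

```python
import sqlite3

# Assumed setup: empty tables in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (a INTEGER, b TEXT, c TEXT, d TEXT);
    CREATE TABLE tblB (a INTEGER);
""")

queries = {
    "not in": """
        SELECT tblA.a, tblA.b, tblA.c, tblA.d
        FROM tblA
        WHERE tblA.a NOT IN (SELECT tblB.a FROM tblB)
    """,
    "left join": """
        SELECT tblA.a, tblA.b, tblA.c, tblA.d
        FROM tblA LEFT OUTER JOIN tblB
        ON tblA.a = tblB.a WHERE tblB.a IS NULL
    """,
}

plans = {}
for name, sql in queries.items():
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable step.
    plans[name] = [row[-1] for row in plan_rows]
    print(name)
    for step in plans[name]:
        print("   ", step)
```

Comparing the two printed plans shows whether this particular engine treats the queries the same way or differently.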
I suggest picking the one that is clearer and easier to maintain; for my money, that's the first one. It's much easier to debug the subquery, as it can be run independently as a sanity check.
Answer by Andy Irving
Non-correlated subqueries are fine. You should go with what describes the data you want. As has been noted, this likely gets rewritten into the same plan, but that isn't guaranteed! What's more, if tables A and B are not 1:1, you will get duplicate tuples from the join query (as the IN clause performs an implicit DISTINCT sort), so it's always best to code what you want and actually think about the outcome.
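The duplicate-row effect is easiest to see with a plain IN versus a plain join, so the sketch below uses those forms rather than the exact queries from the question. It assumes SQLite via Python's sqlite3 module and made-up rows in which tblB.a contains the value 2 twice. It also runs the question's left join with the IS NULL filter, where, on this data, each unmatched row of tblA shows up once.

```python
import sqlite3

# Assumed setup: tblB.a holds a duplicate value (2 appears twice).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (a INTEGER);
    CREATE TABLE tblB (a INTEGER);
    INSERT INTO tblA VALUES (1), (2);
    INSERT INTO tblB VALUES (2), (2);
""")

# IN only tests membership, so each tblA row appears at most once.
in_rows = conn.execute(
    "SELECT tblA.a FROM tblA WHERE tblA.a IN (SELECT tblB.a FROM tblB)"
).fetchall()

# A join pairs the tblA row with every matching tblB row.
join_rows = conn.execute(
    "SELECT tblA.a FROM tblA JOIN tblB ON tblA.a = tblB.a"
).fetchall()

# The question's form: keep only tblA rows with no match in tblB.
anti_join_rows = conn.execute(
    "SELECT tblA.a FROM tblA LEFT OUTER JOIN tblB "
    "ON tblA.a = tblB.a WHERE tblB.a IS NULL"
).fetchall()

print(in_rows)         # one row for a = 2
print(join_rows)       # two rows for a = 2
print(anti_join_rows)  # one row for a = 1
```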
Answer by Piotr Anders
Well, it depends on the datasets. From my experience, if you have a small dataset then go for NOT IN; if it's large, go for a LEFT JOIN. The NOT IN clause seems to be very slow on large datasets.
One other thing I might add is that explain plans can be misleading. I've seen several queries where the explain estimate was sky high and the query ran in under 1s. On the other hand, I've seen queries with an excellent explain plan that could run for hours.
So all in all, do test on your data and see for yourself.
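A rough sketch of such a test, assuming SQLite through Python's sqlite3 module and synthetic data (5000 rows in tblA, every second value also in tblB). The table sizes and the timed helper are illustrative choices, not something from the answer, and timings on one engine say little about another.

```python
import sqlite3
import time

# Assumed setup: synthetic data in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblA (a INTEGER, b TEXT, c TEXT, d TEXT)")
conn.execute("CREATE TABLE tblB (a INTEGER)")
conn.executemany(
    "INSERT INTO tblA VALUES (?, 'b', 'c', 'd')", [(i,) for i in range(5000)]
)
conn.executemany(
    "INSERT INTO tblB VALUES (?)", [(i,) for i in range(0, 5000, 2)]
)


def timed(sql):
    """Run a query and return (elapsed seconds, fetched rows)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, rows


not_in_time, not_in_rows = timed(
    "SELECT tblA.a, tblA.b, tblA.c, tblA.d FROM tblA "
    "WHERE tblA.a NOT IN (SELECT tblB.a FROM tblB)"
)
join_time, join_rows = timed(
    "SELECT tblA.a, tblA.b, tblA.c, tblA.d FROM tblA "
    "LEFT OUTER JOIN tblB ON tblA.a = tblB.a WHERE tblB.a IS NULL"
)

print(f"NOT IN:    {not_in_time:.4f}s, {len(not_in_rows)} rows")
print(f"LEFT JOIN: {join_time:.4f}s, {len(join_rows)} rows")
```

A single run like this is noisy; repeating the measurement and using realistic data volumes and indexes gives a fairer picture.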
Answer by andy47
I second Tom's answer that you should pick the one that is easier to understand and maintain.
The query plan of any query in any database cannot be predicted because you haven't given us indexes or data distributions. The only way to predict which is faster is to run them against your database.
As a rule of thumb I tend to use sub-selects when I do not need to include any columns from tblB in my select clause. I would definitely go for a sub-select when I want to use the 'in' predicate (and usually for the 'not in' that you included in the question), for the simple reason that these are easier to understand when you or someone else has to come back and change them.
Answer by Martynnw
The first query will be faster in SQL Server, which I think is slightly counterintuitive: subqueries seem like they should be slower. In some cases (as data volumes increase) an exists may be faster than an in.
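The answer does not spell out the exists form. One common way to write it for this question is a correlated NOT EXISTS, sketched below with SQLite via Python's sqlite3 module and made-up rows (both are assumptions). The sketch only checks that the two forms return the same rows on this data; it says nothing about which is faster on SQL Server.

```python
import sqlite3

# Assumed setup: an in-memory SQLite database with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblA (a INTEGER, b TEXT, c TEXT, d TEXT);
    CREATE TABLE tblB (a INTEGER);
    INSERT INTO tblA VALUES (1, 'b1', 'c1', 'd1'), (2, 'b2', 'c2', 'd2'), (3, 'b3', 'c3', 'd3');
    INSERT INTO tblB VALUES (2), (3);
""")

# Query 1 from the question.
not_in_rows = conn.execute("""
    SELECT tblA.a, tblA.b, tblA.c, tblA.d
    FROM tblA
    WHERE tblA.a NOT IN (SELECT tblB.a FROM tblB)
""").fetchall()

# Hypothetical NOT EXISTS rewrite: the subquery refers to the outer tblA row.
not_exists_rows = conn.execute("""
    SELECT tblA.a, tblA.b, tblA.c, tblA.d
    FROM tblA
    WHERE NOT EXISTS (SELECT 1 FROM tblB WHERE tblB.a = tblA.a)
""").fetchall()

print(not_in_rows)
print(not_exists_rows)
```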
Answer by Amy B
It should be noted that these queries will produce different results if TblB.a is not unique.
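Claims about differing results are cheap to check on small hand-made data. The sketch below, assuming SQLite via Python's sqlite3 module and a hypothetical run_both helper, runs the question's two queries twice: once with a duplicate value in tblB.a and once with a NULL in tblB.a. In this SQLite run the duplicate leaves both result sets identical, while the NULL makes NOT IN return no rows at all; other engines are worth checking separately.

```python
import sqlite3


def run_both(b_values):
    """Load tblA with 1, 2, 3 and tblB with b_values, then run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tblA (a INTEGER)")
    conn.execute("CREATE TABLE tblB (a INTEGER)")
    conn.executemany("INSERT INTO tblA VALUES (?)", [(1,), (2,), (3,)])
    conn.executemany("INSERT INTO tblB VALUES (?)", [(v,) for v in b_values])
    not_in = conn.execute(
        "SELECT tblA.a FROM tblA "
        "WHERE tblA.a NOT IN (SELECT tblB.a FROM tblB)"
    ).fetchall()
    anti_join = conn.execute(
        "SELECT tblA.a FROM tblA LEFT OUTER JOIN tblB "
        "ON tblA.a = tblB.a WHERE tblB.a IS NULL"
    ).fetchall()
    conn.close()
    return sorted(not_in), sorted(anti_join)


# Case 1: tblB.a is not unique (2 appears twice).
dup_not_in, dup_anti_join = run_both([2, 2])
print(dup_not_in, dup_anti_join)

# Case 2: tblB.a contains a NULL (None in Python).
null_not_in, null_anti_join = run_both([2, None])
print(null_not_in, null_anti_join)
```

The NULL case follows from SQL's three-valued logic: comparing a value against a list that contains NULL is never true for values missing from the list, so NOT IN filters every row out.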
Answer by aku
From my observations, MSSQL server produces the same query plan for these queries.
Answer by Mike Polen
I created a simple query similar to the ones in the question on MSSQL2005, and the explain plans were different. The first query appears to be faster. I am not a SQL expert, but the estimated explain plan showed 37% for query 1 and 63% for query 2. It appears that the biggest cost for query 2 is the join. Both queries had two table scans.

