Original source: http://stackoverflow.com/questions/4799820/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
When to use SQL sub-queries versus a standard join?
Asked by Brad Krusemark
I am working on rewriting some poorly written SQL queries and they are over-utilizing sub-queries. I am looking for best-practices regarding the use of sub-queries.
Any help would be appreciated.
Answered by Mark Byers
Subqueries are usually fine unless they are dependent subqueries (also known as correlated subqueries). If you are only using independent subqueries and they are using appropriate indexes, then they should run quickly. If you have a dependent subquery you might run into performance problems, because a dependent subquery typically needs to be run once for each row in the outer query. So if your outer query has 1000 rows, the subquery will be run 1000 times. On the other hand, an independent subquery typically only needs to be evaluated once.
If you're not sure what is meant by a subquery being dependent or independent, here's a rule of thumb: if you can take the subquery, remove it from its context, run it, and get a result set, then it's an independent subquery.
If you get a syntax error because it refers to some tables outside of the subquery, then it's a dependent subquery.
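The rule of thumb can be tried directly. Here is a minimal sketch, assuming SQLite driven through Python's sqlite3 module and a hypothetical users/votes schema (neither the engine nor the table contents come from the original answer). The helper `runs_standalone` is an illustrative name, not part of any library.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE votes (user_id INTEGER, votedate TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO votes VALUES (1, '2011-01-05'), (1, '2011-01-20');
""")

def runs_standalone(subquery):
    """Return True if the subquery executes on its own, outside any outer query."""
    try:
        conn.execute(subquery).fetchall()
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the unresolved outer reference (e.g. u.id).
        return False

# Independent: refers only to its own tables.
independent = "SELECT user_id FROM votes"
# Dependent (correlated): refers to u.id, which belongs to the outer query.
dependent = "SELECT MAX(votedate) FROM votes WHERE user_id = u.id"

print(runs_standalone(independent))  # True
print(runs_standalone(dependent))    # False
```

Note that SQLite reports the outer reference as an unknown column rather than a syntax error; the exact error type depends on the engine.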
The general rule of course has a few exceptions. For example:
- Many optimizers can take a dependent subquery and find a way to run it efficiently as a JOIN. For example, a NOT EXISTS query might result in an ANTI JOIN query plan, so it will not necessarily be any slower than writing the query with a JOIN.
- MySQL has a bug where an independent subquery inside an IN expression is incorrectly identified as a dependent subquery, so a suboptimal query plan is used. This is apparently fixed in the very newest versions of MySQL.
If performance is an issue then measure your specific queries and see what works best for you.
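One way to follow the advice to measure is to time each candidate query and look at its plan. The sketch below assumes SQLite via Python's sqlite3 module, with a hypothetical users/votes schema and made-up data; the `measure` helper and the `candidates` dictionary are illustrative names. Timings and plan output are engine- and version-specific, so treat the numbers as relative only.

```python
import sqlite3
import time

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE votes (user_id INTEGER, votedate TEXT);
    CREATE INDEX votes_user_id ON votes (user_id);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 2001)])
# Only even-numbered users have voted, one vote each.
conn.executemany("INSERT INTO votes VALUES (?, ?)",
                 [(i, "2011-01-01") for i in range(2, 2001, 2)])

# Two ways to list the users who have voted.
candidates = {
    "IN subquery": """
        SELECT u.id FROM users u
        WHERE u.id IN (SELECT user_id FROM votes)
    """,
    "JOIN": """
        SELECT DISTINCT u.id FROM users u
        JOIN votes v ON v.user_id = u.id
    """,
}

def measure(sql, repeats=5):
    """Return (best wall-clock seconds over `repeats` runs, rows returned)."""
    best = float("inf")
    rows = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows

results = {}
for name, sql in candidates.items():
    seconds, rows = measure(sql)
    results[name] = sorted(rows)
    print(f"{name}: {seconds * 1000:.2f} ms, {len(rows)} rows")
    # Show how the engine intends to execute the query.
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("   ", step)
```

On a real system the same comparison should be run against production-sized data, since the winner can change with indexes and data distribution.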
Answered by RichardTheKiwi
There is no silver bullet here. Each and every usage has to be independently assessed. There are some cases where correlated subqueries are plainly inefficient; the one below is better written as a JOIN:
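-- Correlated subquery: looks up the latest vote separately for each user (TOP is SQL Server syntax)
select nickname,
       (select top 1 votedate from votes where user_id = u.id order by votedate desc) as last_votedate
from users u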
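The answer does not show the JOIN version, so the following is only one possible rewrite, sketched under assumptions: SQLite through Python's sqlite3 module (hence LIMIT 1 in place of TOP 1), a hypothetical users/votes schema, and made-up rows. It runs both forms and shows that they return the same result.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT);
    CREATE TABLE votes (user_id INTEGER, votedate TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO votes VALUES
        (1, '2011-01-05'), (1, '2011-01-20'), (2, '2011-01-11');
""")

# Correlated form; SQLite uses LIMIT 1 where the original uses TOP 1.
correlated = """
    SELECT nickname,
           (SELECT votedate FROM votes
            WHERE user_id = u.id
            ORDER BY votedate DESC LIMIT 1) AS last_vote
    FROM users u
    ORDER BY u.id
"""

# JOIN form: LEFT JOIN keeps users without votes, MAX picks the latest date.
joined = """
    SELECT u.nickname, MAX(v.votedate) AS last_vote
    FROM users u
    LEFT JOIN votes v ON v.user_id = u.id
    GROUP BY u.id, u.nickname
    ORDER BY u.id
"""

correlated_rows = conn.execute(correlated).fetchall()
joined_rows = conn.execute(joined).fetchall()
print(correlated_rows)
print(joined_rows)
```

The LEFT JOIN matters here: an inner join would drop users who have never voted, which the correlated version keeps with a NULL date.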
On the other hand, EXISTS and NOT EXISTS queries will usually win out over JOINs.
select ...
where NOT EXISTS (.....)
is normally faster than
select ...
FROM A LEFT JOIN B
where B.ID is null
Yet even these generalizations can be untrue for any particular schema and data distribution.
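To make the two elided query shapes concrete, here is a minimal sketch assuming SQLite via Python's sqlite3 module. The tables A and B, their ID columns, and the join condition A.ID = B.ID are hypothetical, since the original leaves those details out. Both forms find the rows of A with no match in B.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER PRIMARY KEY);
    CREATE TABLE B (ID INTEGER);
    INSERT INTO A VALUES (1), (2), (3), (4);
    INSERT INTO B VALUES (2), (2), (4);
""")

# Anti-join written with NOT EXISTS.
not_exists_sql = """
    SELECT A.ID FROM A
    WHERE NOT EXISTS (SELECT 1 FROM B WHERE B.ID = A.ID)
    ORDER BY A.ID
"""

# The same anti-join written as LEFT JOIN plus an IS NULL filter.
left_join_sql = """
    SELECT A.ID FROM A
    LEFT JOIN B ON B.ID = A.ID
    WHERE B.ID IS NULL
    ORDER BY A.ID
"""

not_exists_rows = conn.execute(not_exists_sql).fetchall()
left_join_rows = conn.execute(left_join_sql).fetchall()
print(not_exists_rows)  # [(1,), (3,)]
print(left_join_rows)   # [(1,), (3,)]
```

The results are identical, so which form is faster comes down to the plan each engine picks for it on your schema and data.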
Answered by Wes Hardaker
Unfortunately, the answer greatly depends on the SQL server you're using. In theory, joins are better from a pure relational-theory point of view: they let the server do the right thing under the hood and give it more control, and thus in the end can be faster, if the server is implemented well. In practice, some SQL servers perform better if you trick them into optimizing their queries through sub-queries and the like.