Original source: http://stackoverflow.com/questions/1200295/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
SQL JOIN vs IN performance?
Asked by Polaris878
I have a case where using a JOIN or an IN will give me the correct results... Which typically has better performance and why? How much does it depend on what database server you are running? (FYI I am using MSSQL)
Answered by Quassnoi
Generally speaking, IN and JOIN are different queries that can yield different results.
SELECT a.*
FROM a
JOIN b
ON a.col = b.col
is not the same as
SELECT a.*
FROM a
WHERE col IN
(
SELECT col
FROM b
)
unless b.col is unique.
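To make the difference concrete, here is a minimal sketch that runs both queries against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for SQL Server purely so the example is runnable, and the sample rows in a and b are made up for illustration; b.col is deliberately not unique.

```python
import sqlite3

# Hypothetical sample data: the value 1 appears twice in b.col.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col INTEGER);
    CREATE TABLE b (col INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (1), (2);
""")

# Plain JOIN: a row of a is repeated once per matching row in b.
join_rows = conn.execute(
    "SELECT a.* FROM a JOIN b ON a.col = b.col ORDER BY a.col"
).fetchall()

# IN: a row of a is returned at most once, however many matches b has.
in_rows = conn.execute(
    "SELECT a.* FROM a WHERE col IN (SELECT col FROM b) ORDER BY a.col"
).fetchall()

print(join_rows)  # [(1,), (1,), (2,)]
print(in_rows)    # [(1,), (2,)]
```

The extra (1,) in the JOIN result comes from the duplicate in b, which is exactly the case the uniqueness condition rules out.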
However, this query is equivalent to the IN query above:
SELECT a.*
FROM a
JOIN (
SELECT DISTINCT col
FROM b
) b
ON b.col = a.col
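As a quick check of that equivalence, the sketch below compares the IN query with the JOIN on a DISTINCT subquery. It again uses SQLite via Python's sqlite3 as a stand-in engine, with made-up rows; the derived table is given the alias b here (an assumption on my part) so that the ON clause can refer to b.col.

```python
import sqlite3

# Hypothetical sample data with duplicates in b.col.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col INTEGER);
    CREATE TABLE b (col INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1), (1), (2);
""")

in_rows = conn.execute(
    "SELECT a.* FROM a WHERE col IN (SELECT col FROM b) ORDER BY a.col"
).fetchall()

# De-duplicate b.col first, then join; the derived table needs an alias.
distinct_join_rows = conn.execute("""
    SELECT a.*
    FROM a
    JOIN (SELECT DISTINCT col FROM b) AS b
    ON b.col = a.col
    ORDER BY a.col
""").fetchall()

print(in_rows == distinct_join_rows)  # True
```

This only shows that the two forms return the same rows on this data; it says nothing about which plan a given optimizer picks.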
If the joining column is UNIQUE and marked as such, both these queries yield the same plan in SQL Server.
If it's not, then IN is faster than JOIN on DISTINCT.
See this article in my blog for performance details:
Answered by cletus
Funny you mention that, I did a blog post on this very subject.
See Oracle vs MySQL vs SQL Server: Aggregation vs Joins
Short answer: you have to test it, and individual databases vary a lot.
Answered by marc_s
That's rather hard to say. To really find out which one works better, you'd need to profile the execution times.
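A rough sketch of what such profiling could look like is shown below. It times the two query shapes against an in-memory SQLite database from Python, which is an assumption made only to keep the example self-contained; on SQL Server you would use its own timing and plan tools instead. The table sizes, the time_query helper and the data distribution are all hypothetical, and the timings will differ on every machine.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("CREATE TABLE a (col INTEGER); CREATE TABLE b (col INTEGER);")
# Hypothetical data: 2000 rows in a; b holds each of 500 values four times.
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(2000)])
conn.executemany("INSERT INTO b VALUES (?)", [(i % 500,) for i in range(2000)])

def time_query(sql, repeats=5):
    """Return (best wall-clock seconds, row count) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, len(rows)

queries = {
    "IN": "SELECT a.* FROM a WHERE col IN (SELECT col FROM b)",
    "JOIN on DISTINCT": (
        "SELECT a.* FROM a "
        "JOIN (SELECT DISTINCT col FROM b) AS d ON d.col = a.col"
    ),
}

results = {name: time_query(sql) for name, sql in queries.items()}
for name, (seconds, count) in results.items():
    print(f"{name}: {count} rows, best of 5 runs {seconds * 1000:.3f} ms")
```

Taking the best of several runs reduces noise from caching and other processes, but the numbers are only meaningful on your own data and your own server.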
As a general rule of thumb, I think if you have indexes on your foreign key columns, and if you're using only (or mostly) INNER JOIN conditions, then the JOIN will be slightly faster.
But as soon as you start using OUTER JOIN, or if you're lacking foreign key indexes, the IN might be quicker.
Marc
Answered by AdamSane
An interesting writeup on the logical differences: SQL Server: JOIN vs IN vs EXISTS - the logical difference
I am pretty sure that, assuming the relations and indexes are maintained, a JOIN will perform better overall (more effort goes into working with that operation than others). If you think about it conceptually, it's the difference between 2 queries and 1 query.
You need to hook it up to the Query Analyzer, try it, and see the difference. Also look at the Query Execution Plan and try to minimize steps.
Answered by S.Roeper
This thread is pretty old but still mentioned often. For my personal taste it is a bit incomplete, because there is another way to query the database, with the EXISTS keyword, which I have found to be faster more often than not.
So if you are only interested in values from table a, you can use this query:
SELECT a.*
FROM a
WHERE EXISTS (
SELECT *
FROM b
WHERE b.col = a.col
)
The difference might be huge if col is not indexed, because the database does not have to find all records in b that have the same value in col; it only has to find the very first one. If there is no index on b.col and there are a lot of records in b, a table scan might be the consequence. With IN or a JOIN this would be a full table scan; with EXISTS it would be only a partial table scan (until the first matching record is found).
If there are lots of records in b that have the same col value, you will also waste a lot of memory reading all these records into a temporary space just to find that your condition is satisfied. With EXISTS this can usually be avoided.
I have often found EXISTS faster than IN even if there is an index. It depends on the database system (the optimizer), the data and, last but not least, the type of index used.
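The "stop at the first match" reasoning can be illustrated with a toy model in plain Python. This is not how any real engine executes these queries (an optimizer may well rewrite IN so that it also stops early); the scan function and the b_rows list are hypothetical and only count how many unindexed rows each strategy would look at.

```python
def scan(rows, wanted, stop_at_first):
    """Scan an unindexed list of values; return (found, rows_examined)."""
    examined = 0
    found = False
    for value in rows:
        examined += 1
        if value == wanted:
            found = True
            if stop_at_first:
                break  # EXISTS-style: one match is enough
    return found, examined

# Hypothetical b.col values: five 7s up front, then 3000 other rows.
b_rows = [7] * 5 + [1, 2, 3] * 1000

exists_style = scan(b_rows, 7, stop_at_first=True)
full_style = scan(b_rows, 7, stop_at_first=False)

print(exists_style)  # (True, 1)
print(full_style)    # (True, 3005)
```

How much the early exit saves depends on where the first match sits; if the wanted value is absent, both strategies read every row.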
Answered by uriDium
Each database's implementation differs, but you can probably guess that they all solve common problems in more or less the same way. If you are using MSSQL, have a look at the execution plan that is generated. You can do this by turning on the profiler and execution plans. This will give you a text version when you run the command.
I am not sure what version of MSSQL you are using, but you can get a graphical one in SQL Server 2000 in the Query Analyzer. I am sure that this functionality is lurking somewhere in SQL Server Management Studio in later versions.
Have a look at the execution plan. As far as possible, avoid table scans, unless of course your table is small, in which case a table scan is faster than using an index. Read up on the different join operations that each different scenario produces.
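For readers without SQL Server at hand, the idea of reading a plan to spot a table scan can be tried with SQLite's EXPLAIN QUERY PLAN from Python. This is a sketch on a different engine, so the plan text will not look like an MSSQL plan; the table b, the index name idx_b_col and the plan helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE b (col INTEGER, payload TEXT)")
conn.executemany("INSERT INTO b VALUES (?, ?)", [(i, "x") for i in range(1000)])

def plan(sql):
    """Return the human-readable detail text of each query plan row."""
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM b WHERE col = 42"

plan_without_index = plan(query)   # expected to report a scan of b
conn.execute("CREATE INDEX idx_b_col ON b (col)")
plan_with_index = plan(query)      # expected to report a search using idx_b_col

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan lines varies between SQLite versions, so the output is not reproduced here; the thing to look for is the change from a scan to an index search once the index exists.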
Answered by Joel Coehoorn
The optimizer should be smart enough to give you the same result either way for normal queries. Check the execution plan and they should give you the same thing. If they don't, I would normally consider the JOIN to be faster. All systems are different, though, so you should profile the code on your system to be sure.