SQL 使用 JOIN 或使用 EXISTS 可以获得更好的性能吗?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/227037/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 00:02:07  来源:igfitidea点击:

Can I get better performance using a JOIN or using EXISTS?

sqlperformance

提问by

I have two tables Institutions and Results and I want to see if there are any results for institutions that way I can exclude the ones that don't have results.

我有两个表制度和结果,我想看看是否有任何机构的结果,这样我就可以排除没有结果的机构。

Can I get better performance using a JOIN or using EXISTS?

使用 JOIN 或使用 EXISTS 可以获得更好的性能吗?

Thank you,
-Nimesh

谢谢,-
尼梅什

回答by Keith

Depending on the statement, statistics and DB server it may make no difference - the same optimised query plan may be produced.

根据语句、统计信息和数据库服务器,它可能没有区别 - 可能会生成相同的优化查询计划。

There are basically 3 ways that DBs join tables under the hood:

数据库在幕后连接表的方式基本上有 3 种:

  • Nested loop - for one table much bigger than the second. Every row in the smaller table is checked for every row in the larger.

  • Merge - for two tables in the same sort order. Both are run through in order and matched up where they correspond.

  • Hash - everything else. Temporary tables are used to build up the matches.

  • 嵌套循环 - 一张桌子比第二张大得多。将针对较大表中的每一行检查较小表中的每一行。

  • 合并 - 用于相同排序顺序的两个表。两者都按顺序运行并在它们对应的地方匹配。

  • 哈希 - 其他一切。临时表用于建立匹配。

By using exists you may effectively force the query plan to do a nested loop. This may be the quickest way, but really you want the query planner to decide.

通过使用exists,您可以有效地强制查询计划执行嵌套循环。这可能是最快的方法,但您确实希望查询计划程序做出决定。

I would say that you need to write both SQL statements and compare the query plans. You may find that they change quite a bit depending on what data you have.

我会说您需要编写两个 SQL 语句并比较查询计划。您可能会发现它们会根据您拥有的数据而发生很大变化。

For instance if [Institutions] and [Results] are similar sizes and both are clustered on InstitutionID a merge join would be quickest. If [Results] is much bigger than [Institutions] a nested loop may be quicker.

例如,如果 [Institutions] 和 [Results] 的大小相似并且都聚集在 InstitutionID 上,则合并连接将是最快的。如果 [Results] 比 [Institutions] 大得多,则嵌套循环可能会更快。

回答by Bob Probst

It depends.

这取决于。

Ultimately the 2 serve entirely different purposes.

最终,两者的用途完全不同。

You JOIN 2 tables to access related records. If you don't need to access the data in the related records then you have no need to join them.

您加入 2 个表以访问相关记录。如果您不需要访问相关记录中的数据,则无需加入它们。

EXISTS can be used to determine if a token exists in a given dataset but won't allow you to access the related records.

EXISTS 可用于确定给定数据集中是否存在令牌,但不允许您访问相关记录。

Post an example of the 2 methods you have in mind and I might be able to give you a better idea.

发布您想到的两种方法的示例,我可能会给您一个更好的主意。



With your two tables Institutions and Results if you want a list of institutions that have results, this query will be most efficient:

如果你想要一个有结果的机构列表,使用你的两个表制度和结果,这个查询将是最有效的:

select Institutions.institution_name 
from Institutions
inner join Results on (Institutions.institution_id = Results.institution_id)

If you have an institution_id and just want to know if it has results, using EXISTS might be faster:

如果您有一个机构 ID 并且只想知道它是否有结果,使用 EXISTS 可能会更快:

if exists(select 1 from Results where institution_id = 2)
  print "institution_id 2 has results"
else
  print "institution_id 2 does not have results"

回答by JosephStyons

It depends on your optimizer. I tried the below two in Oracle 10g and 11g. In 10g, the second one was slightly faster. In 11g, they were identical.

这取决于您的优化器。我在 Oracle 10g 和 11g 中尝试了以下两个。在 10g 中,第二个稍微快一些。在 11g 中,它们是相同的。

However, #1 is really a misuse of the EXISTS clause. Use joins to find matches.

然而,#1 实际上是对 EXISTS 子句的滥用。使用连接查找匹配项。

select *
from
  table_one t1
where exists (
             select *
             from table_two t2
             where t2.id_field = t1.id_field
             )
order by t1.id_field desc


select t1.*
from 
  table_one t1
 ,table_two t2
where t1.id_field = t2.id_field
order by t1.id_field desc 

回答by Kon

Whether there's a performance difference or not, you need to use what's more appropriate for your purpose. Your purpose is to get a list of Institutions (not Results - you don't need that extra data). So select Institutions that have no Results... translation - use EXISTS.

无论是否存在性能差异,您都需要使用更适合您的目的。您的目的是获取机构列表(不是结果 - 您不需要额外的数据)。所以选择没有结果的机构......翻译 - 使用 EXISTS。

回答by Kon

I'd say a JOIN is slower, because your query execution stops as soon as an EXISTS call finds something, while a JOIN will continue until the very end.

我会说 JOIN 较慢,因为只要 EXISTS 调用找到某些内容,您的查询执行就会停止,而 JOIN 将持续到最后。

EDIT: But it depends on the query. This is something that should be judged on a case-by-case basis.

编辑:但这取决于查询。这应该根据具体情况进行判断。

回答by Barry Brown

Are you using EXISTS as part of a correlated subquery? If so, the join will almost always be faster.

您是否使用 EXISTS 作为相关子查询的一部分?如果是这样,连接几乎总是更快。

Your database should have ways of benchmarking queries. Use them to see which query runs faster.

您的数据库应该具有对查询进行基准测试的方法。使用它们来查看哪个查询运行得更快。

回答by Dave Costa

Actually, from your vague description of the problem, it sounds to me like a NOT IN query is the most obvious way to code it:

实际上,从您对问题的模糊描述来看,在我看来,NOT IN 查询是最明显的编码方式:

SELECT *
  FROM Institutions
  WHERE InstitutionID NOT IN (
     SELECT DISTINCT InstitutionID
       FROM Results
     )

回答by Tom H

A LEFT OUTER JOIN will tend to perform better than a NOT EXISTS**, but in your case you want to do EXISTS and using a simple INNER JOIN doesn't exactly replicate the EXISTS behavior. If you have multiple Results for an Institution, doing the INNER JOIN will return multiple rows for that institution. You could get around that by using DISTINCT, but then the EXISTS will probably be better for performance anyway.

LEFT OUTER JOIN 的性能往往比 NOT EXISTS** 好,但在您的情况下,您想要执行 EXISTS 并且使用简单的 INNER JOIN 并不能完全复制 EXISTS 行为。如果一个机构有多个结果,执行 INNER JOIN 将返回该机构的多行。您可以通过使用 DISTINCT 来解决这个问题,但是无论如何 EXISTS 可能会更好地提高性能。

** For those not familiar with this method:

** 对于不熟悉此方法的人:

SELECT
     MyTable.MyTableID
FROM
     dbo.MyTable T1
LEFT OUTER JOIN dbo.MyOtherTable T2 ON
     T2.MyTableID = T1.MyTableID
WHERE
     T2.MyOtherTableID IS NULL

is equivalent to

相当于

SELECT
     MyTable.MyTableID
FROM
     dbo.MyTable T1
WHERE NOT EXISTS (SELECT * FROM MyOtherTable T2 WHERE T2.MyTableID = T1.MyTableID)

assuming that MyOtherTableID is a NOT NULL column. The first method generally performs faster than the NOT EXISTS method though.

假设 MyOtherTableID 是一个 NOT NULL 列。不过,第一种方法通常比 NOT EXISTS 方法执行得更快。

回答by Noah Yetter

If the RESULTS table has more than one row per INSTITUTION, EXISTS()has the added benefit of not requiring you to select distinct Institutions.

如果 RESULTS 表中的每一行超过一行INSTITUTIONEXISTS()则无需您选择不同的机构。

As for performance, I have seen joins, IN(), and EXISTS()each be fastest in a variety of uses. To find the best method for your purposes you must test.

至于性能,我已经看到joins, IN(), and EXISTS()每个在各种用途中都是最快的。为了找到最适合您的目的的方法,您必须进行测试。

回答by Pk9

In cases like above the Exists statement works faster than that of Joins. Exists will give you a single record and will save the time also. In case of joins the number of records will be more and all the records must be used.

在上述情况下,Exists 语句的运行速度比 Joins 语句快。Exists 会给你一个单一的记录,也可以节省时间。在连接的情况下,记录数会更多,并且必须使用所有记录。