LEFT OUTER JOIN 与 NOT EXISTS 上的 SQL 性能

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6777910/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 11:24:28  来源:igfitidea点击:

SQL performance on LEFT OUTER JOIN vs NOT EXISTS

sqlsql-server

提问by kefeizhou

If I want to find a set of entries in table A but not in table B, I can use either LEFT OUTER JOIN or NOT EXISTS. I've heard SQL Server is geared towards ANSI and in some case LEFT OUTER JOINs are far more efficient than NOT EXISTS. Will ANSI JOIN perform better in this case? and are join operators more efficient than NOT EXISTS in general on SQL Server?

如果我想在表 A 但不在表 B 中找到一组条目,我可以使用 LEFT OUTER JOIN 或 NOT EXISTS。我听说 SQL Server 面向 ANSI,在某些情况下,LEFT OUTER JOIN 比 NOT EXISTS 高效得多。在这种情况下 ANSI JOIN 会表现得更好吗?并且通常在 SQL Server 上连接运算符比 NOT EXISTS 更有效?

回答by JNK

Joe's link is a good starting point. Quassnoi covers this too.

乔的链接是一个很好的起点。 Quassnoi 也涵盖了这一点。

In general,if your fields are properly indexed, OR if you expect to filter out more records (i.e. have a lots of rows EXISTin the subquery) NOT EXISTSwill perform better.

一般来说,如果你的字段被正确索引,或者如果你希望过滤掉更多的记录(即EXIST子查询中有很多行)NOT EXISTS会表现得更好。

EXISTSand NOT EXISTSboth short circuit - as soon as a record matches the criteria it's either included or filtered out and the optimizer moves on to the next record.

EXISTS并且NOT EXISTS都短路 - 一旦记录与标准匹配,它就会被包含或过滤掉,优化器就会移动到下一条记录。

LEFT JOINwill join ALL RECORDSregardless of whether they match or not, then filter out all non-matching records. If your tables are large and/or you have multiple JOINcriteria, this can be very very resource intensive.

LEFT JOIN无论是否匹配,都会加入所有记录,然后过滤掉所有不匹配的记录。如果您的表很大和/或您有多个JOIN条件,这可能会非常占用资源。

I normally try to use NOT EXISTSand EXISTSwhere possible. For SQL Server, INand NOT INare semantically equivalent and may be easier to write. These are among the only operators you will find in SQL Server that are guaranteed to short circuit.

我通常会尝试使用NOT EXISTSEXISTS在可能的情况下使用。对于SQL Server,INNOT IN在语义上等效,也更容易写。 这些是您可以在 SQL Server 中找到的唯一可以保证短路的运算符。

回答by Tom H

Personally, I think that this one gets a big old, "It Depends". I've seen instances where each method has outperformed the other.

就我个人而言,我认为这个已经很老了,“这取决于”。我见过每种方法都优于另一种方法的实例。

Your best bet is to test both and see which performs better. If it's a situation where the tables will always be small and performance isn't as crucial then I'd just go with whichever is the clearest to you (that's usually NOT EXISTSfor most people) and move on.

最好的办法是测试两者,看看哪个表现更好。如果在这种情况下表总是很小并且性能不是那么重要,那么我只会选择对您来说最清楚的(通常NOT EXISTS对大多数人来说)然后继续前进。

回答by N30

This blog entrygives examples of various ways ( NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPTand NOT EXISTS) to achieve same results and proves that Not Exists ( Left Anti Semi Join) is the best options in both cold cache and warm cache scenarios.

这篇博客文章给出了实现相同结果的各种方法(NOT INOUTER APPLYLEFT OUTER JOINEXCEPTNOT EXISTS)的示例,并证明了 Not Exists(Left Anti Semi Join)是冷缓存和热缓存中的最佳选择场景。

回答by user3257560

I've been wondering how we can use the index on the table we are deleting from in these cases that the OP describes.

我一直想知道在 OP 描述的这些情况下,我们如何使用要从中删除的表上的索引。

Say we have:

假设我们有:

 table EMPLOYEE (emp_id int, name varchar) 
and
 table EMPLOYEE_LOCATION (emp_id int, loc_id int)

In my real world example my tables are much wider and contain 1million + rows, I have simplified the schema for example purpose.

在我的真实世界示例中,我的表要宽得多并且包含 100 万行以上,为了示例目的,我简化了模式。

If I want to delete the rows from EMPLOYEE_LOCATION that don't have corresponding emp_id's in EMPLOYEE I can obviously use the Left outer technique or the NOT IN but I was wondering...

如果我想从 EMPLOYEE_LOCATION 中删除在 EMPLOYEE 中没有相应 emp_id 的行,我显然可以使用左外技术或 NOT IN 但我想知道......

If both tables have indexes with leading column of emp_id then would it be worthwhile trying to use them?

如果两个表都有前导列 emp_id 的索引,那么尝试使用它们是否值得?

Perhaps I could pull the emp_id's from EMPLOYEE, the emp_id's from EMPLOYEE_LOCATION into a temp table and get the emp_id's from the temp tables that I want to delete.

也许我可以将 EMPLOYEE 中的 emp_id、EMPLOYEE_LOCATION 中的 emp_id 提取到临时表中,然后从我想删除的临时表中获取 emp_id。

I could then cycle round these emp_id's and actually use the index like so:

然后我可以循环这些 emp_id 并实际使用索引,如下所示:

loop for each emp_id X to delete -- (this would be a cursor)
 DELETE EMPLOYEE_LOCATION WHERE emp_id = X

I know there is overhead with the cursor but in my real example I am dealing with huge tables so I think explicitly using the index is desirable.

我知道游标有开销,但在我的真实示例中,我正在处理巨大的表,所以我认为明确使用索引是可取的。

回答by pimbrouwers

Answeron dba.stackexchange

上dba.stackexchange

An exception I've noticed to the NOT EXISTSbeing superior (however marginally) to LEFT JOIN ... WHERE IS NULLis when using Linked Servers.

我注意到的一个例外是使用Linked Servers时的NOT EXISTS优越性(但略有不同)。LEFT JOIN ... WHERE IS NULL

From examining the execution plans, it appears that NOT EXISTSoperator gets executed in a nested loop fashion. Whereby it is executed on a per row basis (which I suppose makes sense).

从检查执行计划来看,NOT EXISTS运算符似乎是以嵌套循环方式执行的。因此它是按行执行的(我认为这是有道理的)。

Example execution plan demonstrating this behaviour: enter image description here

演示此行为的示例执行计划: 在此处输入图片说明