SQL 不存在与不存在

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/173041/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 23:46:59  来源:igfitidea点击:

NOT IN vs NOT EXISTS

sqlsql-servernotin

提问by ilitirit

Which of these queries is the faster?

这些查询中哪个更快?

NOT EXISTS:

不存在:

SELECT ProductID, ProductName 
FROM Northwind..Products p
WHERE NOT EXISTS (
    SELECT 1 
    FROM Northwind..[Order Details] od 
    WHERE p.ProductId = od.ProductId)

Or NOT IN:

或不在:

SELECT ProductID, ProductName 
FROM Northwind..Products p
WHERE p.ProductID NOT IN (
    SELECT ProductID 
    FROM Northwind..[Order Details])

The query execution plan says they both do the same thing. If that is the case, which is the recommended form?

查询执行计划说他们都做同样的事情。如果是这种情况,推荐的形式是什么?

This is based on the NorthWind database.

这是基于 NorthWind 数据库。

[Edit]

[编辑]

Just found this helpful article: http://weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

刚刚找到这篇有用的文章:http: //weblogs.sqlteam.com/mladenp/archive/2007/05/18/60210.aspx

I think I'll stick with NOT EXISTS.

我想我会坚持不存在。

回答by Martin Smith

I always default to NOT EXISTS.

我总是默认为NOT EXISTS.

The execution plans may be the same at the moment but if either column is altered in the future to allow NULLs the NOT INversion will need to do more work (even if no NULLs are actually present in the data) and the semantics of NOT INif NULLs arepresent are unlikely to be the ones you want anyway.

该执行计划可以是相同的时刻,但如果任一列在未来改变,以允许NULLS上的NOT IN版本需要做更多的工作(即使没有NULLs为实际存在的数据)和语义NOT IN如果NULL小号存在无论如何都不太可能是你想要的。

When neither Products.ProductIDor [Order Details].ProductIDallow NULLs the NOT INwill be treated identically to the following query.

当 noneProducts.ProductID[Order Details].ProductIDallow NULLsNOT IN将被视为与以下查询相同。

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId) 

The exact plan may vary but for my example data I get the following.

确切的计划可能会有所不同,但对于我的示例数据,我得到以下信息。

Neither NULL

都不是 NULL

A reasonably common misconception seems to be that correlated sub queries are always "bad" compared to joins. They certainly can be when they force a nested loops plan (sub query evaluated row by row) but this plan includes an anti semi join logical operator. Anti semi joins are not restricted to nested loops but can use hash or merge (as in this example) joins too.

一个相当普遍的误解似乎是,与连接相比,相关子查询总是“糟糕”。当它们强制执行嵌套循环计划(逐行评估子查询)时,它们当然可以,但该计划包括反半连接逻辑运算符。反半连接不限于嵌套循环,但也可以使用散列或合并(如本例中)连接。

/*Not valid syntax but better reflects the plan*/ 
SELECT p.ProductID,
       p.ProductName
FROM   Products p
       LEFT ANTI SEMI JOIN [Order Details] od
         ON p.ProductId = od.ProductId 

If [Order Details].ProductIDis NULL-able the query then becomes

如果[Order Details].ProductIDNULL-able 则查询变为

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL) 

The reason for this is that the correct semantics if [Order Details]contains any NULLProductIds is to return no results. See the extra anti semi join and row count spool to verify this that is added to the plan.

这样做的原因是,如果[Order Details]包含任何NULLProductIds ,则正确的语义是不返回任何结果。请参阅额外的反半连接和行计数假脱机以验证添加到计划中的这一点。

One NULL

一个 NULL

If Products.ProductIDis also changed to become NULL-able the query then becomes

如果Products.ProductID也更改为NULL-able 则查询变为

SELECT ProductID,
       ProductName
FROM   Products p
WHERE  NOT EXISTS (SELECT *
                   FROM   [Order Details] od
                   WHERE  p.ProductId = od.ProductId)
       AND NOT EXISTS (SELECT *
                       FROM   [Order Details]
                       WHERE  ProductId IS NULL)
       AND NOT EXISTS (SELECT *
                       FROM   (SELECT TOP 1 *
                               FROM   [Order Details]) S
                       WHERE  p.ProductID IS NULL) 

The reason for that one is because a NULLProducts.ProductIdshould not be returned in the results exceptif the NOT INsub query were to return no results at all (i.e. the [Order Details]table is empty). In which case it should. In the plan for my sample data this is implemented by adding another anti semi join as below.

其原因之一是因为NULLProducts.ProductId不应该在返回的结果只是如果NOT IN子查询是在所有返回任何结果(即[Order Details]表是空的)。在这种情况下它应该。在我的示例数据计划中,这是通过添加另一个反半连接来实现的,如下所示。

Both NULL

都为 NULL

The effect of this is shown in the blog post already linked by Buckley. In the example there the number of logical reads increase from around 400 to 500,000.

Buckley 已经链接的博客文章中显示了这种效果。在示例中,逻辑读取的数量从大约 400 增加到 500,000。

Additionally the fact that a single NULLcan reduce the row count to zero makes cardinality estimation very difficult. If SQL Server assumes that this will happen but in fact there were no NULLrows in the data the rest of the execution plan may be catastrophically worse, if this is just part of a larger query, with inappropriate nested loops causing repeated execution of an expensive sub tree for example.

此外,单个NULL可以将行数减少到零的事实使得基数估计变得非常困难。如果 SQL Server 假设会发生这种情况但实际上NULL数据中没有行,则执行计划的其余部分可能会灾难性地更糟,如果这只是较大查询的一部分,不适当的嵌套循环会导致重复执行昂贵的子树为例

This is not the only possible execution plan for a NOT INon a NULL-able column however. This article shows another onefor a query against the AdventureWorks2008database.

然而,这并不是NOT INa- NULLable 列上唯一可能的执行计划。这篇文章展示了另一个针对AdventureWorks2008数据库的查询。

For the NOT INon a NOT NULLcolumn or the NOT EXISTSagainst either a nullable or non nullable column it gives the following plan.

对于NOT IN上一NOT NULL列或NOT EXISTS反对任何一个可以为空或非空列它提供了以下方案。

Not EXists

不存在

When the column changes to NULL-able the NOT INplan now looks like

当列更改为NULL-able 时,NOT IN计划现在看起来像

Not In - Null

不在 - 空

It adds an extra inner join operator to the plan. This apparatus is explained here. It is all there to convert the previous single correlated index seek on Sales.SalesOrderDetail.ProductID = <correlated_product_id>to two seeks per outer row. The additional one is on WHERE Sales.SalesOrderDetail.ProductID IS NULL.

它为计划添加了一个额外的内部连接运算符。此处解释该装置。只需将先前的单个相关索引查找转换为Sales.SalesOrderDetail.ProductID = <correlated_product_id>每个外部行的两个查找即可。另一个在WHERE Sales.SalesOrderDetail.ProductID IS NULL.

As this is under an anti semi join if that one returns any rows the second seek will not occur. However if Sales.SalesOrderDetaildoes not contain any NULLProductIDs it will double the number of seek operations required.

由于这是在反半连接下,如果该连接返回任何行,则不会发生第二次查找。但是,如果Sales.SalesOrderDetail不包含任何NULLProductIDs,它将使所需的查找操作数量增加一倍。

回答by buckley

Also be aware that NOT IN is not equivalent to NOT EXISTS when it comes to null.

还要注意,当涉及到 null 时,NOT IN 不等同于 NOT EXISTS。

This post explains it very well

这个帖子解释的很好

http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

http://sqlinthewild.co.za/index.php/2010/02/18/not-exists-vs-not-in/

When the subquery returns even one null, NOT IN will not match any rows.

The reason for this can be found by looking at the details of what the NOT IN operation actually means.

Let's say, for illustration purposes that there are 4 rows in the table called t, there's a column called ID with values 1..4

WHERE SomeValue NOT IN (SELECT AVal FROM t)

is equivalent to

WHERE SomeValue != (SELECT AVal FROM t WHERE ID=1)
AND SomeValue != (SELECT AVal FROM t WHERE ID=2)
AND SomeValue != (SELECT AVal FROM t WHERE ID=3)
AND SomeValue != (SELECT AVal FROM t WHERE ID=4)

Let's further say that AVal is NULL where ID = 4. Hence that != comparison returns UNKNOWN. The logical truth table for AND states that UNKNOWN and TRUE is UNKNOWN, UNKNOWN and FALSE is FALSE. There is no value that can be AND'd with UNKNOWN to produce the result TRUE

Hence, if any row of that subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or NULL and no records will be returned

当子查询返回一个空值时,NOT IN 将不匹配任何行。

其原因可以通过查看 NOT IN 操作实际含义的详细信息找到。

比方说,为了便于说明,表中有 4 行名为 t,有一列名为 ID 的值为 1..4

WHERE SomeValue NOT IN (SELECT AVal FROM t)

相当于

WHERE SomeValue != (SELECT AVal FROM t WHERE ID=1)
AND SomeValue != (SELECT AVal FROM t WHERE ID=2)
AND SomeValue != (SELECT AVal FROM t WHERE ID=3)
AND SomeValue != (SELECT AVal FROM t WHERE ID=4)

让我们进一步说 AVal 为 NULL,其中 ID = 4。因此 != 比较返回 UNKNOWN。AND 的逻辑真值表指出 UNKNOWN 和 TRUE 是 UNKNOWN,UNKNOWN 和 FALSE 是 FALSE。没有值可以与 UNKNOWN 进行 AND 运算以产生结果 TRUE

因此,如果该子查询的任何行返回 NULL,则整个 NOT IN 运算符将评估为 FALSE 或 NULL,并且不会返回任何记录

回答by John Millikin

If the execution planner says they're the same, they're the same. Use whichever one will make your intention more obvious -- in this case, the second.

如果执行计划者说它们是一样的,那么它们就是一样的。使用任何一个会让你的意图更明显的——在这种情况下,第二个。

回答by James Curran

Actually, I believe this would be the fastest:

实际上,我相信这将是最快的:

SELECT ProductID, ProductName 
    FROM Northwind..Products p  
          outer join Northwind..[Order Details] od on p.ProductId = od.ProductId)
WHERE od.ProductId is null

回答by Yella Chalamala

I have a table which has about 120,000 records and need to select only those which does not exist (matched with a varchar column) in four other tables with number of rows approx 1500, 4000, 40000, 200. All the involved tables have unique index on the concerned Varcharcolumn.

我有一个包含大约 120,000 条记录的表,只需要在其他四个表中选择那些不存在(与 varchar 列匹配)的表,行数约为 1500、4000、40000、200。所有涉及的表都有唯一索引在相关Varchar栏目上。

NOT INtook about 10 mins, NOT EXISTStook 4 secs.

NOT IN花了大约 10 分钟,NOT EXISTS花了 4 秒。

I have a recursive query which might had some untuned section which might have contributed to the 10 mins, but the other option taking 4 secs explains, atleast to me that NOT EXISTSis far better or at least that INand EXISTSare not exactly the same and always worth a check before going ahead with code.

我有一个递归查询,其中可能有一些未调整的部分,这可能会导致 10 分钟,但另一个需要 4 秒的选项解释说,至少对我NOT EXISTS来说要好得多,或者至少是这样INEXISTS并且不完全相同并且总是值得一试在继续代码之前检查。

回答by Jeffrey L Whitledge

In your specific example they are the same, because the optimizer has figured out what you are trying to do is the same in both examples. But it is possible that in non-trivial examples the optimizer may not do this, and in that case there are reasons to prefer one to other on occasion.

在您的具体示例中,它们是相同的,因为优化器已经确定您要尝试执行的操作在两个示例中都是相同的。但在非平凡的例子中,优化器可能不会这样做,在这种情况下,有时有理由选择一个而不是另一个。

NOT INshould be preferred if you are testing multiple rows in your outer select. The subquery inside the NOT INstatement can be evaluated at the beginning of the execution, and the temporary table can be checked against each value in the outer select, rather than re-running the subselect every time as would be required with the NOT EXISTSstatement.

NOT IN如果您在外部选择中测试多行,则应该是首选。NOT IN语句内的子查询可以在执行开始时进行评估,并且可以根据外部选择中的每个值检查临时表,而不是每次都重新运行子选择,就像NOT EXISTS语句所要求的那样。

If the subquery mustbe correlated with the outer select, then NOT EXISTSmay be preferable, since the optimizer may discover a simplification that prevents the creation of any temporary tables to perform the same function.

如果子查询必须与外部选择相关,那么NOT EXISTS可能更可取,因为优化器可能会发现一种简化,阻止创建任何临时表来执行相同的功能。

回答by ravish.hacker

I was using

我正在使用

SELECT * from TABLE1 WHERE Col1 NOT IN (SELECT Col1 FROM TABLE2)

and found that it was giving wrong results (By wrong I mean no results). As there was a NULL in TABLE2.Col1.

并发现它给出了错误的结果(错误是指没有结果)。因为 TABLE2.Col1 中有一个 NULL。

While changing the query to

将查询更改为

SELECT * from TABLE1 T1 WHERE NOT EXISTS (SELECT Col1 FROM TABLE2 T2 WHERE T1.Col1 = T2.Col2)

gave me the correct results.

给了我正确的结果。

Since then I have started using NOT EXISTS every where.

从那以后,我开始在任何地方使用 NOT EXISTS。

回答by Onga Leo-Yoda Vellem

They are very similar but not really the same.

它们非常相似,但并不完全相同。

In terms of efficiency, I've found the left join is nullstatement more efficient (when an abundance of rows are to be selected that is)

在效率方面,我发现left join is null语句更有效(当要选择大量行时)

回答by onedaywhen

If the optimizer says they are the same then consider the human factor. I prefer to see NOT EXISTS :)

如果优化器说它们相同,则考虑人为因素。我更喜欢看 NOT EXISTS :)

回答by Vlad Mihalcea

This is a very good question, so I decided to write a very detailed articleabout this topic on my blog.

这是一个很好的问题,所以我决定在我的博客上写一篇关于这个主题的非常详细的文章

Database table model

数据库表模型

Let's assume we have the following two tables in our database, that form a one-to-many table relationship.

假设我们的数据库中有以下两个表,它们形成一对多的表关系。

SQL EXISTS tables

SQL EXISTS 表

The studenttable is the parent, and the student_gradeis the child table since it has a student_id Foreign Key column referencing the id Primary Key column in the student table.

student表是父表,而student_grade是子表,因为它有一个 student_id 外键列引用学生表中的 id 主键列。

The student tablecontains the following two records:

student table包含以下两个记录:

| id | first_name | last_name | admission_score |
|----|------------|-----------|-----------------|
| 1  | Alice      | Smith     | 8.95            |
| 2  | Bob        | Johnson   | 8.75            |

And, the student_gradetable stores the grades the students received:

并且,该student_grade表存储了学生获得的成绩:

| id | class_name | grade | student_id |
|----|------------|-------|------------|
| 1  | Math       | 10    | 1          |
| 2  | Math       | 9.5   | 1          |
| 3  | Math       | 9.75  | 1          |
| 4  | Science    | 9.5   | 1          |
| 5  | Science    | 9     | 1          |
| 6  | Science    | 9.25  | 1          |
| 7  | Math       | 8.5   | 2          |
| 8  | Math       | 9.5   | 2          |
| 9  | Math       | 9     | 2          |
| 10 | Science    | 10    | 2          |
| 11 | Science    | 9.4   | 2          |

SQL EXISTS

SQL 存在

Let's say we want to get all students that have received a 10 grade in Math class.

假设我们想要让所有在数学课上获得 10 分的学生。

If we are only interested in the student identifier, then we can run a query like this one:

如果我们只对学生标识符感兴趣,那么我们可以运行如下查询:

SELECT
    student_grade.student_id
FROM
    student_grade
WHERE
    student_grade.grade = 10 AND
    student_grade.class_name = 'Math'
ORDER BY
    student_grade.student_id

But, the application is interested in displaying the full name of a student, not just the identifier, so we need info from the studenttable as well.

但是,应用程序感兴趣的是显示 a 的全名student,而不仅仅是标识符,因此我们还需要student表中的信息。

In order to filter the studentrecords that have a 10 grade in Math, we can use the EXISTS SQL operator, like this:

为了过滤student数学中有 10 分的记录,我们可以使用 EXISTS SQL 运算符,如下所示:

SELECT
    id, first_name, last_name
FROM
    student
WHERE EXISTS (
    SELECT 1
    FROM
        student_grade
    WHERE
        student_grade.student_id = student.id AND
        student_grade.grade = 10 AND
        student_grade.class_name = 'Math'
)
ORDER BY id

When running the query above, we can see that only the Alice row is selected:

运行上面的查询时,我们可以看到只选择了 Alice 行:

| id | first_name | last_name |
|----|------------|-----------|
| 1  | Alice      | Smith     |

The outer query selects the studentrow columns we are interested in returning to the client. However, the WHERE clause is using the EXISTS operator with an associated inner subquery.

外部查询选择student我们有兴趣返回给客户端的行列。但是,WHERE 子句将 EXISTS 运算符与关联的内部子查询一起使用。

The EXISTS operator returns true if the subquery returns at least one record and false if no row is selected. The database engine does not have to run the subquery entirely. If a single record is matched, the EXISTS operator returns true, and the associated other query row is selected.

如果子查询至少返回一条记录,则 EXISTS 运算符返回 true,如果未选择任何行,则返回 false。数据库引擎不必完全运行子查询。如果匹配单个记录,则 EXISTS 运算符返回 true,并选择关联的其他查询行。

The inner subquery is correlated because the student_id column of the student_gradetable is matched against the id column of the outer student table.

内部子查询是相关的,因为student_grade表的 student_id 列与外部 student 表的 id 列匹配。

SQL NOT EXISTS

SQL 不存在

Let's consider we want to select all students that have no grade lower than 9. For this, we can use NOT EXISTS, which negates the logic of the EXISTS operator.

假设我们要选择所有成绩不低于 9 的学生。为此,我们可以使用 NOT EXISTS,它否定 EXISTS 运算符的逻辑。

Therefore, the NOT EXISTS operator returns true if the underlying subquery returns no record. However, if a single record is matched by the inner subquery, the NOT EXISTS operator will return false, and the subquery execution can be stopped.

因此,如果基础子查询没有返回记录,则 NOT EXISTS 运算符返回 true。但是,如果单个记录与内部子查询匹配,则 NOT EXISTS 运算符将返回 false,并且可以停止子查询执行。

To match all student records that have no associated student_grade with a value lower than 9, we can run the following SQL query:

要匹配所有没有关联 student_grade 值小于 9 的学生记录,我们可以运行以下 SQL 查询:

SELECT
    id, first_name, last_name
FROM
    student
WHERE NOT EXISTS (
    SELECT 1
    FROM
        student_grade
    WHERE
        student_grade.student_id = student.id AND
        student_grade.grade < 9
)
ORDER BY id

When running the query above, we can see that only the Alice record is matched:

运行上面的查询时,我们可以看到只有 Alice 记录匹配:

| id | first_name | last_name |
|----|------------|-----------|
| 1  | Alice      | Smith     |

So, the advantage of using the SQL EXISTS and NOT EXISTS operators is that the inner subquery execution can be stopped as long as a matching record is found.

因此,使用 SQL EXISTS 和 NOT EXISTS 运算符的优点是只要找到匹配的记录,就可以停止内部子查询的执行。