Subquery using Exists 1 or Exists *

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1597442/

Time: 2020-09-01 04:05:29  Source: igfitidea

Tags: sql, sql-server, tsql

Asked by Raj More

I used to write my EXISTS checks like this:

IF EXISTS (SELECT * FROM TABLE WHERE Columns=@Filters)
BEGIN
   UPDATE TABLE SET ColumnsX=ValuesX WHERE Columns=@Filters
END

One of the DBAs in a previous life told me that when I write an EXISTS clause, I should use SELECT 1 instead of SELECT *:

IF EXISTS (SELECT 1 FROM TABLE WHERE Columns=@Filters)
BEGIN
   UPDATE TABLE SET ColumnsX=ValuesX WHERE Columns=@Filters
END

Does this really make a difference?

Answered by Matt Rogish

No, SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.

Quoth Microsoft: http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4

The select list of a subquery introduced by EXISTS almost always consists of an asterisk (*). There is no reason to list column names because you are just testing whether rows that meet the conditions specified in the subquery exist.

To check for yourself, try running the following:

SELECT whatever
  FROM yourtable
 WHERE EXISTS( SELECT 1/0
                 FROM someothertable 
                WHERE a_valid_clause )

If it were actually evaluating the SELECT list, it would throw a divide-by-zero error. It doesn't.
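
A self-contained variant of this check, as a minimal sketch: the temp table #Demo and its column Id are hypothetical names introduced only for illustration. If the select list were evaluated, the IF would be expected to raise a divide-by-zero error instead of reaching either PRINT.

```sql
-- #Demo is a hypothetical temp table used only for this illustration.
CREATE TABLE #Demo (Id INT PRIMARY KEY);
INSERT INTO #Demo (Id) VALUES (1);

-- The select list contains 1/0, but EXISTS only tests whether a row qualifies.
IF EXISTS (SELECT 1/0 FROM #Demo WHERE Id = 1)
    PRINT 'A matching row exists';
ELSE
    PRINT 'No matching row';

DROP TABLE #Demo;
```

Using a temp table keeps the sketch independent of any real schema; the same test can be pointed at any existing table and predicate.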

EDIT: Note, the SQL Standard actually talks about this.

ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

3) Case:
a) If the <select list> "*" is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.

Answered by Martin Smith

This misconception presumably stems from the belief that SELECT * will end up reading all columns. It is easy to see that this is not the case.

CREATE TABLE T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)

CREATE NONCLUSTERED INDEX NarrowIndex ON T(Y)

IF EXISTS (SELECT * FROM T)
    PRINT 'Y'

Gives the following plan:

[Image: execution plan]

This shows that SQL Server was able to use the narrowest index available to check the result despite the fact that the index does not include all columns. The index access is under a semi join operator which means that it can stop scanning as soon as the first row is returned.
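
One way to look at the plan for this statement as text, as a rough sketch: it assumes the table T and the index NarrowIndex from the script above already exist, and that the batches are run from a client that understands the GO separator (for example SSMS or sqlcmd). The exact operators shown may differ between SQL Server versions.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_TEXT ON;
GO

-- While SHOWPLAN_TEXT is ON the statement is compiled but not executed;
-- the estimated plan is returned as text instead.
IF EXISTS (SELECT * FROM T)
    PRINT 'Y';
GO

SET SHOWPLAN_TEXT OFF;
GO
```

In the output, the index named in the scan operator indicates which index was chosen for the existence check.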

So it is clear the above belief is wrong.

However, Conor Cunningham from the Query Optimiser team explains here that he typically uses SELECT 1 in this case, as it can make a minor performance difference in the compilation of the query.

The QP will take and expand all *'s early in the pipeline and bind them to objects (in this case, the list of columns). It will then remove unneeded columns due to the nature of the query.

So for a simple EXISTS subquery like this:

SELECT col1 FROM MyTable WHERE EXISTS (SELECT * FROM Table2 WHERE MyTable.col1=Table2.col2)

The * will be expanded to some potentially big column list and then it will be determined that the semantics of the EXISTS does not require any of those columns, so basically all of them can be removed.

"SELECT 1" will avoid having to examine any unneeded metadata for that table during query compilation.

However, at runtime the two forms of the query will be identical and will have identical runtimes.

I tested four possible ways of expressing this query on an empty table with various numbers of columns: SELECT 1 vs SELECT * vs SELECT Primary_Key vs SELECT Other_Not_Null_Column.

I ran the queries in a loop using OPTION (RECOMPILE) and measured the average number of executions per second. Results below:

+-------------+----------+---------+---------+--------------+
| Num of Cols |    *     |    1    |   PK    | Not Null col |
+-------------+----------+---------+---------+--------------+
| 2           | 2043.5   | 2043.25 | 2073.5  | 2067.5       |
| 4           | 2038.75  | 2041.25 | 2067.5  | 2067.5       |
| 8           | 2015.75  | 2017    | 2059.75 | 2059         |
| 16          | 2005.75  | 2005.25 | 2025.25 | 2035.75      |
| 32          | 1963.25  | 1967.25 | 2001.25 | 1992.75      |
| 64          | 1903     | 1904    | 1936.25 | 1939.75      |
| 128         | 1778.75  | 1779.75 | 1799    | 1806.75      |
| 256         | 1530.75  | 1526.5  | 1542.75 | 1541.25      |
| 512         | 1195     | 1189.75 | 1203.75 | 1198.5       |
| 1024        | 694.75   | 697     | 699     | 699.25       |
+-------------+----------+---------+---------+--------------+
| Total       | 17169.25 | 17171   | 17408   | 17408        |
+-------------+----------+---------+---------+--------------+

As can be seen, there is no consistent winner between SELECT 1 and SELECT *, and the difference between the two approaches is negligible. The SELECT Not Null col and SELECT PK do appear slightly faster though.

All four of the queries degrade in performance as the number of columns in the table increases.

As the table is empty, this relationship seems explicable only by the amount of column metadata. For COUNT(1) it is easy to see from the below that this gets rewritten to COUNT(*) at some point in the process.

SET SHOWPLAN_TEXT ON;

GO

SELECT COUNT(1)
FROM master..spt_values

Which gives the following plan

  |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1004],0)))
       |--Stream Aggregate(DEFINE:([Expr1004]=Count(*)))
            |--Index Scan(OBJECT:([master].[dbo].[spt_values].[ix2_spt_values_nu_nc]))

Attaching a debugger to the SQL Server process and randomly breaking whilst executing the below

DECLARE @V int 

WHILE (1=1)
    SELECT @V=1 WHERE EXISTS (SELECT 1 FROM ##T) OPTION(RECOMPILE)

I found that in the cases where the table has 1,024 columns, most of the time the call stack looks something like the below, indicating that it is indeed spending a large proportion of the time loading column metadata even when SELECT 1 is used. (For the case where the table has 1 column, randomly breaking didn't hit this bit of the call stack in 10 attempts.)

sqlservr.exe!CMEDAccess::GetProxyBaseIntnl()  - 0x1e2c79 bytes  
sqlservr.exe!CMEDProxyRelation::GetColumn()  + 0x57 bytes   
sqlservr.exe!CAlgTableMetadata::LoadColumns()  + 0x256 bytes    
sqlservr.exe!CAlgTableMetadata::Bind()  + 0x15c bytes   
sqlservr.exe!CRelOp_Get::BindTree()  + 0x98 bytes   
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CRelOp_FromList::BindTree()  + 0x5c bytes  
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CRelOp_QuerySpec::BindTree()  + 0xbe bytes 
sqlservr.exe!COptExpr::BindTree()  + 0x58 bytes 
sqlservr.exe!CScaOp_Exists::BindScalarTree()  + 0x72 bytes  
... Lines omitted ...
msvcr80.dll!_threadstartex(void * ptd=0x0031d888)  Line 326 + 0x5 bytes C
kernel32.dll!_BaseThreadStart@8()  + 0x37 bytes 

This manual profiling attempt is backed up by the VS 2012 code profiler, which shows a very different selection of functions consuming the compilation time for the two cases (Top 15 Functions 1024 columns vs Top 15 Functions 1 column).

Both the SELECT 1 and SELECT * versions wind up checking column permissions and fail if the user is not granted access to all columns in the table.

An example I cribbed from a conversation on the heap

CREATE USER blat WITHOUT LOGIN;
GO
CREATE TABLE dbo.T
(
X INT PRIMARY KEY,
Y INT,
Z CHAR(8000)
)
GO

GRANT SELECT ON dbo.T TO blat;
DENY SELECT ON dbo.T(Z) TO blat;
GO
EXECUTE AS USER = 'blat';
GO

SELECT 1
WHERE  EXISTS (SELECT 1
               FROM   T); 
/*  ↑↑↑↑ 
Fails unexpectedly with 

The SELECT permission was denied on the column 'Z' of the 
           object 'T', database 'tempdb', schema 'dbo'.*/

GO
REVERT;
DROP USER blat
DROP TABLE T

So one might speculate that the minor apparent difference when using SELECT some_not_null_col is that it only winds up checking permissions on that specific column (though it still loads the metadata for all). However, this doesn't seem to fit with the facts, as the percentage difference between the two approaches, if anything, gets smaller as the number of columns in the underlying table increases.

In any event, I won't be rushing out and changing all my queries to this form, as the difference is very minor and only apparent during query compilation. Removing the OPTION (RECOMPILE) so that subsequent executions can use a cached plan gave the following.

+-------------+-----------+------------+-----------+--------------+
| Num of Cols |     *     |     1      |    PK     | Not Null col |
+-------------+-----------+------------+-----------+--------------+
| 2           | 144933.25 | 145292     | 146029.25 | 143973.5     |
| 4           | 146084    | 146633.5   | 146018.75 | 146581.25    |
| 8           | 143145.25 | 144393.25  | 145723.5  | 144790.25    |
| 16          | 145191.75 | 145174     | 144755.5  | 146666.75    |
| 32          | 144624    | 145483.75  | 143531    | 145366.25    |
| 64          | 145459.25 | 146175.75  | 147174.25 | 146622.5     |
| 128         | 145625.75 | 143823.25  | 144132    | 144739.25    |
| 256         | 145380.75 | 147224     | 146203.25 | 147078.75    |
| 512         | 146045    | 145609.25  | 145149.25 | 144335.5     |
| 1024        | 148280    | 148076     | 145593.25 | 146534.75    |
+-------------+-----------+------------+-----------+--------------+
| Total       | 1454769   | 1457884.75 | 1454310   | 1456688.75   |
+-------------+-----------+------------+-----------+--------------+

The test script I used can be found here
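
The linked script is not reproduced on this page. The following is only a rough sketch of what such a timing loop might look like, not the author's actual script: the two-column definition of ##T, the 10-second window, and all variable names are assumptions made for illustration.

```sql
-- ##T stands in for the table under test; its column list here is an assumption.
IF OBJECT_ID('tempdb..##T') IS NOT NULL
    DROP TABLE ##T;
CREATE TABLE ##T (X INT PRIMARY KEY, Y INT NOT NULL);

DECLARE @V INT;
DECLARE @Seconds INT = 10;      -- length of the measurement window (arbitrary choice)
DECLARE @Executions INT = 0;
DECLARE @Stop DATETIME2 = DATEADD(SECOND, @Seconds, SYSDATETIME());

WHILE SYSDATETIME() < @Stop
BEGIN
    -- OPTION (RECOMPILE) forces a fresh compilation on every iteration,
    -- so on an empty table the loop mostly measures compile cost.
    SELECT @V = 1 WHERE EXISTS (SELECT * FROM ##T) OPTION (RECOMPILE);
    SET @Executions += 1;
END;

SELECT @Executions * 1.0 / @Seconds AS ExecutionsPerSecond;

DROP TABLE ##T;
```

Swapping the select list inside EXISTS (for 1, X, or Y) and widening ##T would give the other cells of the tables above; removing OPTION (RECOMPILE) corresponds to the cached-plan run.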

Answered by HLGEM

The best way to know is to performance-test both versions and check the execution plan for each. Pick a table with lots of columns.
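
A possible starting point for such a comparison, as a sketch only: #Wide is a hypothetical stand-in and should be replaced with a real table that has many columns. SET STATISTICS TIME reports parse/compile time and execution time for each statement in the client's message output.

```sql
-- #Wide is a hypothetical table; substitute a real table with many columns.
CREATE TABLE #Wide (Id INT PRIMARY KEY, A INT, B INT, C VARCHAR(100));
INSERT INTO #Wide (Id, A, B, C) VALUES (1, 10, 20, 'x');

-- Print parse/compile and execution times for the statements that follow.
SET STATISTICS TIME ON;

-- OPTION (RECOMPILE) keeps a cached plan from hiding the compile cost.
SELECT 'star' AS Variant WHERE EXISTS (SELECT * FROM #Wide WHERE A = 10) OPTION (RECOMPILE);
SELECT 'one'  AS Variant WHERE EXISTS (SELECT 1 FROM #Wide WHERE A = 10) OPTION (RECOMPILE);

SET STATISTICS TIME OFF;

DROP TABLE #Wide;
```

A single run is noisy, so the timings are better compared over many repetitions; the plans themselves can be compared with SET SHOWPLAN_TEXT ON or the client's execution plan view.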

Answered by Cade Roux

There is no difference in SQL Server and it has never been a problem in SQL Server. The optimizer knows that they are the same. If you look at the execution plans, you will see that they are identical.

Answered by Larry Lustig

Personally I find it very, very hard to believe that they don't optimize to the same query plan. But the only way to know in your particular situation is to test it. If you do, please report back!

Answered by orjan

There is no real difference, but there might be a very small performance hit. As a rule of thumb, you should not ask for more data than you need.
