Count(*) 与 Count(1) - SQL Server

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1221559/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 03:04:07  来源:igfitidea点击:

Count(*) vs Count(1) - SQL Server

sqlsql-serverperformance

提问by super9

Just wondering if any of you people use Count(1)over Count(*)and if there is a noticeable difference in performance or if this is just a legacy habit that has been brought forward from days gone past?

只是想知道你们中是否有人Count(1)过度使用Count(*),性能是否有明显差异,或者这是否只是过去几天带来的遗留习惯?

The specific database is SQL Server 2005.

具体的数据库是SQL Server 2005.

采纳答案by gbn

There is no difference.

没有区别。

Reason:

原因:

Books on-linesays "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"

网上书上说“ COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )

"1" is a non-null expression: so it's the same as COUNT(*). The optimizer recognizes it for what it is: trivial.

"1" 是一个非空表达式:所以它与COUNT(*). 优化器认识到它是什么:微不足道。

The same as EXISTS (SELECT * ...or EXISTS (SELECT 1 ...

EXISTS (SELECT * ...或相同EXISTS (SELECT 1 ...

Example:

例子:

SELECT COUNT(1) FROM dbo.tab800krows
SELECT COUNT(1),FKID FROM dbo.tab800krows GROUP BY FKID

SELECT COUNT(*) FROM dbo.tab800krows
SELECT COUNT(*),FKID FROM dbo.tab800krows GROUP BY FKID

Same IO, same plan, the works

相同的 IO,相同的计划,作品

Edit, Aug 2011

编辑,2011 年 8 月

Similar question on DBA.SE.

DBA.SE 上的类似问题

Edit, Dec 2011

编辑,2011 年 12 月

COUNT(*)is mentioned specifically in ANSI-92(look for "Scalar expressions 125")

COUNT(*)ANSI-92中特别提到(寻找“ Scalar expressions 125”)

Case:

a) If COUNT(*) is specified, then the result is the cardinality of T.

案件:

a) 如果指定了 COUNT(*),则结果为 T 的基数。

That is, the ANSI standard recognizes it as bleeding obvious what you mean. COUNT(1)has been optimized out by RDBMS vendors becauseof this superstition. Otherwise it would be evaluated as per ANSI

也就是说,ANSI 标准认为它很明显你的意思。由于这种迷信COUNT(1),RDBMS 供应商已对其进行了优化。否则,它将按照 ANSI 进行评估

b) Otherwise, let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning-

b) 否则,令 TX 为将 <value expression> 应用于 T 的每一行并消除空值的结果的单列表。如果消除了一个或多个空值,则会引发完成条件:警告-

回答by Quassnoi

In SQL Server, these statements yield the same plans.

在 SQL Server 中,这些语句产生相同的计划。

Contrary to the popular opinion, in Oracle they do too.

与流行的观点相反,在 Oracle 中他们也这样做。

SYS_GUID()in Oracle is quite computation intensive function.

SYS_GUID()在 Oracle 中是相当计算密集型的功能。

In my test database, t_evenis a table with 1,000,000rows

在我的测试数据库中,t_even是一个带有1,000,000行的表

This query:

这个查询:

SELECT  COUNT(SYS_GUID())
FROM    t_even

runs for 48seconds, since the function needs to evaluate each SYS_GUID()returned to make sure it's not a NULL.

运行48几秒钟,因为该函数需要评估每个SYS_GUID()返回的值以确保它不是NULL.

However, this query:

但是,这个查询:

SELECT  COUNT(*)
FROM    (
        SELECT  SYS_GUID()
        FROM    t_even
        )

runs for but 2seconds, since it doen't even try to evaluate SYS_GUID()(despite *being argument to COUNT(*))

只运行2几秒钟,因为它甚至不尝试评估SYS_GUID()(尽管*是 的论据COUNT(*)

回答by Tony Andrews

Clearly, COUNT(*)and COUNT(1)will alwaysreturn the same result. Therefore, if one were slower than the other it would effectively be due to an optimiser bug. Since both forms are used very frequently in queries, it would make no sense for a DBMS to allow such a bug to remain unfixed. Hence you will find that the performance of both forms is (probably) identical in all major SQL DBMSs.

显然,COUNT(*)并且COUNT(1)始终返回相同的结果。因此,如果一个比另一个慢,那实际上是由于优化器错误。由于这两种形式在查询中都非常频繁地使用,因此 DBMS 允许这样的错误保持未修复是没有意义的。因此,您会发现两种形式的性能(可能)在所有主要 SQL DBMS 中都是相同的。

回答by Conor Cunningham MSFT

I work on the SQL Server team and I can hopefully clarify a few points in this thread (I had not seen it previously, so I am sorry the engineering team has not done so previously).

我在 SQL Server 团队工作,我希望可以澄清这个线程中的几点(我以前没有见过,所以很抱歉工程团队以前没有这样做)。

First, there is no semantic difference between select count(1) from tablevs. select count(*) from table. They return the same results in all cases (and it is a bug if not). As noted in the other answers, select count(column) from tableis semantically different and does not always return the same results as count(*).

首先,select count(1) from tableselect count(*) from table.之间没有语义差异。它们在所有情况下都返回相同的结果(如果不是,则是一个错误)。正如其他答案中所述,select count(column) from table在语义上是不同的,并不总是返回与count(*).

Second, with respect to performance, there are two aspects that would matter in SQL Server (and SQL Azure): compilation-time work and execution-time work. The Compilation time work is a trivially small amount of extra work in the current implementation. There is an expansion of the * to all columns in some cases followed by a reduction back to 1 column being output due to how some of the internal operations work in binding and optimization. I doubt it would show up in any measurable test, and it would likely get lost in the noise of all the other things that happen under the covers (such as auto-stats, xevent sessions, query store overhead, triggers, etc.). It is maybe a few thousand extra CPU instructions. So, count(1) does a tiny bit less work during compilation (which will usually happen once and the plan is cached across multiple subsequent executions). For execution time, assuming the plans are the same there should be no measurable difference. (One of the earlier examples shows a difference - it is most likely due to other factors on the machine if the plan is the same).

其次,关于性能,在 SQL Server(和 SQL Azure)中有两个方面很重要:编译时工作和执行时工作。在当前的实现中,编译时间工作是一个微不足道的额外工作。在某些情况下,将 * 扩展到所有列,然后由于某些内部操作在绑定和优化中的工作方式而减少回 1 列输出。我怀疑它会出现在任何可衡量的测试中,并且它可能会被隐藏在幕后发生的所有其他事情的噪音中迷失(例如自动统计、xevent 会话、查询存储开销、触发器等)。它可能是几千个额外的 CPU 指令。所以,count(1) 在编译期间做的工作少一点(这通常会发生一次,并且计划在多个后续执行中缓存)。对于执行时间,假设计划相同,应该没有可测量的差异。(较早的示例之一显示了差异 - 如果计划相同,很可能是由于机器上的其他因素造成的)。

As to how the plan can potentially be different. These are extremely unlikely to happen, but it is potentially possible in the architecture of the current optimizer. SQL Server's optimizer works as a search program (think: computer program playing chess searching through various alternatives for different parts of the query and costing out the alternatives to find the cheapest plan in reasonable time). This search has a few limits on how it operates to keep query compilation finishing in reasonable time. For queries beyond the most trivial, there are phases of the search and they deal with tranches of queries based on how costly the optimizer thinks the query is to potentially execute. There are 3 main search phases, and each phase can run more aggressive(expensive) heuristics trying to find a cheaper plan than any prior solution. Ultimately, there is a decision process at the end of each phase that tries to determine whether it should return the plan it found so far or should it keep searching. This process uses the total time taken so far vs. the estimated cost of the best plan found so far. So, on different machines with different speeds of CPUs it is possible (albeit rare) to get different plans due to timing out in an earlier phase with a plan vs. continuing into the next search phase. There are also a few similar scenarios related to timing out of the last phase and potentially running out of memory on very, very expensive queries that consume all the memory on the machine (not usually a problem on 64-bit but it was a larger concern back on 32-bit servers). Ultimately, if you get a different plan the performance at runtime would differ. I don't think it is remotely likely that the difference in compilation time would EVER lead to any of these conditions happening.

至于计划可能如何不同。这些情况极不可能发生,但在当前优化器的架构中是有可能发生的。SQL Server 的优化器用作搜索程序(想想:计算机程序下棋搜索查询的不同部分的各种替代方案,并计算出替代方案的成本,以在合理的时间内找到最便宜的计划)。此搜索在如何操作以保持查询编译在合理时间内完成方面存在一些限制。对于超出最琐碎的查询,存在搜索阶段,它们根据优化器认为查询可能执行的成本来处理查询的批次。有 3 个主要搜索阶段,每个阶段都可以运行更积极(昂贵)的启发式方法,试图找到比任何先前解决方案更便宜的计划。最终,在每个阶段结束时都有一个决策过程,试图确定它是否应该返回迄今为止找到的计划,还是应该继续搜索。此过程使用到目前为止所用的总时间与迄今为止找到的最佳计划的估计成本。因此,在具有不同 CPU 速度的不同机器上,由于在较早阶段超时与继续进入下一个搜索阶段,可能(尽管很少)获得不同的计划。还有一些类似的场景与最后一个阶段的超时有关,并且可能会在非常非常昂贵的查询中耗尽内存,这些查询消耗了机器上的所有内存(通常在 64 位上不是问题,但这是一个更大的问题)回到 32 位服务器上)。最终,如果您获得不同的计划,运行时的性能会有所不同。我不

Net-net: Please use whichever of the two you want as none of this matters in any practical form. (There are far, far larger factors that impact performance in SQL beyond this topic, honestly).

Net-net:请使用您想要的两个中的任何一个,因为这在任何实际形式中都无关紧要。(老实说,除了本主题之外,还有很多更大的因素会影响 SQL 中的性能)。

I hope this helps. I did write a book chapter about how the optimizer works but I don't know if its appropriate to post it here (as I get tiny royalties from it still I believe). So, instead of posting that I'll post a link to a talk I gave at SQLBits in the UK about how the optimizer works at a high level so you can see the different main phases of the search in a bit more detail if you want to learn about that. Here's the video link: https://sqlbits.com/Sessions/Event6/inside_the_sql_server_query_optimizer

我希望这有帮助。我确实写了一本书关于优化器如何工作的章节,但我不知道将它发布在这里是否合适(因为我仍然相信我从中获得了微薄的版税)。因此,我将发布一个链接,指向我在英国 SQLBits 上发表的关于优化器如何在高层次上工作的演讲的链接,以便您可以更详细地了解搜索的不同主要阶段(如果您愿意)了解这一点。这是视频链接:https: //sqlbits.com/Sessions/Event6/inside_the_sql_server_query_optimizer

回答by onedaywhen

In the SQL-92 Standard, COUNT(*)specifically means "the cardinality of the table expression" (could be a base table, `VIEW, derived table, CTE, etc).

在 SQL-92 标准中,COUNT(*)特指“表表达式的基数”(可以是基表、`VIEW、派生表、CTE 等)。

I guess the idea was that COUNT(*)is easy to parse. Using any other expression requires the parser to ensure it doesn't reference any columns (COUNT('a')where ais a literal and COUNT(a)where ais a column can yield different results).

我想这个想法很COUNT(*)容易解析。使用任何其他表达式需要解析器确保它不引用任何列(COUNT('a')wherea是文字和COUNT(a)wherea是列可以产生不同的结果)。

In the same vein, COUNT(*)can be easily picked out by a human coder familiar with the SQL Standards, a useful skill when working with more than one vendor's SQL offering.

同样,COUNT(*)熟悉 SQL 标准的人类编码人员可以很容易地挑选出,这是使用多个供应商的 SQL 产品时的一项有用技能。

Also, in the special case SELECT COUNT(*) FROM MyPersistedTable;, the thinking is the DBMS is likely to hold statistics for the cardinality of the table.

此外,在特殊情况下SELECT COUNT(*) FROM MyPersistedTable;,思想是 DBMS 可能会保存表基数的统计信息。

Therefore, because COUNT(1)and COUNT(*)are semantically equivalent, I use COUNT(*).

因此,因为COUNT(1)COUNT(*)在语义上是等价的,所以我使用COUNT(*).

回答by Nakul Chaudhary

COUNT(*)and COUNT(1)are same in case of result and performance.

COUNT(*)并且COUNT(1)在结果和性能的情况下相同。

回答by Richard

I would expect the optimiser to ensure there is no real difference outside weird edge cases.

我希望优化器能够确保在奇怪的边缘情况之外没有真正的区别。

As with anything, the only real way to tell is to measure your specific cases.

与任何事情一样,唯一真正的判断方法是衡量您的具体情况。

That said, I've always used COUNT(*).

也就是说,我一直使用COUNT(*).

回答by Thorsten Kettner

As this question comes up again and again, here is one more answer. I hope to add something for beginners wondering about "best practice" here.

随着这个问题一次又一次地出现,这里再给出一个答案。我希望在这里为想知道“最佳实践”的初学者添加一些东西。

SELECT COUNT(*) FROM somethingcounts records which is an easy task.

SELECT COUNT(*) FROM something计数记录这是一项简单的任务。

SELECT COUNT(1) FROM somethingretrieves a 1 per record and than counts the 1s that are not null, which is essentially counting records, only more complicated.

SELECT COUNT(1) FROM something检索每条记录的 1,然后计算非空的 1,这本质上是计算记录,只是更复杂。

Having said this: Good dbms notice that the second statement will result in the same count as the first statement and re-interprete it accordingly, as not to do unnecessary work. So usually both statements will result in the same execution plan and take the same amount of time.

话虽如此:好的 dbms 注意第二个语句将导致与第一个语句相同的计数并相应地重新解释它,以免做不必要的工作。所以通常这两个语句会产生相同的执行计划并花费相同的时间。

However from the point of readability you should use the first statement. You want to count records, so count records, not expressions. Use COUNT(expression) only when you want to count non-null occurences of something.

但是,从可读性的角度来看,您应该使用第一个语句。你想计算记录数,所以计算记录数,而不是表达式。仅当您想计算某事的非空出现时才使用 COUNT(expression)。

回答by RBT

I ran a quick test on SQL Server 2012 on an 8 GB RAM hyper-v box. You can see the results for yourself. I was not running any other windowed application apart from SQL Server Management Studio while running these tests.

我在 8 GB RAM hyper-v 机器上对 SQL Server 2012 进行了快速测试。您可以自己查看结果。在运行这些测试时,除了 SQL Server Management Studio 之外,我没有运行任何其他窗口化应用程序。

My table schema:

我的表架构:

CREATE TABLE [dbo].[employee](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

Total number of records in Employeetable: 178090131 (~ 178 million rows)

Employee表中记录总数:178090131(~1.78 亿行)

First Query:

第一个查询:

Set Statistics Time On
Go    
Select Count(*) From Employee
Go    
Set Statistics Time Off
Go

Result of First Query:

第一次查询的结果:

 SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 35 ms.

 (1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 10766 ms,  elapsed time = 70265 ms.
 SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

Second Query:

第二个查询:

    Set Statistics Time On
    Go    
    Select Count(1) From Employee
    Go    
    Set Statistics Time Off
    Go

Result of Second Query:

第二次查询的结果:

 SQL Server parse and compile time: 
   CPU time = 14 ms, elapsed time = 14 ms.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 11031 ms,  elapsed time = 70182 ms.
 SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

You can notice there is a difference of 83 (= 70265 - 70182) milliseconds which can easily be attributed to exact system condition at the time queries are run. Also I did a single run, so this difference will become more accurate if I do several runs and do some averaging. If for such a huge data-set the difference is coming less than 100 milliseconds, then we can easily conclude that the two queries do not have any performance difference exhibited by the SQL Server Engine.

您会注意到有 83 (= 70265 - 70182) 毫秒的差异,这很容易归因于运行查询时的确切系统条件。我也做了一次运行,所以如果我做几次运行并做一些平均,这种差异会变得更准确。如果对于如此庞大的数据集,差异小于 100 毫秒,那么我们可以很容易地得出结论,SQL Server 引擎所表现出的两个查询没有任何性能差异。

Note: RAM hits close to 100% usage in both the runs. I restarted SQL Server service before starting both the runs.

注意:在两次运行中,RAM 的使用率都接近 100%。在开始两次运行之前,我重新启动了 SQL Server 服务。

回答by Eyal Z.

SET STATISTICS TIME ON

select count(1) from MyTable (nolock) -- table containing 1 million records. 

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 36 ms.

SQL Server 执行时间:
CPU 时间 = 31 毫秒,已用时间 = 36 毫秒。

select count(*) from MyTable (nolock) -- table containing 1 million records. 

SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 37 ms.

SQL Server 执行时间:
CPU 时间 = 46 毫秒,已用时间 = 37 毫秒。

I've ran this hundreds of times, clearing cache every time.. The results vary from time to time as server load varies, but almost always count(*)has higher cpu time.

我已经运行了数百次,每次都清除缓存。随着服务器负载的变化,结果会不时变化,但几乎总是count(*)有更高的 CPU 时间。