SQL COUNT(*) vs. COUNT(1) vs. COUNT(pk):哪个更好?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2710621/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
COUNT(*) vs. COUNT(1) vs. COUNT(pk): which is better?
提问by zneak
I often find these three variants:
我经常发现这三种变体:
SELECT COUNT(*) FROM Foo;
SELECT COUNT(1) FROM Foo;
SELECT COUNT(PrimaryKey) FROM Foo;
As far as I can see, they all do the same thing, and I find myself using the three in my codebase. However, I don't like to do the same thing different ways. To which one should I stick? Is any one of them better than the two others?
据我所知,它们都做同样的事情,我发现自己在我的代码库中使用了这三个。但是,我不喜欢以不同的方式做同样的事情。我应该坚持哪一个?他们中的任何一个都比其他两个更好吗?
采纳答案by Michael Buen
Bottom Line
底线
Use either COUNT(field)
or COUNT(*)
, and stick with it consistently, and if your database allows COUNT(tableHere)
or COUNT(tableHere.*)
, use that.
使用COUNT(field)
或COUNT(*)
,并始终坚持使用它,如果您的数据库允许COUNT(tableHere)
或COUNT(tableHere.*)
,请使用它。
In short, don't use COUNT(1)
for anything. It's a one-trick pony, which rarely does what you want, and in those rare cases is equivalent to count(*)
简而言之,不要COUNT(1)
用于任何事情。这是一匹一招式的小马,它很少做你想做的事,在那些罕见的情况下相当于count(*)
Use count(*)
for counting
使用count(*)
计数
Use *
for all your queries that need to count everything, even for joins, use *
使用*
您的所有疑问需要计数的一切,甚至连接,使用*
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
But don't use COUNT(*)
for LEFT joins, as that will return 1 even if the subordinate table doesn't match anything from parent table
但是不要COUNT(*)
用于 LEFT 连接,因为即使从属表与父表中的任何内容都不匹配,它也会返回 1
SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
Don't be fooled by those advising that when using *
in COUNT, it fetches entire row from your table, saying that *
is slow. The *
on SELECT COUNT(*)
and SELECT *
has no bearing to each other, they are entirely different thing, they just share a common token, i.e. *
.
不要被那些建议*
在 COUNT 中使用时,它从你的表中获取整行的人所愚弄,说这*
很慢。该*
上SELECT COUNT(*)
和SELECT *
无轴承对方,他们是完全不同的事情,他们只是共用一个道理,即*
。
An alternate syntax
替代语法
In fact, if it is not permitted to name a field as same as its table name, RDBMS language designer could give COUNT(tableNameHere)
the same semantics as COUNT(*)
. Example:
事实上,如果不允许将字段命名为与其表名相同的名称,RDBMS 语言设计者可以给出COUNT(tableNameHere)
与COUNT(*)
. 例子:
For counting rows we could have this:
为了计算行数,我们可以这样:
SELECT COUNT(emp) FROM emp
And they could make it simpler:
他们可以让它更简单:
SELECT COUNT() FROM emp
And for LEFT JOINs, we could have this:
对于 LEFT JOIN,我们可以这样:
SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
But they cannot do that (COUNT(tableNameHere)
) since SQL standard permits naming a field with the same name as its table name:
但是他们不能这样做 ( COUNT(tableNameHere)
) 因为 SQL 标准允许命名一个与它的表名同名的字段:
CREATE TABLE fruit -- ORM-friendly name
(
fruit_id int NOT NULL,
fruit varchar(50), /* same name as table name,
and let's say, someone forgot to put NOT NULL */
shape varchar(50) NOT NULL,
color varchar(50) NOT NULL
)
Counting with null
用 null 计数
And also, it is not a good practice to make a field nullable if its name matches the table name. Say you have values 'Banana', 'Apple', NULL, 'Pears' on fruit
field. This will not count all rows, it will only yield 3, not 4
而且,如果字段名称与表名称匹配,则将字段设为可空也不是一个好习惯。假设您在fruit
字段上有值 'Banana'、'Apple'、NULL、'Pears' 。这不会计算所有行,它只会产生 3,而不是 4
SELECT count(fruit) FROM fruit
Though some RDBMS do that sort of principle (for counting the table's rows, it accepts table name as COUNT's parameter), this will work in Postgresql (if there is no subordinate
field in any of the two tables below, i.e. as long as there is no name conflict between field name and table name):
尽管一些 RDBMS 执行这种原则(为了计算表的行数,它接受表名作为 COUNT 的参数),这将在 Postgresql 中工作(如果subordinate
下面两个表中的任何一个中都没有字段,即只要没有字段名和表名之间的名称冲突):
SELECT boss.boss_id, COUNT(subordinate)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
But that could cause confusion later if we will add a subordinate
field in the table, as it will count the field(which could be nullable), not the table rows.
但是,如果我们subordinate
在表中添加一个字段,这可能会导致稍后混淆,因为它将计算字段(可能为空),而不是表行。
So to be on the safe side, use:
因此,为了安全起见,请使用:
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
count(1)
: The one-trick pony
count(1)
: 一招小马
In particular to COUNT(1)
, it is a one-trick pony, it works well only on one table query:
特别是 to COUNT(1)
,它是一招小马,它只适用于一个表查询:
SELECT COUNT(1) FROM tbl
But when you use joins, that trick won't work on multi-table queries without its semantics being confused, and in particular you cannot write:
但是当你使用连接时,这个技巧在没有语义被混淆的情况下不适用于多表查询,特别是你不能写:
-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
So what's the meaning of COUNT(1) here?
那么这里的 COUNT(1) 是什么意思呢?
SELECT boss.boss_id, COUNT(1)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
Is it this...?
这是……?
-- counting all the subordinates only
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
Or this...?
或这个...?
-- or is that COUNT(1) will also count 1 for boss regardless if boss has a subordinate
SELECT boss.boss_id, COUNT(*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
By careful thought, you can infer that COUNT(1)
is the same as COUNT(*)
, regardless of type of join. But for LEFT JOINs result, we cannot mold COUNT(1)
to work as: COUNT(subordinate.boss_id)
, COUNT(subordinate.*)
通过仔细考虑,您可以推断出COUNT(1)
与 相同COUNT(*)
,无论连接类型如何。但对于LEFT JOIN的结果,我们不能塑造COUNT(1)
工作为:COUNT(subordinate.boss_id)
,COUNT(subordinate.*)
So just use either of the following:
所以只需使用以下任一方法:
-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.boss_id)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
Works on Postgresql, it's clear that you want to count the cardinality of the set
适用于Postgresql,很明显你要计算集合的基数
-- count the subordinates that belongs to boss
SELECT boss.boss_id, COUNT(subordinate.*)
FROM boss
LEFT JOIN subordinate on subordinate.boss_id = boss.boss_id
GROUP BY boss.id
Another way to count the cardinality of the set, very English-like (just don't make a column with a name same as its table name) : http://www.sqlfiddle.com/#!1/98515/7
另一种计算集合基数的方法,非常像英文(只是不要创建名称与其表名相同的列):http: //www.sqlfiddle.com/#!1/98515/7
select boss.boss_name, count(subordinate)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name
You cannot do this: http://www.sqlfiddle.com/#!1/98515/8
你不能这样做:http: //www.sqlfiddle.com/#!1/98515/8
select boss.boss_name, count(subordinate.1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name
You can do this, but this produces wrong result: http://www.sqlfiddle.com/#!1/98515/9
你可以这样做,但这会产生错误的结果:http: //www.sqlfiddle.com/#!1/98515/9
select boss.boss_name, count(1)
from boss
left join subordinate on subordinate.boss_code = boss.boss_code
group by boss.boss_name
回答by Jonathan Leffler
Two of them always produce the same answer:
其中两个总是产生相同的答案:
COUNT(*)
counts the number of rowsCOUNT(1)
also counts the number of rows
COUNT(*)
计算行数COUNT(1)
还计算行数
Assuming the pk
is a primary key and that no nulls are allowed in the values, then
假设pk
是一个主键并且值中不允许有空值,那么
COUNT(pk)
also counts the number of rows
COUNT(pk)
还计算行数
However, if pk
is not constrained to be not null, then it produces a different answer:
但是,如果pk
不限制为非空,则会产生不同的答案:
COUNT(possibly_null)
counts the number of rows with non-null values in the columnpossibly_null
.COUNT(DISTINCT pk)
also counts the number of rows (because a primary key does not allow duplicates).COUNT(DISTINCT possibly_null_or_dup)
counts the number of distinct non-null values in the columnpossibly_null_or_dup
.COUNT(DISTINCT possibly_duplicated)
counts the number of distinct (necessarily non-null) values in the columnpossibly_duplicated
when that has theNOT NULL
clause on it.
COUNT(possibly_null)
计算列中具有非空值的行数possibly_null
。COUNT(DISTINCT pk)
还计算行数(因为主键不允许重复)。COUNT(DISTINCT possibly_null_or_dup)
计算列中不同的非空值的数量possibly_null_or_dup
。COUNT(DISTINCT possibly_duplicated)
possibly_duplicated
当列上有NOT NULL
子句时,计算列中不同(必须是非空)值的数量。
Normally, I write COUNT(*)
; it is the original recommended notation for SQL. Similarly, with the EXISTS
clause, I normally write WHERE EXISTS(SELECT * FROM ...)
because that was the original recommend notation. There should be no benefit to the alternatives; the optimizer should see through the more obscure notations.
通常,我写COUNT(*)
; 它是 SQL 的原始推荐表示法。同样,对于EXISTS
子句,我通常会这样写,WHERE EXISTS(SELECT * FROM ...)
因为那是最初的推荐符号。替代品应该没有任何好处;优化器应该看穿更晦涩的符号。
回答by Jarod Elliott
This will depend on the type of database you are using as well as the type of table in some cases.
这将取决于您使用的数据库类型以及某些情况下的表类型。
For example, using MySQL, count(*)
will be fast under a MyISAM table but slow under an InnoDB. Under InnoDB you should use count(1)
or count(pk)
.
例如,使用 MySQL,count(*)
在 MyISAM 表下会很快,但在 InnoDB 下会很慢。在 InnoDB 下,您应该使用count(1)
或count(pk)
。
回答by gbn
Books on linesays "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )
"
网上书上说“ COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )
”
"1" is a non-null expression so it's the same as COUNT(*)
.
The optimiser recognises it as trivialso gives the same plan. A PK is unique and non-null (in SQL Server at least) so COUNT(PK)
= COUNT(*)
"1" 是一个非空表达式,所以它与COUNT(*)
. 优化器认为它是微不足道的,因此给出了相同的计划。PK 是唯一且非空的(至少在 SQL Server 中)所以COUNT(PK)
= COUNT(*)
This is a similar myth to EXISTS (SELECT * ...
or EXISTS (SELECT 1 ...
这是一个类似的神话EXISTS (SELECT * ...
或EXISTS (SELECT 1 ...
And see the ANSI 92 spec, section 6.5, General Rules, case 1
并参见ANSI 92 规范,第 6.5 节,一般规则,案例 1
a) If COUNT(*) is specified, then the result is the cardinality
of T.
b) Otherwise, let TX be the single-column table that is the
result of applying the <value expression> to each row of T
and eliminating null values. If one or more null values are
eliminated, then a completion condition is raised: warning-
null value eliminated in set function.
回答by ZeissS
At least on Oracle they are all the same: http://www.oracledba.co.uk/tips/count_speed.htm
至少在 Oracle 上它们都是一样的:http: //www.oracledba.co.uk/tips/count_speed.htm
回答by arunmur
I feel the performance characteristics change from one DBMS to another. It's all on how they choose to implement it. Since I have worked extensively on Oracle, I'll tell from that perspective.
我觉得性能特征从一个 DBMS 变化到另一个。这完全取决于他们选择如何实施它。由于我在 Oracle 方面做了大量工作,所以我将从这个角度来讲述。
COUNT(*)
- Fetches entire row into result set before passing on to the count function, count function will aggregate 1 if the row is not null
COUNT(*)
- 在传递给 count 函数之前将整行提取到结果集中,如果该行不为 null,count 函数将聚合 1
COUNT(1)
- Will not fetch any row, instead count is called with a constant value of 1 for each row in the table when the WHERE
matches.
COUNT(1)
- 不会获取任何行,而是在WHERE
匹配时为表中的每一行调用 count 的常量值 1 。
COUNT(PK)
- The PK in Oracle is indexed. This means Oracle has to read only the index. Normally one row in the index B+ tree is many times smaller than the actual row. So considering the disk IOPS rate, Oracle can fetch many times more rows from Index with a single block transfer as compared to entire row. This leads to higher throughput of the query.
COUNT(PK)
- Oracle 中的 PK 已编入索引。这意味着 Oracle 必须只读取索引。通常索引 B+ 树中的一行比实际行小很多倍。因此,考虑到磁盘 IOPS 率,与整行相比,Oracle 可以通过单个块传输从索引中获取多倍的行。这会导致更高的查询吞吐量。
From this you can see the first count is the slowest and the last count is the fastest in Oracle.
从这里你可以看到在 Oracle 中,第一个计数是最慢的,最后一个计数是最快的。