What's faster, SELECT DISTINCT or GROUP BY in MySQL?
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/581521/
Asked by vava
If I have a table
CREATE TABLE users (
    id int(10) unsigned NOT NULL auto_increment,
    name varchar(255) NOT NULL,
    profession varchar(255) NOT NULL,
    employer varchar(255) NOT NULL,
    PRIMARY KEY (id)
);
and I want to get all unique values of the profession field, which would be faster (or recommended):
SELECT DISTINCT u.profession FROM users u
or
SELECT u.profession FROM users u GROUP BY u.profession
?
Accepted answer by SquareCog
They are essentially equivalent to each other (in fact, this is how some databases implement DISTINCT under the hood).
If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.
When in doubt, test!
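As a minimal sketch of "when in doubt, test", the snippet below checks that the two queries from the question return the same set of values. It runs against an in-memory SQLite database through Python's sqlite3 module, which is only a stand-in for MySQL here; the sample rows are made up, and the result says nothing about relative speed on a real MySQL server.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        profession TEXT NOT NULL,
        employer TEXT NOT NULL
    )
    """
)

# Made-up sample rows with repeated professions.
sample_rows = [
    ("Ann", "engineer", "Acme"),
    ("Bob", "teacher", "City School"),
    ("Cid", "engineer", "Initech"),
    ("Dee", "nurse", "General Hospital"),
    ("Eve", "teacher", "City School"),
]
conn.executemany(
    "INSERT INTO users (name, profession, employer) VALUES (?, ?, ?)",
    sample_rows,
)

distinct_rows = conn.execute(
    "SELECT DISTINCT u.profession FROM users u"
).fetchall()
group_by_rows = conn.execute(
    "SELECT u.profession FROM users u GROUP BY u.profession"
).fetchall()

# Neither query guarantees a row order, so compare the results as sets.
distinct_set = {row[0] for row in distinct_rows}
group_by_set = {row[0] for row in group_by_rows}
print("same values:", distinct_set == group_by_set)
```

Comparing as sets is deliberate: without an ORDER BY, neither form promises a particular row order.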
Answer by Quassnoi
If you have an index on profession, these two are synonyms.
If you don't, then use DISTINCT.
In MySQL versions before 8.0, GROUP BY sorts results. You can even do:
SELECT u.profession FROM users u GROUP BY u.profession DESC
and get your professions sorted in DESC order.
DISTINCT creates a temporary table and uses it to filter out duplicates. GROUP BY does the same, but sorts the distinct results afterwards.
So
SELECT DISTINCT u.profession FROM users u
is faster if you don't have an index on profession.
Answer by daniel.gindi
All of the answers above are correct for the case of DISTINCT on a single column vs GROUP BY on a single column. Every db engine has its own implementation and optimizations, and if you care about the (in most cases very small) difference, then you have to test against your specific server AND specific version, as implementations may change...
BUT, if you select more than one column in the query, then DISTINCT is essentially different! In that case it compares ALL selected columns of each row, instead of just one column.
So if you have something like:
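-- This will NOT return rows unique by id, but unique by (id, name)
SELECT DISTINCT id, name FROM some_query_with_joins;
-- This will return one row per id; name comes from an arbitrary row of each group
-- (MySQL rejects this under ONLY_FULL_GROUP_BY unless name is functionally dependent on id)
SELECT id, name FROM some_query_with_joins GROUP BY id;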
It is a common mistake to think that the DISTINCT keyword distinguishes rows by the first column you specified; DISTINCT applies to the whole select list.
So be careful not to take the answers above as correct for all cases... You might get confused and get the wrong results while all you wanted was to optimize!
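The difference between the two forms can be sketched with a tiny data set. The example below uses Python's sqlite3 module as a stand-in for MySQL; the table name `joined` and its rows are hypothetical and play the role of `some_query_with_joins`. SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts the non-aggregated `name` column and picks its value from an arbitrary row of each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for some_query_with_joins.
conn.execute("CREATE TABLE joined (id INTEGER NOT NULL, name TEXT NOT NULL)")
# id 1 appears with two different names, as can happen after a join.
conn.executemany(
    "INSERT INTO joined (id, name) VALUES (?, ?)",
    [(1, "alice"), (1, "alicia"), (1, "alice"), (2, "bob")],
)

# DISTINCT de-duplicates on the whole (id, name) pair.
distinct_pairs = conn.execute(
    "SELECT DISTINCT id, name FROM joined"
).fetchall()

# GROUP BY id yields one row per id; which name is shown is arbitrary.
grouped_by_id = conn.execute(
    "SELECT id, name FROM joined GROUP BY id"
).fetchall()

print("DISTINCT id, name ->", len(distinct_pairs), "rows")
print("GROUP BY id       ->", len(grouped_by_id), "rows")
```

With this data the DISTINCT query keeps both names for id 1, while the GROUP BY query collapses them into a single row, so the two are not interchangeable once more than one column is selected.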
Answer by Tim
Go for the simplest and shortest if you can. DISTINCT seems to be more what you are looking for, simply because it gives you EXACTLY the answer you need and only that!
Answer by Ranjith
GROUP BY is more expensive than DISTINCT, since GROUP BY sorts the result while DISTINCT avoids it. But if you want GROUP BY to yield the same result as DISTINCT, add ORDER BY NULL.
SELECT DISTINCT u.profession FROM users u
is equal to
SELECT u.profession FROM users u GROUP BY u.profession order by null
Answer by OptilabWorker
Well, DISTINCT can be slower than GROUP BY on some occasions in Postgres (I don't know about other DBs).
Tested example:
postgres=# select count(*) from (select distinct i from g) a;
 count
-------
 10001
(1 row)
Time: 1563,109 ms
postgres=# select count(*) from (select i from g group by i) a;
 count
-------
 10001
(1 row)
Time: 594,481 ms
http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I
so be careful ... :)
Answer by amartynov
It seems that the queries are not exactly the same, at least for MySQL.
Compare:
- describe select distinct productname from northwind.products
- describe select productname from northwind.products group by productname
The second query additionally shows "Using filesort" in the Extra column.
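The same kind of plan comparison can be sketched outside MySQL. The snippet below asks SQLite (via Python's sqlite3 module) for the plan of each query using its own EXPLAIN QUERY PLAN statement. This is an analogy only: the `products` table here is a hypothetical, empty stand-in for northwind.products, the helper name `plan_details` is made up, and SQLite's plan text differs from MySQL's DESCRIBE/EXPLAIN output, so it will not show "Using filesort".

```python
import sqlite3


def plan_details(conn, sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for northwind.products.
conn.execute(
    "CREATE TABLE products ("
    "productid INTEGER PRIMARY KEY, productname TEXT NOT NULL)"
)

distinct_plan = plan_details(
    conn, "SELECT DISTINCT productname FROM products"
)
group_by_plan = plan_details(
    conn, "SELECT productname FROM products GROUP BY productname"
)

# Print both plans side by side for a manual comparison.
for label, plan in (("DISTINCT", distinct_plan), ("GROUP BY", group_by_plan)):
    print(label)
    for step in plan:
        print("  " + step)
```

The exact plan wording depends on the SQLite version, so the output is meant to be read and compared by eye, not matched against a fixed string.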
Answer by Carlos
In MySQL, "GROUP BY" uses an extra step: filesort. I found that DISTINCT is faster than GROUP BY, and that was a surprise.
Answer by Grumpy
After heavy testing we came to the conclusion that GROUP BY is faster.
SELECT SQL_NO_CACHE opnamegroep_intern
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13)
GROUP BY opnamegroep_intern;
635 total, 0.0944 seconds. Showing records 0 - 29 (635 total, query took 0.0484 sec)
SELECT SQL_NO_CACHE DISTINCT (opnamegroep_intern)
FROM telwerken
WHERE opnemergroep IN (7,8,9,10,11,12,13);
635 total, 0.2117 seconds (almost 100% slower). Showing records 0 - 29 (635 total, query took 0.3468 sec)
Answer by kolunar
Here is a simple approach that prints the elapsed time for each of the two queries (Transact-SQL, i.e. SQL Server syntax rather than MySQL):
DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;
SET @t1 = GETDATE();
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);
SET @t1 = GETDATE();
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);
Or try SET STATISTICS TIME (Transact-SQL):
SET STATISTICS TIME ON;
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET STATISTICS TIME OFF;
It simply displays the number of milliseconds required to parse, compile, and execute each statement as below:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
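For engines without SET STATISTICS TIME, a client-side timer is one possible substitute. The sketch below times both queries from Python against an in-memory SQLite database; SQLite, the generated rows, and the helper name `time_query` are all assumptions for illustration, and the numbers it prints say nothing about MySQL itself. To measure MySQL, the same loop would need to run through a MySQL client connection against your own server and version.

```python
import sqlite3
import time


def time_query(conn, sql, repeats=5):
    """Return the best wall-clock time in milliseconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetch all rows so the query fully runs
        best = min(best, (time.perf_counter() - start) * 1000.0)
    return best


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, profession TEXT NOT NULL)"
)
# 20,000 made-up rows spread over 50 professions.
conn.executemany(
    "INSERT INTO users (profession) VALUES (?)",
    (("profession_%d" % (i % 50),) for i in range(20000)),
)

queries = {
    "DISTINCT": "SELECT DISTINCT u.profession FROM users u",
    "GROUP BY": "SELECT u.profession FROM users u GROUP BY u.profession",
}
for label, sql in queries.items():
    print("%s: %.3f ms" % (label, time_query(conn, sql)))
```

Taking the best of several runs reduces noise from caching and other processes; a single run, as in the timings quoted earlier in this thread, can easily be misleading.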