Which is faster in MySQL, SELECT DISTINCT or GROUP BY?

Question

Asked by vava

If I have a table

CREATE TABLE users (
  id int(10) unsigned NOT NULL auto_increment,
  name varchar(255) NOT NULL,
  profession varchar(255) NOT NULL,
  employer varchar(255) NOT NULL,
  PRIMARY KEY  (id)
)

and I want to get all unique values of the profession field, which would be faster (or recommended):

SELECT DISTINCT u.profession FROM users u

or

SELECT u.profession FROM users u GROUP BY u.profession

?

Answer 1

Accepted answer by SquareCog

They are essentially equivalent to each other (in fact, this is how some databases implement DISTINCT under the hood).

If one of them is faster, it's going to be DISTINCT. This is because, although the two are the same, a query optimizer would have to catch the fact that your GROUP BY is not taking advantage of any group members, just their keys. DISTINCT makes this explicit, so you can get away with a slightly dumber optimizer.

When in doubt, test!
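
A minimal sketch of such a test, assuming Python's built-in sqlite3 module as a stand-in engine. The helper names build_demo_db and time_query, the row counts, and the generated data are all made up for illustration. Timings from SQLite say nothing about MySQL; to measure MySQL, point the same two queries at your own server and version.

```python
import sqlite3
import time


def build_demo_db(n_rows=50_000, n_professions=50):
    """Create an in-memory users table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " profession TEXT NOT NULL,"
        " employer TEXT NOT NULL)"
    )
    # Hypothetical data: a small set of professions repeated many times.
    rows = (
        (f"user{i}", f"profession{i % n_professions}", f"employer{i % 7}")
        for i in range(n_rows)
    )
    conn.executemany(
        "INSERT INTO users (name, profession, employer) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def time_query(conn, sql, repeats=5):
    """Run sql several times; return (best elapsed seconds, rows of last run)."""
    best = float("inf")
    result = []
    for _ in range(repeats):
        start = time.perf_counter()
        result = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, result


# The two queries from the question, unchanged.
QUERIES = {
    "DISTINCT": "SELECT DISTINCT u.profession FROM users u",
    "GROUP BY": "SELECT u.profession FROM users u GROUP BY u.profession",
}

if __name__ == "__main__":
    conn = build_demo_db()
    for label, sql in QUERIES.items():
        elapsed, rows = time_query(conn, sql)
        print(f"{label}: {len(rows)} rows, best run = {elapsed * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from caching and other activity on the machine. Comparing the returned rows as well as the timings confirms that both queries produce the same set of values.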

Answer 2

Answered by Quassnoi

If you have an index on profession, these two are synonyms.

If you don't, then use DISTINCT.

GROUP BY in MySQL sorts results. You can even do:

SELECT u.profession FROM users u GROUP BY u.profession DESC

and get your professions sorted in DESC order. (This is the behavior of MySQL 5.7 and earlier; MySQL 8.0 no longer sorts GROUP BY results implicitly and has dropped the GROUP BY ... DESC syntax.)

DISTINCT creates a temporary table and uses it to weed out duplicates. GROUP BY does the same, but sorts the distinct results afterwards.

So

SELECT DISTINCT u.profession FROM users u

is faster, if you don't have an index on profession.

Answer 3

Answered by daniel.gindi

All of the answers above are correct for the case of DISTINCT on a single column vs. GROUP BY on a single column. Every database engine has its own implementation and optimizations, and if you care about the (in most cases) very small difference, then you have to test against your specific server AND specific version, since implementations may change.

BUT, if you select more than one column in the query, then DISTINCT is essentially different! In this case it compares ALL selected columns of every row, instead of just one column.

So if you have something like:

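-- This will NOT return rows unique by id, but unique by (id, name)
SELECT DISTINCT id, name FROM some_query_with_joins

-- This will return rows unique by id.
-- (name is taken from an arbitrary row of each group; MySQL rejects this
-- with ONLY_FULL_GROUP_BY enabled unless name is functionally dependent on id)
SELECT id, name FROM some_query_with_joins GROUP BY id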

It is a common mistake to think that the DISTINCT keyword distinguishes rows by the first column you specified. DISTINCT applies to the whole select list, not to a single column.

So be careful not to take the answers above as correct for all cases. You might get confused and get the wrong results when all you wanted was to optimize!
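
The difference is easy to see on a tiny data set. This is a sketch using Python's built-in sqlite3 module as a stand-in engine; the table name joined and its rows are made up for illustration. SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts the non-aggregated name column and picks its value from an arbitrary row in each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for some_query_with_joins: id 1 appears with two names.
conn.execute("CREATE TABLE joined (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO joined (id, name) VALUES (?, ?)",
    [(1, "alice"), (1, "alice"), (1, "alicia"), (2, "bob")],
)

# DISTINCT removes rows only when (id, name) matches as a whole.
distinct_rows = conn.execute(
    "SELECT DISTINCT id, name FROM joined"
).fetchall()

# GROUP BY id collapses every row with the same id into a single row.
grouped_rows = conn.execute(
    "SELECT id, name FROM joined GROUP BY id"
).fetchall()

print(len(distinct_rows))  # 3: (1, alice), (1, alicia), (2, bob)
print(len(grouped_rows))   # 2: one row per id
```

Only the exact duplicate (1, alice) is dropped by DISTINCT, while GROUP BY id also hides the (1, alicia) variant.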

Answer 4

Answered by Tim

Go for the simplest and shortest if you can -- DISTINCT seems to be more what you are looking for only because it will give you EXACTLY the answer you need and only that!

Answer 5

Answered by Ranjith

GROUP BY is more expensive than DISTINCT, since GROUP BY sorts the result while DISTINCT avoids it. If you want GROUP BY to behave the same as DISTINCT, add ORDER BY NULL:

SELECT DISTINCT u.profession FROM users u

is equal to

SELECT u.profession FROM users u GROUP BY u.profession order by null

Answer 6

Answered by OptilabWorker

Well, DISTINCT can be slower than GROUP BY on some occasions in Postgres (I don't know about other databases).

Tested example:

postgres=# select count(*) from (select distinct i from g) a;
 count
-------
 10001
(1 row)

Time: 1563,109 ms

postgres=# select count(*) from (select i from g group by i) a;
 count
-------
 10001
(1 row)

Time: 594,481 ms

http://www.pgsql.cz/index.php/PostgreSQL_SQL_Tricks_I

so be careful ... :)

Answer 7

Answered by amartynov

It seems that the queries are not exactly the same. At least for MySQL.

Compare:

describe select distinct productname from northwind.products
describe select productname from northwind.products group by productname

The second query additionally shows "Using filesort" in the Extra column.

Answer 8

Answered by Carlos

In MySQL, GROUP BY uses an extra step: filesort. I found that DISTINCT is faster than GROUP BY, and that was a surprise.

Answer 9

Answered by Grumpy

After heavy testing we came to the conclusion that GROUP BY is faster.

SELECT sql_no_cache opnamegroep_intern FROM telwerken WHERE opnemergroep IN (7,8,9,10,11,12,13) group by opnamegroep_intern

635 total, 0.0944 seconds. Showing records 0 - 29 (635 total, query took 0.0484 sec)

SELECT sql_no_cache distinct (opnamegroep_intern) FROM telwerken WHERE opnemergroep IN (7,8,9,10,11,12,13)

635 total, 0.2117 seconds (almost 100% slower). Showing records 0 - 29 (635 total, query took 0.3468 sec)

Answer 10

Answered by kolunar

Here is a simple approach that prints the elapsed time for each of the two queries (note that this is T-SQL for SQL Server, not MySQL).

DECLARE @t1 DATETIME;
DECLARE @t2 DATETIME;

SET @t1 = GETDATE();
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);

SET @t1 = GETDATE();
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET @t2 = GETDATE();
PRINT 'Elapsed time (ms): ' + CAST(DATEDIFF(millisecond, @t1, @t2) AS varchar);

Or try SET STATISTICS TIME (Transact-SQL):

SET STATISTICS TIME ON;
SELECT DISTINCT u.profession FROM users u; --Query with DISTINCT
SELECT u.profession FROM users u GROUP BY u.profession; --Query with GROUP BY
SET STATISTICS TIME OFF;

It simply displays the number of milliseconds required to parse, compile, and execute each statement as below:

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 2 ms.
