MySQL: SQL GROUP BY versus DISTINCT

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/426723/

Time: 2020-08-31 12:36:10  Source: igfitidea

sql group by versus distinct

Tags: mysql, sql-server, performance, group-by, distinct

Asked by mson

Why would someone use a group by versus distinct when there are no aggregations done in the query?


Also, does anyone know the performance considerations of group by versus distinct in MySQL and SQL Server? I'm guessing that SQL Server has a better optimizer and the two might be close to equivalent there, but in MySQL, I expect distinct to have a significant performance advantage.

I'm interested in dba answers.


EDIT:


Bill's post is interesting, but not applicable. Let me be more specific...


select a, b, c
from table x
group by a, b, c

versus


select distinct a, b, c
from table x

Accepted answer by Cowan

A little (VERY little) empirical data from MS SQL Server, on a couple of random tables from our DB.


For the pattern:


SELECT col1, col2 FROM table GROUP BY col1, col2

and

SELECT DISTINCT col1, col2 FROM table 

When there's no covering index for the query, both ways produced the following query plan:


|--Sort(DISTINCT ORDER BY:([table].[col1] ASC, [table].[col2] ASC))
   |--Clustered Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]))

and when there was a covering index, both produced:


|--Stream Aggregate(GROUP BY:([table].[col1], [table].[col2]))
   |--Index Scan(OBJECT:([db].[dbo].[table].[IX_some_index]), ORDERED FORWARD)

So, from that very small sample, SQL Server certainly treats both the same.

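As a rough sanity check that the two patterns return the same rows when there are no aggregates, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is an assumption made only so the example is self-contained; it says nothing about the SQL Server plans shown above or about MySQL. The table name `x` and its sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption for illustration only;
# the thread itself is about MySQL and SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO x (a, b, c) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 3), (2, 2, 3), (2, 2, 3)],  # made-up rows
)

grouped = conn.execute("SELECT a, b, c FROM x GROUP BY a, b, c").fetchall()
distinct = conn.execute("SELECT DISTINCT a, b, c FROM x").fetchall()

# Neither query has an ORDER BY, so compare as sets rather than as lists.
same_rows = set(grouped) == set(distinct)
print(same_rows)
```

This only compares result sets. Whether the two forms get the same plan is something to check with the plan tools of the engine you actually run.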

Answer by Bill Karwin

GROUP BY maps groups of rows to one row, per distinct value in specific columns, which don't even necessarily have to be in the select-list.

SELECT b, c, d FROM table1 GROUP BY a;

This query is legal SQL (correction: only in MySQL; actually it's not standard SQL and not supported by other brands). MySQL accepts it, and it trusts that you know what you're doing, selecting b, c, and d in an unambiguous way because they're functional dependencies of a.

However, Microsoft SQL Server and other brands don't allow this query, because they can't determine the functional dependencies easily. Edit: Instead, standard SQL requires you to follow the Single-Value Rule, i.e. every column in the select-list must either be named in the GROUP BY clause or else be an argument to a set function.

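To make the Single-Value Rule concrete, here is a hedged sketch that again uses Python's sqlite3 module purely as a convenient stand-in (an assumption, not one of the engines discussed here). The table name `table1` comes from the query above; the column values are invented. Every selected column is either named in the GROUP BY clause or passed to a set function, which is the form standard SQL accepts.

```python
import sqlite3

# SQLite is used only so the example runs anywhere; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
conn.executemany(
    "INSERT INTO table1 (a, b, c, d) VALUES (?, ?, ?, ?)",
    [(1, 10, 100, 7), (1, 20, 200, 8), (2, 30, 300, 9)],
)

# a is named in GROUP BY; b, c and d each appear only inside a set function.
rows = conn.execute(
    "SELECT a, MIN(b), MAX(c), COUNT(d) FROM table1 GROUP BY a ORDER BY a"
).fetchall()
print(rows)
```

Which set function to use (MIN, MAX, COUNT, and so on) is a choice you have to make explicitly; that is exactly the ambiguity the rule forces you to resolve.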

Whereas DISTINCT always looks at all columns in the select-list, and only those columns. It's a common misconception that DISTINCT allows you to specify the columns:

SELECT DISTINCT(a), b, c FROM table1;

Despite the parentheses making DISTINCT look like a function call, it is not. It's a query option, and a distinct value in any of the three fields of the select-list will lead to a distinct row in the query result. One of the expressions in this select-list has parentheses around it, but this won't affect the result.

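A small sketch of that point, assuming SQLite via Python's sqlite3 module as a stand-in engine and made-up rows in `table1`: the parenthesized and unparenthesized forms return the same rows, and DISTINCT is applied across all three columns.

```python
import sqlite3

# SQLite stands in for the engines discussed here (an assumption); rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b INTEGER, c INTEGER)")
conn.executemany(
    "INSERT INTO table1 (a, b, c) VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 1, 1), (1, 2, 2), (1, 2, 3)],
)

with_parens = conn.execute("SELECT DISTINCT(a), b, c FROM table1").fetchall()
without_parens = conn.execute("SELECT DISTINCT a, b, c FROM table1").fetchall()

# If DISTINCT applied only to a, a single row would come back (a is always 1).
print(len(with_parens), set(with_parens) == set(without_parens))
```

Only the exact duplicate row is collapsed; rows that differ in b or c survive even though a is the same in all of them.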

Answer by karl

In MySQL, I've found that using a GROUP BY often performs better than DISTINCT.

Doing an "EXPLAIN SELECT DISTINCT" shows "Using where; Using temporary", meaning MySQL will create a temporary table.

Whereas an "EXPLAIN SELECT a,b, c from T1, T2 where T2.A=T1.A GROUP BY a" just shows "Using where".

Answer by Andre Gallo

Both would generate the same query plan in MS SQL Server. If you have MS SQL Server, you could just enable the actual execution plan to see which one is better for your needs.

Please have a look at these posts:

http://blog.sqlauthority.com/2007/03/29/sql-server-difference-between-distinct-and-group-by-distinct-vs-group-by/


http://www.sqlmag.com/Article/ArticleID/24282/sql_server_24282.html


Answer by Booji Boy

If you really are looking for distinct values, the distinct makes the source code more readable (for example, if it's part of a stored procedure). If I'm writing ad-hoc queries, I'll usually start with the group by, even if I have no aggregations, because I'll often end up adding them.

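That workflow, starting from a plain group by and adding aggregates later, might look like the sketch below. It assumes SQLite through Python's sqlite3 module, and the `orders` table with its `customer` and `status` columns is hypothetical, invented only for this illustration.

```python
import sqlite3

# Hypothetical table and data; SQLite is used only to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO orders (customer, status) VALUES (?, ?)",
    [("ann", "open"), ("ann", "open"), ("ann", "closed"), ("bob", "open")],
)

# Step 1: ad-hoc query with no aggregates; it returns the distinct combinations.
step1 = conn.execute(
    "SELECT customer, status FROM orders "
    "GROUP BY customer, status ORDER BY customer, status"
).fetchall()

# Step 2: the same query with an aggregate added; only the select-list changes.
step2 = conn.execute(
    "SELECT customer, status, COUNT(*) FROM orders "
    "GROUP BY customer, status ORDER BY customer, status"
).fetchall()
print(step1)
print(step2)
```

Had step 1 been written with distinct, adding the count would have meant rewriting the query into a group by anyway.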