MySQL "Group By" and "Order By"

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/1066453/

Time: 2020-08-31 13:35:45  Source: igfitidea

Tags: mysql, sql, group-by, sql-order-by, aggregate-functions

Asked by John Kurlak

I want to select a bunch of rows from a table of e-mails and group them by sender. My query looks like this:

SELECT 
    `timestamp`, `fromEmail`, `subject`
FROM `incomingEmails` 
GROUP BY LOWER(`fromEmail`) 
ORDER BY `timestamp` DESC

The query almost works as I want: it selects records grouped by e-mail. The problem is that the subject and timestamp don't correspond to the most recent record for a particular e-mail address.

For example, it might return:

fromEmail: [email protected], subject: hello
fromEmail: [email protected], subject: welcome

When the records in the database are:

fromEmail: [email protected], subject: hello
fromEmail: [email protected], subject: programming question
fromEmail: [email protected], subject: welcome

If the "programming question" subject is the most recent, how can I get MySQL to select that record when grouping the e-mails?

Answer by b7kich

A simple solution is to wrap the query in a subselect that applies the ORDER BY first, and then apply the GROUP BY in the outer query:

SELECT * FROM ( 
    SELECT `timestamp`, `fromEmail`, `subject`
    FROM `incomingEmails` 
    ORDER BY `timestamp` DESC
) AS tmp_table GROUP BY LOWER(`fromEmail`)

This is similar to using the join but looks much nicer.

Using non-aggregate columns in a SELECT with a GROUP BY clause is non-standard. MySQL will generally return the values of the first row it finds and discard the rest. Any ORDER BY clauses will only apply to the returned column value, not to the discarded ones.

IMPORTANT UPDATE: Selecting non-aggregate columns used to work in practice but should not be relied upon. Per the MySQL documentation: "this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate."

As of MySQL 5.7.5, ONLY_FULL_GROUP_BY is enabled by default, so selecting non-aggregate columns causes a query error (ER_WRONG_FIELD_WITH_GROUP).

As @mikep points out below, the solution is to use ANY_VALUE(), available in MySQL 5.7 and above.

See:
http://www.cafewebmaster.com/mysql-order-sort-group
https://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_any-value

Answer by Andomar

Here's one approach:

SELECT cur.textID, cur.fromEmail, cur.subject, 
     cur.timestamp, cur.read
FROM incomingEmails cur
LEFT JOIN incomingEmails next
    on cur.fromEmail = next.fromEmail
    and cur.timestamp < next.timestamp
WHERE next.timestamp is null
and cur.toUserID = '$userID' 
ORDER BY LOWER(cur.fromEmail)

Basically, you join the table to itself, searching for later rows. In the WHERE clause you require that no later row exists, which leaves only the latest row for each sender.

If there can be multiple emails with the same timestamp, this query would need refining. If there's an incremental ID column in the email table, change the JOIN to:

LEFT JOIN incomingEmails next
    on cur.fromEmail = next.fromEmail
    and cur.id < next.id
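
As a rough sketch of this id-based variant, the snippet below runs the anti-join end to end. Several things here are assumptions made for illustration: an in-memory SQLite database stands in for MySQL so the example is runnable, the sample rows and addresses are made up, the alias is spelled `newer` rather than `next`, and `id` is assumed to grow with arrival order.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incomingEmails ("
    "id INTEGER PRIMARY KEY, fromEmail TEXT, subject TEXT, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO incomingEmails (fromEmail, subject, timestamp) VALUES (?, ?, ?)",
    [
        ("alice@example.com", "hello", 100),
        ("alice@example.com", "programming question", 200),
        ("bob@example.com", "welcome", 150),
        ("bob@example.com", "welcome again", 150),  # ties on timestamp
    ],
)

# Anti-join: a row survives only if no row from the same sender has a higher id.
latest_rows = conn.execute(
    """
    SELECT cur.fromEmail, cur.subject
    FROM incomingEmails cur
    LEFT JOIN incomingEmails newer
        ON cur.fromEmail = newer.fromEmail
        AND cur.id < newer.id
    WHERE newer.id IS NULL
    ORDER BY LOWER(cur.fromEmail)
    """
).fetchall()
print(latest_rows)
```

Because the comparison is on `id`, the two rows that share a timestamp still yield exactly one result for that sender.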

Answer by Marcus

As already pointed out in a reply, the current answer is wrong, because GROUP BY arbitrarily selects a record from each group.

If one is using MySQL 5.6, or MySQL 5.7 with ONLY_FULL_GROUP_BY, the correct (deterministic) query is:

SELECT incomingEmails.*
  FROM (
    SELECT fromEmail, MAX(timestamp) `timestamp`
    FROM incomingEmails
    GROUP BY fromEmail
  ) filtered_incomingEmails
  JOIN incomingEmails USING (fromEmail, timestamp)
GROUP BY fromEmail, timestamp

In order for the query to run efficiently, proper indexing is required.

Note that, for simplicity, I've removed the LOWER(), which in most cases isn't needed.
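
A minimal sketch of the same idea, for experimenting outside MySQL: the subquery finds each sender's newest timestamp and the join pulls back the matching row. The in-memory SQLite database, the sample rows, the aliases `latest` and `e`, and the index name `idx_sender_time` are all assumptions made for illustration; the composite index is only one plausible reading of "proper indexing".

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incomingEmails (fromEmail TEXT, subject TEXT, timestamp INTEGER)"
)
# Hypothetical composite index covering both the GROUP BY and the join columns.
conn.execute("CREATE INDEX idx_sender_time ON incomingEmails (fromEmail, timestamp)")
conn.executemany(
    "INSERT INTO incomingEmails VALUES (?, ?, ?)",
    [
        ("alice@example.com", "hello", 100),
        ("alice@example.com", "programming question", 200),
        ("bob@example.com", "welcome", 150),
    ],
)

# The derived table holds one (sender, newest timestamp) pair per sender;
# joining back on both columns recovers the full row for that pair.
newest_rows = conn.execute(
    """
    SELECT e.fromEmail, e.subject, e.timestamp
    FROM (
        SELECT fromEmail, MAX(timestamp) AS timestamp
        FROM incomingEmails
        GROUP BY fromEmail
    ) latest
    JOIN incomingEmails e
        ON e.fromEmail = latest.fromEmail
        AND e.timestamp = latest.timestamp
    ORDER BY e.timestamp DESC
    """
).fetchall()
print(newest_rows)
```

If two rows from one sender share the newest timestamp, this sketch returns both, which is the case the outer GROUP BY in the query above is meant to collapse.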

Answer by 11101101b

Do a GROUP BY after the ORDER BY by wrapping your query with the GROUP BY like this:

SELECT t.* FROM (SELECT * FROM `table` ORDER BY `time` DESC) t GROUP BY t.`from`

Answer by noonex

According to the SQL standard, you cannot use non-aggregate columns in the select list of a grouped query. MySQL allows such usage (unless ONLY_FULL_GROUP_BY mode is enabled), but the result is not predictable.

ONLY_FULL_GROUP_BY

You should first select fromEmail and MIN(read), and then fetch the subject with a second query (or a subquery).
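
One way to read this two-step suggestion is sketched below. It is an interpretation under assumptions, not the answer's exact query: SQLite stands in for MySQL, the rows are made up, the `read` column is renamed to a hypothetical `isRead`, and MAX(timestamp) is added to the first step so the second step knows which message to look up.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incomingEmails ("
    "fromEmail TEXT, subject TEXT, timestamp INTEGER, isRead INTEGER)"
)
conn.executemany(
    "INSERT INTO incomingEmails VALUES (?, ?, ?, ?)",
    [
        ("alice@example.com", "hello", 100, 1),
        ("alice@example.com", "programming question", 200, 0),
        ("bob@example.com", "welcome", 150, 1),
    ],
)

# Step 1: select only grouped or aggregated columns, so every value is well defined.
groups = conn.execute(
    "SELECT fromEmail, MAX(timestamp), MIN(isRead) "
    "FROM incomingEmails GROUP BY fromEmail ORDER BY fromEmail"
).fetchall()

# Step 2: a second query looks up the subject of each sender's newest message.
summary = []
for sender, newest, min_read in groups:
    (subject,) = conn.execute(
        "SELECT subject FROM incomingEmails "
        "WHERE fromEmail = ? AND timestamp = ? LIMIT 1",
        (sender, newest),
    ).fetchone()
    summary.append((sender, subject, min_read))
print(summary)
```

Running one lookup per sender keeps the sketch simple; folding step 2 into a subquery of step 1 avoids the extra round trips.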

Answer by Mike N

I struggled with both of these approaches for queries more complex than those shown, because the subquery approach was horribly inefficient no matter what indexes I added, and because I couldn't get the outer self-join through Hibernate.

The best (and easiest) way to do this is to group by something which is constructed to contain a concatenation of the fields you require, and then to pull them out using expressions in the SELECT clause. If you need to do a MAX(), make sure that the field you want to MAX() over is always at the most significant end of the concatenated entity.

The key to understanding this is that the query only makes sense if these other fields are invariant for any entity that satisfies the MAX(), so for sorting purposes the other pieces of the concatenation can be ignored. The MySQL manual explains how to do this at the very bottom of this page: http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html

If you can get an insert/update event (like a trigger) to pre-compute the concatenation of the fields, you can index it, and the query will be as fast as if the GROUP BY were over just the field you actually wanted to MAX(). You can even use it to get the maximum of multiple fields. I use it to run queries against multi-dimensional trees expressed as nested sets.
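
A small sketch of the concatenation trick, applied to the e-mail example from the question rather than to nested sets. It assumes SQLite in place of MySQL, made-up rows, and timestamps that fit in ten digits; the SQLite functions `printf`, `||` and `SUBSTR` are used here where MySQL would use something like LPAD(), CONCAT() and SUBSTRING().

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incomingEmails (fromEmail TEXT, subject TEXT, timestamp INTEGER)"
)
conn.executemany(
    "INSERT INTO incomingEmails VALUES (?, ?, ?)",
    [
        ("alice@example.com", "hello", 100),
        ("alice@example.com", "programming question", 200),
        ("bob@example.com", "welcome", 150),
    ],
)

# The zero-padded timestamp sits at the most significant end of the string, so
# MAX() over the concatenation picks the newest row; SUBSTR(..., 11) then strips
# the 10-character prefix to recover the subject.
packed_rows = conn.execute(
    """
    SELECT fromEmail,
           SUBSTR(MAX(printf('%010d', timestamp) || subject), 11) AS subject
    FROM incomingEmails
    GROUP BY fromEmail
    ORDER BY fromEmail
    """
).fetchall()
print(packed_rows)
```

The fixed-width padding matters: without it, string comparison would rank "99" above "100" and the wrong row would win.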
