Original question: http://stackoverflow.com/questions/1023347/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
MySQL - Selecting a Column not in Group By
Asked by colithium
I'm trying to add features to a preexisting application and I came across a MySQL view something like this:
SELECT
AVG(table_name.col1),
AVG(table_name.col2),
AVG(table_name.col3),
table_name.personID,
table_name.col4
FROM table_name
GROUP BY table_name.personID;
OK so there's a few aggregate functions. You can select personID because you're grouping by it. But it also is selecting a column that is not in an aggregate function and is not a part of the GROUP BY clause. How is this possible??? Does it just pick a random value because the values definitely aren't unique per group?
Where I come from (MSSQL Server), that's an error. Can someone explain this behavior to me and why it's allowed in MySQL?
Accepted answer by Bill Karwin
It's true that this feature permits some ambiguous queries, and silently returns a result set with an arbitrary value picked from that column. In practice, it tends to be the value from the row within the group that is physically stored first.
These queries aren't ambiguous if you only choose columns that are functionally dependent on the column(s) in the GROUP BY criteria. In other words, if there can be only one distinct value of the "ambiguous" column per value that defines the group, there's no problem. This query would be illegal in Microsoft SQL Server (and ANSI SQL), even though it cannot logically result in ambiguity:
SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID;
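For comparison, here is a sketch of how the same query could be written so that stricter engines accept it. It assumes, as the paragraph above does, that persons.id is the primary key of persons, so col4 has exactly one value per personID. Under that assumption both variants should return the same rows as the query above.

```sql
-- Option 1: name the functionally dependent column in GROUP BY.
SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1
JOIN persons ON table1.personID = persons.id
GROUP BY table1.personID, persons.col4;

-- Option 2: wrap it in an aggregate. With only one value per group,
-- MAX() simply returns that value.
SELECT AVG(table1.col1), table1.personID, MAX(persons.col4) AS col4
FROM table1
JOIN persons ON table1.personID = persons.id
GROUP BY table1.personID;
```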
Also, MySQL has an SQL mode to make it behave per the standard: ONLY_FULL_GROUP_BY
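A minimal sketch of turning the mode on for the current session only. It uses the table and column names from the question; whether the last statement is actually rejected depends on col4 not being functionally dependent on personID, which is what the question describes.

```sql
-- Show the modes active for the current session.
SELECT @@SESSION.sql_mode;

-- Append ONLY_FULL_GROUP_BY to the current session's modes.
SET SESSION sql_mode = CONCAT(@@SESSION.sql_mode, ',ONLY_FULL_GROUP_BY');

-- With the mode on, this should now fail with an error instead of
-- silently picking a col4 value, because col4 is neither aggregated
-- nor listed in GROUP BY.
SELECT AVG(col1), personID, col4
FROM table_name
GROUP BY personID;
```

Using SET GLOBAL instead, or setting sql_mode in the server configuration, would apply the mode beyond the current session.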
FWIW, SQLite also permits these ambiguous GROUP BY clauses, but it chooses the value from the last row in the group.†
† At least in the version I tested. What it means to be arbitrary is that either MySQL or SQLite could change their implementation in the future and behave differently. You should therefore not rely on the behavior staying the way it currently is in ambiguous cases like this. It's better to rewrite your queries to be deterministic and not ambiguous. That's why MySQL 5.7 now enables ONLY_FULL_GROUP_BY by default.
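As a sketch of what "deterministic" could mean for the query in the question: state explicitly which col4 you want per person. The choice of MIN() below is only an illustration, not something the original view specifies. If any value really is acceptable, MySQL 5.7 and later also provide ANY_VALUE() to say so explicitly.

```sql
-- Deterministic: the smallest col4 per person (MIN is an illustrative choice).
SELECT AVG(col1), AVG(col2), AVG(col3), personID, MIN(col4) AS col4
FROM table_name
GROUP BY personID;

-- Explicitly arbitrary: accepted under ONLY_FULL_GROUP_BY in MySQL 5.7+,
-- but which col4 comes back is still unspecified.
SELECT AVG(col1), AVG(col2), AVG(col3), personID, ANY_VALUE(col4) AS col4
FROM table_name
GROUP BY personID;
```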
Answer by colithium
I should have Googled for just a bit longer... It seems I found my answer.
MySQL extends the use of GROUP BY so that you can use nonaggregated columns or calculations in the SELECT list that do not appear in the GROUP BY clause. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. For example, you do not need to group on customer.name in the following query
In standard SQL, you would have to add customer.name to the GROUP BY clause. In MySQL, the name is redundant.
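The query that the quoted passage refers to is missing from this page. A query of the shape it describes might look like the following. Everything except customer.name is an assumption made for illustration: the orders table, the custid and payments columns, and custid being the primary key of customer.

```sql
-- Hypothetical schema: orders(custid, payments), customer(custid PRIMARY KEY, name).
-- customer.name is selected but not grouped on; each custid has exactly one name.
SELECT o.custid, c.name, MAX(o.payments)
FROM orders AS o
JOIN customer AS c ON o.custid = c.custid
GROUP BY o.custid;
```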
Still, that just seems... wrong.
Answer by Csongor Halmai
Let's say you have a query like this:
SELECT g, v
FROM t
GROUP BY g;
In this case, for each possible value of g, MySQL picks one of the corresponding values of v.
However, which one is chosen depends on the circumstances.
I read somewhere that for each group of g, the first value of v is kept, in the order in which the records were inserted into the table t.
This is quite ugly because the records in a table should be treated as a set, where the order of the elements should not matter. This is so "mysql-ish"...
If you want to determine which value of v is kept, you need to apply a subselect on t like this:
SELECT g, v
FROM (
SELECT *
FROM t
ORDER BY g, v DESC
) q
GROUP BY g;
This way you define the order in which the outer query processes the records of the subquery, so you can trust which value of v it will pick for each value of g.
However, if you need some WHERE conditions then be very careful. If you add the WHERE condition to the subquery, the behaviour is preserved and the query always returns the value you expect:
SELECT g, v
FROM (
SELECT *
FROM t
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
GROUP BY g;
This is what you expect: the subselect filters and orders the table. It keeps the records where g has the given value, and the outer query returns that g and the first value of v.
However, if you add the same WHERE condition to the outer query then you get a non-deterministic result:
SELECT g, v
FROM (
SELECT *
FROM t
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g;
Surprisingly, you may get different values for v when executing the same query again and again, which is... strange. The expected behaviour is to get all the records in the appropriate order from the subquery, filter them in the outer query, and then pick the same value as in the previous example. But that is not what happens.
It picks a value for v seemingly at random. The same query returned different values for v when I executed it repeatedly (~20 times), and the distribution was not uniform.
If instead of adding an outer WHERE, you specify a HAVING condition like this:
SELECT g, v
FROM (
SELECT *
FROM t1
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g
HAVING g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9';
Then you get consistent behaviour again.
CONCLUSION: I would suggest not relying on this technique at all. If you really want or need to, then avoid WHERE conditions in the outer query. Put the condition in the inner query if you can, or use a HAVING clause in the outer query.
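One way to avoid the technique entirely, sketched against the same table t with columns g and v used above: if the goal of ORDER BY g, v DESC was to keep the largest v per g, an aggregate states that directly and does not depend on the order in which rows are processed. This is a suggested alternative, not part of the original test.

```sql
-- Largest v per g, without relying on subquery ordering.
SELECT g, MAX(v) AS v
FROM t
GROUP BY g;

-- The same with a filter. An outer WHERE is safe here because the
-- result no longer depends on which row the engine visits first.
SELECT g, MAX(v) AS v
FROM t
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g;
```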
I tested it with this data:
CREATE TABLE t1 (
v INT,
g VARCHAR(36)
);
INSERT INTO t1 VALUES (1, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
INSERT INTO t1 VALUES (2, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
This was in MySQL 5.6.41.
Maybe it is just a bug that has been fixed in newer versions. Please give feedback if you have experience with newer versions.
Answer by Salih Kiraz
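-- Return the full row with the smallest p_id for each p_adi value.
-- The dbo. schema prefix (a SQL Server convention) is dropped so the query runs in MySQL.
SELECT *
FROM personel
WHERE p_id IN (
    SELECT MIN(p_id)
    FROM personel
    GROUP BY p_adi
);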