Original question: http://stackoverflow.com/questions/3080024/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
JOIN on another table after GROUP BY and COUNT
Asked by Craig S
I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below) but from what I've read, I'm using an extra GROUP BY that I shouldn't be.
(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)
I have two tables:
Table: Person
-------------
key  name     cityKey
1    Alice    1
2    Bob      2
3    Charles  2
4    David    1
Table: City
-------------
key  name
1    Albany
2    Berkeley
3    Chico
I'd like to run a query on the Person table (with some WHERE clause) that returns:
- the number of matching people in each city
- the key for the city
- the name of the city.
If I do
SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person
LEFT JOIN City ON Person.cityKey = City.key
GROUP BY Person.cityKey, City.name
I get the result that I want
count  cityKey  cityName
2      1        Albany
2      2        Berkeley
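For anyone who wants to reproduce this, here is a minimal sketch that loads the two tables above into an in-memory SQLite database through Python's sqlite3 module and runs the query as written. The choice of SQLite, the column types, and the helper name make_db are assumptions for illustration and not part of the original question; key and count are double-quoted only so they cannot be mistaken for keywords.

```python
import sqlite3

def make_db():
    """Build the question's Person and City tables in memory (types are assumed)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
        INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
        INSERT INTO Person VALUES
            (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
    """)
    return db

db = make_db()
rows = db.execute("""
    SELECT COUNT(Person."key") AS "count", City."key" AS cityKey, City.name AS cityName
    FROM Person
    LEFT JOIN City ON Person.cityKey = City."key"
    GROUP BY Person.cityKey, City.name
""").fetchall()

# GROUP BY does not promise an output order, so sort by cityKey before comparing.
rows = sorted(rows, key=lambda r: r[1])
print(rows)  # [(2, 1, 'Albany'), (2, 2, 'Berkeley')]
```

Chico does not appear because the query starts from Person and nobody lives there.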
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.
Answer by Pointy
I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it such that you join to a sub-select to get the count of persons to cities by key, to the city table again for the name, but it's debatable that that'd be better. It's a matter of style and opinion I guess.
select PC.ct, City.key, City.name
from City
join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
on City.key = PC.key
if my SQL isn't too rusty :-)
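A rough way to check the sub-select idea is to run it against the question's sample data in an in-memory SQLite database. This is only a sketch under assumptions: SQLite and the column types are my choice, and the inner alias is named cityKey here instead of key to sidestep keyword clashes.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
""")

# Count persons per city key first, then join back to City only for the name.
subselect_rows = db.execute("""
    SELECT PC.ct, City."key", City.name
    FROM City
    JOIN (SELECT COUNT(Person."key") AS ct, cityKey
          FROM Person
          GROUP BY cityKey) AS PC
      ON City."key" = PC.cityKey
    ORDER BY City."key"
""").fetchall()

print(subselect_rows)  # [(2, 1, 'Albany'), (2, 2, 'Berkeley')]
```

Because this is an inner join against the counts, a city with no persons (Chico) is left out, which matches the result in the question.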
Answer by OMG Ponies
...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
You misunderstand, you got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...
However, MySQL and SQLite provide the "feature" where you can omit these columns from the group by - which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" questions on Stackoverflow and numerous other sites & forums.
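Both halves of this can be seen with a small sketch on the question's data, run here in in-memory SQLite (my choice of engine; the column types are assumed too). The first query leaves City.key and City.name out of the GROUP BY and SQLite accepts it; the second wraps them in MIN(), which is the form the answer describes as standard.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
""")

# Permissive form: the city columns are neither grouped nor aggregated.
bare = db.execute("""
    SELECT COUNT(Person."key"), City."key", City.name
    FROM Person LEFT JOIN City ON Person.cityKey = City."key"
    GROUP BY Person.cityKey
""").fetchall()

# Standard-style form: every non-grouped column is wrapped in an aggregate.
wrapped = db.execute("""
    SELECT COUNT(Person."key"), MIN(City."key"), MIN(City.name)
    FROM Person LEFT JOIN City ON Person.cityKey = City."key"
    GROUP BY Person.cityKey
""").fetchall()

print(sorted(bare))     # [(2, 1, 'Albany'), (2, 2, 'Berkeley')]
print(sorted(wrapped))  # [(2, 1, 'Albany'), (2, 2, 'Berkeley')]
```

The two agree here only because every row in a group carries the same city; when that does not hold, the permissive form picks a value from some row in the group.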
Answer by ahsteele
However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.
It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:
- the required tables are joined
- the composite dataset is filtered through the WHERE clause
- the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
- they are then filtered again, through the HAVING clause
- finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.
The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any columns not already in the GROUP BY.
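The processing order listed above can be traced on the question's data with a query that uses every stage. This is a sketch in in-memory SQLite; the WHERE and HAVING conditions are made up for illustration, and the numbered SQL comments mark the logical order, not the textual one.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
""")

staged = db.execute("""
    SELECT COUNT(Person."key"), City."key", City.name   -- 5. project the grouped columns
    FROM Person
    JOIN City ON Person.cityKey = City."key"            -- 1. join the tables
    WHERE Person.name <> 'David'                        -- 2. filter individual rows
    GROUP BY City."key", City.name                      -- 3. form groups and aggregate
    HAVING COUNT(Person."key") >= 2                     -- 4. filter whole groups
    ORDER BY City.name                                  -- 5. order the final rows
""").fetchall()

# David is dropped in step 2, so Albany has one person and fails step 4.
print(staged)  # [(2, 2, 'Berkeley')]
```

By the time SELECT runs, only the grouped columns and aggregates exist, which is why City.name has to be in the GROUP BY (or inside an aggregate) to be selectable.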
Answer by Andomar
Your query would only work on MySQL, because you group on Person.cityKey but select city.key. All other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.
Because the combination of city name and city key is unique, the following are equivalent:
select count(person.key), min(city.key), min(city.name)
...
group by person.citykey
Or:
select count(person.key), city.key, city.name
...
group by person.citykey, city.key, city.name
Or:
select count(person.key), city.key, max(city.name)
...
group by city.key
All rows in the group will have the same city name and key, so it doesn't matter if you use the max or min aggregate.
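To check that the three forms really agree, here is a sketch that runs each of them on the question's data in in-memory SQLite. The elided part of each query is filled with the FROM and LEFT JOIN from the question, which is an assumption on my side, as are the engine and the column types.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
""")

# Assumed stand-in for the "..." in the answer's queries.
FROM_CLAUSE = 'FROM Person LEFT JOIN City ON Person.cityKey = City."key"'

variants = [
    # 1. aggregate both city columns, group on the person's city key
    f'SELECT COUNT(Person."key"), MIN(City."key"), MIN(City.name) '
    f'{FROM_CLAUSE} GROUP BY Person.cityKey',
    # 2. list every selected column in the GROUP BY
    f'SELECT COUNT(Person."key"), City."key", City.name '
    f'{FROM_CLAUSE} GROUP BY Person.cityKey, City."key", City.name',
    # 3. group on the city key, aggregate only the name
    f'SELECT COUNT(Person."key"), City."key", MAX(City.name) '
    f'{FROM_CLAUSE} GROUP BY City."key"',
]

results = [sorted(db.execute(sql).fetchall()) for sql in variants]
print(results[0])  # [(2, 1, 'Albany'), (2, 2, 'Berkeley')]
print(results[0] == results[1] == results[2])  # True
```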
P.S. If you'd like to count only different persons, even if they have multiple rows, try:
count(DISTINCT person.key)
instead of
count(person.key)
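The difference only shows up when a join repeats a person. The sketch below uses a hypothetical Pet table (not in the original question) so that Alice appears on two rows; the engine (in-memory SQLite) and the column types are assumptions as well.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE City ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    -- Hypothetical table, added only to make one person match several rows.
    CREATE TABLE Pet (owner INTEGER, name TEXT);
    INSERT INTO City VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES
        (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charles', 2), (4, 'David', 1);
    INSERT INTO Pet VALUES (1, 'Rex'), (1, 'Tom'), (2, 'Fido');
""")

pet_owner_counts = db.execute("""
    SELECT City.name,
           COUNT(Person."key"),           -- counts joined rows: Alice twice
           COUNT(DISTINCT Person."key")   -- counts each person once
    FROM Person
    JOIN Pet ON Pet.owner = Person."key"
    JOIN City ON Person.cityKey = City."key"
    GROUP BY City."key", City.name
    ORDER BY City.name
""").fetchall()

print(pet_owner_counts)  # [('Albany', 2, 1), ('Berkeley', 1, 1)]
```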