JOIN on another table after GROUP BY and COUNT

Note: this content comes from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/3080024/

Time: 2020-09-01 06:37:09  Source: igfitidea

sql count group-by left-join aggregate-functions

Asked by Craig S

I'm trying to make sense of the right way to use JOIN, COUNT(*), and GROUP BY to do a pretty simple query. I've actually gotten it to work (see below), but from what I've read, I'm using an extra GROUP BY that I shouldn't be.

(Note: The problem below isn't my actual problem (which deals with more complicated tables), but I've tried to come up with an analogous problem)

I have two tables:

Table: Person
-------------
key  name     cityKey
1    Alice    1
2    Bob      2
3    Charles  2
4    David    1

Table: City
-------------
key  name
1    Albany
2    Berkeley
3    Chico

I'd like to do a query on the Person table (with some WHERE clause) that returns

  • the number of matching people in each city
  • the key for the city
  • the name of the city.

If I do

SELECT COUNT(Person.key) AS count, City.key AS cityKey, City.name AS cityName
FROM Person 
LEFT JOIN City ON Person.cityKey = City.key 
GROUP BY Person.cityKey, City.name

I get the result that I want

count   cityKey   cityName
2       1         Albany
2       2         Berkeley

However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.

So what's the right way to do this? I've been trying to google for an answer, but I feel like there's something fundamental that I'm just not getting.

Answered by Pointy

I don't think that it's "wrong" in this case, because you've got a one-to-one relationship between city name and city key. You could rewrite it so that a sub-select counts the persons per city key, and then join that result to the City table to get the name, but it's debatable whether that'd be better. It's a matter of style and opinion, I guess.

select PC.ct, City.key, City.name
  from City
  join (select count(Person.key) ct, cityKey key from Person group by cityKey) PC
    on City.key = PC.key

if my SQL isn't too rusty :-)
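
For anyone who wants to try the sub-select version, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the sample rows are copied from the question; the Python wrapper, the double-quoted "key" column, and the added ORDER BY are my own assumptions, not part of the original answer.

```python
import sqlite3

# Assumption: an in-memory SQLite database holding the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
""")

# Count persons per city key in a sub-select, then join to City for the name.
rows = conn.execute("""
    SELECT PC.ct, City."key", City.name
    FROM City
    JOIN (SELECT COUNT(Person."key") AS ct, cityKey
          FROM Person
          GROUP BY cityKey) AS PC
      ON City."key" = PC.cityKey
    ORDER BY City."key"
""").fetchall()

print(rows)  # [(2, 1, 'Albany'), (2, 2, 'Berkeley')]
```

Chico does not appear because the inner join drops cities with no matching persons, which matches the result shown in the question.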

Answered by OMG Ponies

...I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.

You misunderstand; you've got it backwards.
Standard SQL requires you to specify in the GROUP BY all the columns mentioned in the SELECT that are not wrapped in aggregate functions. If you don't want certain columns in the GROUP BY, wrap them in aggregate functions. Depending on the database, you could use the analytic/windowing function OVER...

However, MySQL and SQLite provide the "feature" where you can omit these columns from the GROUP BY, which leads to no end of "why doesn't this port from MySQL to fill_in_the_blank database?!" questions on Stack Overflow and numerous other sites and forums.
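
To see the lenient behaviour for yourself, here is a small sketch run against SQLite through Python's sqlite3 module. The setup and the ORDER BY are assumptions added for illustration; the query is the question's query with City.name dropped from the GROUP BY. SQLite accepts it, whereas an engine that enforces the standard rule would be expected to reject it.

```python
import sqlite3

# Assumption: an in-memory SQLite database holding the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
""")

# City."key" and City.name are neither grouped nor aggregated here.
# SQLite fills them in from some row of each group instead of raising an error.
rows = conn.execute("""
    SELECT COUNT(Person."key") AS n, City."key", City.name
    FROM Person
    LEFT JOIN City ON Person.cityKey = City."key"
    GROUP BY Person.cityKey
    ORDER BY Person.cityKey
""").fetchall()

print(rows)  # [(2, 1, 'Albany'), (2, 2, 'Berkeley')]
```

The result is only well defined because every row in a group carries the same city key and name. With columns that vary inside a group, the value you get back would be arbitrary.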

Answered by ahsteele

However, I've read that throwing in that last part of the GROUP BY clause (City.name) just to make it work is wrong.

It's not wrong. You have to understand how the Query Optimizer sees your query. The order in which it is parsed is what requires you to "throw the last part in." The optimizer sees your query in something akin to this order:

  • the required tables are joined
  • the composite dataset is filtered through the WHERE clause
  • the remaining rows are chopped into groups by the GROUP BY clause, and aggregated
  • they are then filtered again, through the HAVING clause
  • finally operated on, by SELECT / ORDER BY, UPDATE or DELETE.

The point here is that it's not that the GROUP BY has to name all the columns in the SELECT, but in fact it is the opposite - the SELECT cannot include any non-aggregated columns that are not already in the GROUP BY.
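
The processing order listed above can be sketched with a query that uses every stage once. This is only an illustration: the WHERE and HAVING conditions and the Python/sqlite3 wrapper are assumptions I picked to make each step visible, not something from the question.

```python
import sqlite3

# Assumption: an in-memory SQLite database holding the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
""")

rows = conn.execute("""
    SELECT City."key", City.name, COUNT(Person."key") AS n  -- 5. project
    FROM Person
    JOIN City ON Person.cityKey = City."key"   -- 1. join the tables
    WHERE Person.name <> 'David'               -- 2. filter individual rows
    GROUP BY City."key", City.name             -- 3. group and aggregate
    HAVING COUNT(Person."key") >= 2            -- 4. filter whole groups
    ORDER BY City."key"                        -- 5. order the output
""").fetchall()

print(rows)  # [(2, 'Berkeley', 2)]
```

David is removed by WHERE before grouping, so Albany's group holds only Alice and is then discarded by HAVING. Only Berkeley, with Bob and Charles, survives.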

Answered by Andomar

Your query would only work on MySQL, because you group on Person.cityKey but select City.key. All other databases would require you to use an aggregate like min(city.key), or to add City.key to the group by clause.

Because the combination of city name and city key is unique, the following are equivalent:

select    count(person.key), min(city.key), min(city.name)
...
group by  person.citykey

Or:

select    count(person.key), city.key, city.name
...
group by  person.citykey, city.key, city.name

Or:

select    count(person.key), city.key, max(city.name)
...
group by  city.key

All rows in the group will have the same city name and key, so it doesn't matter whether you use the max or min aggregate.
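
As a quick check that the three variants agree, here is a sketch that runs them side by side on the question's sample data with Python's sqlite3 module. The answer elides the FROM clause with "..."; filling it with the question's LEFT JOIN is my assumption, as are the ORDER BY and the FROM_JOIN helper name.

```python
import sqlite3

# Assumption: an in-memory SQLite database holding the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE City   ("key" INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT, cityKey INTEGER);
    INSERT INTO City   VALUES (1, 'Albany'), (2, 'Berkeley'), (3, 'Chico');
    INSERT INTO Person VALUES (1, 'Alice', 1), (2, 'Bob', 2),
                              (3, 'Charles', 2), (4, 'David', 1);
""")

# Assumed FROM clause, taken from the question's original query.
FROM_JOIN = 'FROM Person LEFT JOIN City ON Person.cityKey = City."key"'

queries = [
    # Aggregate every selected City column, group on the Person side only.
    f'SELECT COUNT(Person."key"), MIN(City."key"), MIN(City.name) '
    f'{FROM_JOIN} GROUP BY Person.cityKey ORDER BY 2',
    # List every non-aggregated column in the GROUP BY.
    f'SELECT COUNT(Person."key"), City."key", City.name '
    f'{FROM_JOIN} GROUP BY Person.cityKey, City."key", City.name ORDER BY 2',
    # Group on the city key and aggregate only the name.
    f'SELECT COUNT(Person."key"), City."key", MAX(City.name) '
    f'{FROM_JOIN} GROUP BY City."key" ORDER BY 2',
]

results = [conn.execute(q).fetchall() for q in queries]
for r in results:
    print(r)  # each prints [(2, 1, 'Albany'), (2, 2, 'Berkeley')]
```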

P.S. If you'd like to count only different persons, even if they have multiple rows, try:

count(DISTINCT person.key)

instead of

count(person.key)
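
A small sketch of the difference, again using Python's sqlite3 module. The Phone table is hypothetical and exists only to give one person several joined rows; it is not part of the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person ("key" INTEGER PRIMARY KEY, name TEXT);
    -- Phone is a hypothetical table, used only to give Alice two joined rows.
    CREATE TABLE Phone (personKey INTEGER, number TEXT);
    INSERT INTO Person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Phone VALUES (1, '555-0100'), (1, '555-0101'), (2, '555-0102');
""")

# The join yields three rows (two for Alice, one for Bob).
plain, distinct = conn.execute("""
    SELECT COUNT(Person."key"), COUNT(DISTINCT Person."key")
    FROM Person
    JOIN Phone ON Phone.personKey = Person."key"
""").fetchone()

print(plain, distinct)  # 3 2
```

The plain count follows the number of joined rows, while the DISTINCT count follows the number of different persons.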