Original source: http://stackoverflow.com/questions/13325583/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
PostgreSQL MAX and GROUP BY
Asked by Project Dumbo Dev
I have a table with id, year and count.
I want to get the MAX(count) for each id and keep the year when it happens, so I make this query:
SELECT id, year, MAX(count)
FROM table
GROUP BY id;
Unfortunately, it gives me an error:
ERROR: column "table.year" must appear in the GROUP BY clause or be used in an aggregate function
So I try:
SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;
But then it doesn't compute MAX(count); it just shows the table as it is. I suppose that is because, when grouping by year and id, it gets the max for the id in that specific year.
So, how can I write that query? I want to get each id's MAX(count) and the year when that happens.
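The behavior described in the question can be reproduced with a small sketch. The temporary table counts_demo and its rows are hypothetical, invented here for illustration; only the column names follow the question.

```sql
-- Hypothetical sample data; counts_demo is an illustrative name, not from the question.
CREATE TEMP TABLE counts_demo (id int, year int, count int);
INSERT INTO counts_demo (id, year, count) VALUES
    (1, 2010, 5),
    (1, 2011, 9),
    (2, 2010, 7),
    (2, 2011, 3);

-- Each (id, year) pair occurs only once, so every group holds a single row
-- and MAX(count) is simply that row's count: all four rows come back.
SELECT id, year, MAX(count) AS max_count
FROM counts_demo
GROUP BY id, year
ORDER BY id, year;

-- Grouping by id alone yields one row per id, but year can no longer be selected.
SELECT id, MAX(count) AS max_count
FROM counts_demo
GROUP BY id
ORDER BY id;
```

The second query gives the right maximum per id, which is why the remaining problem is only how to carry the matching year along.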
Answered by Erwin Brandstetter
The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:
SELECT DISTINCT ON (1)
id, count, year
FROM tbl
ORDER BY 1, 2 DESC, 3;
The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:
SELECT DISTINCT ON (id)
id, count, year
FROM tbl
ORDER BY id, count DESC, year;
The result is ordered by id, which may or may not be welcome. It's better than "undefined" in any case.
It also breaks ties (when multiple years share the same maximum count) in a well-defined way: pick the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.
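The tie-breaking described above can be seen in a minimal sketch. The sample rows are hypothetical and chosen so that id 1 reaches its maximum count in two different years; the table name tbl follows the answer.

```sql
-- Hypothetical sample data: id 1 reaches its maximum count (9) in both 2011 and 2012.
CREATE TEMP TABLE tbl (id int, year int, count int);
INSERT INTO tbl (id, year, count) VALUES
    (1, 2010, 5),
    (1, 2011, 9),
    (1, 2012, 9),
    (2, 2010, 7),
    (2, 2011, 3);

-- Ties broken toward the earliest year: returns (1, 9, 2011) and (2, 7, 2010).
SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year;

-- Ties broken toward the latest year: returns (1, 9, 2012) and (2, 7, 2010).
SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year DESC;
```

In both variants DISTINCT ON keeps the first row of each id group in the ORDER BY order, so the trailing sort column decides which of the tied years survives.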
More explanation, links, a benchmark and possibly faster solutions in this closely related answer:
Aside: In a real-life query, you wouldn't use some of these column names. id is a non-descriptive anti-pattern for a column name, and count is a reserved word in standard SQL and an aggregate function in Postgres.
Answered by a_horse_with_no_name
select *
from (
   select id,
          year,
          thing,
          max(thing) over (partition by id) as max_thing
   from the_table
) t
where thing = max_thing
or:
select t1.id,
       t1.year,
       t1.thing
from the_table t1
where t1.thing = (select max(t2.thing)
                  from the_table t2
                  where t2.id = t1.id);
or
select t1.id,
       t1.year,
       t1.thing
from the_table t1
  join (
    select id, max(t2.thing) as max_thing
    from the_table t2
    group by id
  ) t on t.id = t1.id and t.max_thing = t1.thing
or (same as the previous with a different notation)
with max_stuff as (
  select id, max(t2.thing) as max_thing
  from the_table t2
  group by id
)
select t1.id,
       t1.year,
       t1.thing
from the_table t1
  join max_stuff t2
    on t1.id = t2.id
   and t1.thing = t2.max_thing
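
One property shared by the queries in this answer is that they keep every row whose thing equals the per-id maximum, so an id with a tied maximum produces more than one row. The sketch below illustrates this with hypothetical sample data. The row_number() variant at the end is not part of the answer; it is shown only as one possible way to get exactly one row per id, under the assumption that the earliest year should win a tie.

```sql
-- Hypothetical sample data: id 1 reaches its maximum (9) in both 2011 and 2012.
CREATE TEMP TABLE the_table (id int, year int, thing int);
INSERT INTO the_table (id, year, thing) VALUES
    (1, 2010, 5),
    (1, 2011, 9),
    (1, 2012, 9),
    (2, 2010, 7),
    (2, 2011, 3);

-- Window-function query from the answer (ORDER BY added only for stable output).
-- Returns three rows: (1, 2011), (1, 2012) and (2, 2010), because both tied rows match.
select id, year, thing
from (
   select id,
          year,
          thing,
          max(thing) over (partition by id) as max_thing
   from the_table
) t
where thing = max_thing
order by id, year;

-- Assumed alternative, not from the answer: number the rows within each id and
-- keep only the first, which returns (1, 2011, 9) and (2, 2010, 7).
select id, year, thing
from (
   select id,
          year,
          thing,
          row_number() over (partition by id order by thing desc, year) as rn
   from the_table
) t
where rn = 1
order by id;
```

Whether tied rows should all be returned or reduced to one depends on what the result is used for, so neither form is better in general.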

