PostgreSQL MAX and GROUP BY
Original question: http://stackoverflow.com/questions/13325583/
Source: StackOverflow. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Project Dumbo Dev
I have a table with id, year and count.
I want to get the MAX(count) for each id and keep the year when it happens, so I make this query:
SELECT id, year, MAX(count)
FROM table
GROUP BY id;
Unfortunately, it gives me an error:
ERROR: column "table.year" must appear in the GROUP BY clause or be used in an aggregate function
So I try:
SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;
But then it doesn't do MAX(count); it just shows the table as it is. I suppose that is because, when grouping by year and id, it gets the max for the id of that specific year.
So, how can I write that query? I want to get the id's MAX(count) and the year when that happens.
Answered by Erwin Brandstetter
The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:
SELECT DISTINCT ON (1)
id, count, year
FROM tbl
ORDER BY 1, 2 DESC, 3;
The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:
SELECT DISTINCT ON (id)
id, count, year
FROM tbl
ORDER BY id, count DESC, year;
The result is ordered by id, which may or may not be welcome. It's better than "undefined" in any case.
It also breaks ties (when multiple years share the same maximum count) in a well-defined way: it picks the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.
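To make the tie-breaking concrete, here is a minimal sketch for PostgreSQL. The sample rows are hypothetical and not part of the original answer; the table and column names follow the queries above.

```sql
-- Hypothetical sample data; tbl and its columns follow the answer's naming.
CREATE TEMP TABLE tbl (id int, year int, count int);

INSERT INTO tbl (id, year, count) VALUES
    (1, 2010, 5),
    (1, 2011, 9),
    (1, 2012, 9),   -- ties with 2011 for the maximum count of id 1
    (2, 2010, 3),
    (2, 2011, 1);

-- Earliest year wins the tie: id 1 is reported with year 2011.
SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year;

-- Latest year wins the tie: id 1 is reported with year 2012.
SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year DESC;
```

In both queries id 2 is reported with count 3 and year 2010, since it has no tie.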
More explanation, links, a benchmark and possibly faster solutions in this closely related answer:
Aside: In a real-life query, you wouldn't use some of these column names. id is a non-descriptive anti-pattern for a column name, and count is a reserved word in standard SQL and an aggregate function in Postgres.
Answered by a_horse_with_no_name
select *
from (
   select id,
          year,
          thing,
          max(thing) over (partition by id) as max_thing
   from the_table
) t
where thing = max_thing
or:
select t1.id,
       t1.year,
       t1.thing
from the_table t1
where t1.thing = (select max(t2.thing)
                  from the_table t2
                  where t2.id = t1.id);
or
select t1.id,
       t1.year,
       t1.thing
from the_table t1
  join (
    select id, max(t2.thing) as max_thing
    from the_table t2
    group by id
  ) t on t.id = t1.id and t.max_thing = t1.thing
or (same as the previous with a different notation)
with max_stuff as (
  select id, max(t2.thing) as max_thing
  from the_table t2
  group by id
)
select t1.id,
       t1.year,
       t1.thing
from the_table t1
  join max_stuff t2
    on t1.id = t2.id
   and t1.thing = t2.max_thing
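One point worth noting about the four variants above: when several years share the maximum for an id, each of them returns every tied row. The following is a minimal sketch, not part of the original answer, that keeps a single row per id by using the row_number() window function instead. The sample rows are hypothetical, and the tie-break (earliest year) is an assumption chosen for illustration.

```sql
-- Hypothetical sample data; the_table and thing follow this answer's naming.
CREATE TEMP TABLE the_table (id int, year int, thing int);

INSERT INTO the_table (id, year, thing) VALUES
    (1, 2011, 9),
    (1, 2012, 9),   -- ties with 2011 for the maximum of id 1
    (2, 2010, 3);

-- Number the rows of each id: highest thing first, earliest year first among ties,
-- then keep only the first row per id.
SELECT id, year, thing
FROM  (
   SELECT id, year, thing,
          row_number() OVER (PARTITION BY id
                             ORDER BY thing DESC, year) AS rn
   FROM   the_table
) t
WHERE  rn = 1;
```

With this data the query keeps year 2011 for id 1, whereas the variants above would return both 2011 and 2012.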