PostgreSQL MAX and GROUP BY

Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13325583/

Time: 2020-09-01 12:03:26  Source: igfitidea

sql postgresql group-by max greatest-n-per-group

Asked by Project Dumbo Dev

I have a table with id, year and count.

I want to get the MAX(count) for each id and keep the year when it happens, so I wrote this query:

SELECT id, year, MAX(count)
FROM table
GROUP BY id;

Unfortunately, it gives me an error:

ERROR: column "table.year" must appear in the GROUP BY clause or be used in an aggregate function

So I try:

SELECT id, year, MAX(count)
FROM table
GROUP BY id, year;

But then it doesn't compute MAX(count); it just shows the table as it is. I suppose that is because, when grouping by year and id, it gets the max for the id in that specific year.
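
To illustrate the behavior described above, here is a minimal sketch with made-up rows (the name tbl and all values are assumptions, not taken from the original table). Every (id, year) pair in this sample is unique, so grouping by both columns puts each row in its own group, and MAX(count) simply returns that row's count:

```sql
-- Hypothetical sample data; the values are invented for illustration.
WITH tbl (id, year, count) AS (
   VALUES (1, 2010, 5)
        , (1, 2011, 9)
        , (2, 2010, 7)
        , (2, 2012, 3)
)
SELECT id, year, MAX(count) AS max_count
FROM   tbl
GROUP  BY id, year
ORDER  BY id, year;
-- Each (id, year) pair forms its own group, so all four input rows come back.
```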

So, how can I write that query? I want to get each id's MAX(count) and the year when that happens.

Answered by Erwin Brandstetter

The shortest (and possibly fastest) query would be with DISTINCT ON, a PostgreSQL extension of the SQL standard DISTINCT clause:

SELECT DISTINCT ON (1)
       id, count, year
FROM   tbl
ORDER  BY 1, 2 DESC, 3;

The numbers refer to ordinal positions in the SELECT list. You can spell out column names for clarity:

SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year;

The result is ordered by id, which may or may not be welcome. In any case, it's better than "undefined".
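
If the ordering by id is not wanted, one common approach is to wrap the DISTINCT ON query in a subquery and sort again in the outer query. A sketch, assuming made-up sample rows in a CTE named tbl; the outer sort by count is chosen only for illustration:

```sql
-- Hypothetical sample data; the values are invented for illustration.
WITH tbl (id, year, count) AS (
   VALUES (1, 2010, 5)
        , (1, 2011, 9)
        , (2, 2010, 7)
        , (2, 2012, 3)
        , (3, 2011, 12)
)
SELECT *
FROM  (
   -- Inner query: one row per id, the one with the highest count.
   SELECT DISTINCT ON (id)
          id, count, year
   FROM   tbl
   ORDER  BY id, count DESC, year
   ) sub
ORDER  BY count DESC, id;   -- outer query: re-sort the winners as desired
-- With this sample: (3, 12, 2011), (1, 9, 2011), (2, 7, 2010)
```

The inner ORDER BY has to start with the DISTINCT ON expression, which is why a different final sort order needs the outer query.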

It also breaks ties (when multiple years share the same maximum count) in a well-defined way: pick the earliest year. If you don't care, drop year from the ORDER BY. Or pick the latest year with year DESC.
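
The tie-breaking rule can be seen with a small sketch. The sample rows below are made up so that id 1 has two years sharing the maximum count; only the last ORDER BY item differs from the query above:

```sql
-- Hypothetical sample data; id 1 reaches its maximum count in two years.
WITH tbl (id, year, count) AS (
   VALUES (1, 2010, 9)
        , (1, 2011, 9)   -- ties with 2010 for the maximum
        , (1, 2012, 4)
        , (2, 2010, 7)
        , (2, 2012, 3)
)
SELECT DISTINCT ON (id)
       id, count, year
FROM   tbl
ORDER  BY id, count DESC, year DESC;   -- latest year wins the tie
-- With this sample: (1, 9, 2011), (2, 7, 2010)
-- With plain "year" instead of "year DESC": (1, 9, 2010), (2, 7, 2010)
```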

More explanation, links, a benchmark and possibly faster solutions in this closely related answer:

Aside: In a real-life query, you wouldn't use some of these column names. id is a non-descriptive anti-pattern for a column name; count is a reserved word in standard SQL and an aggregate function in Postgres.
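
As a rough sketch of what more descriptive names might look like, the table and column names below are purely hypothetical and do not appear in the original question:

```sql
-- Hypothetical names, invented for illustration only.
CREATE TEMP TABLE station_report (
   station_id   int NOT NULL
 , report_year  int NOT NULL
 , report_count int NOT NULL
);

-- Same DISTINCT ON pattern as above, without "id" or "count" as column names.
SELECT DISTINCT ON (station_id)
       station_id, report_count, report_year
FROM   station_report
ORDER  BY station_id, report_count DESC, report_year;
```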

Answered by a_horse_with_no_name

select *
from (
  select id, 
         year,
         thing,
         max(thing) over (partition by id) as max_thing
  from the_table
) t
where thing = max_thing

or:

select t1.id,
       t1.year,
       t1.thing
from the_table t1
where t1.thing = (select max(t2.thing) 
                  from the_table t2
                  where t2.id = t1.id);

or

select t1.id,
       t1.year,
       t1.thing
from the_table t1
  join ( 
    select id, max(t2.thing) as max_thing
    from the_table t2
    group by id
  ) t on t.id = t1.id and t.max_thing = t1.thing

or (same as the previous with a different notation)

with max_stuff as (
  select id, max(t2.thing) as max_thing
  from the_table t2
  group by id
) 
select t1.id, 
       t1.year,
       t1.thing
from the_table t1
  join max_stuff t2 
    on t1.id = t2.id 
   and t1.thing = t2.max_thing
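
Note that the queries in this answer return every row that ties for the maximum, so an id can appear more than once. If exactly one row per id is wanted, a row_number() variant is one option. A sketch, assuming made-up sample rows and assuming that ties should go to the earliest year:

```sql
-- Hypothetical sample data; id 1 reaches its maximum in two years.
with the_table (id, year, thing) as (
  values (1, 2010, 9),
         (1, 2011, 9),   -- ties with 2010 for the maximum
         (2, 2010, 7),
         (2, 2012, 3)
)
select id, year, thing
from (
  select id,
         year,
         thing,
         -- number the rows of each id, highest thing first, earliest year first
         row_number() over (partition by id order by thing desc, year) as rn
  from the_table
) t
where rn = 1;
-- With this sample: (1, 2010, 9) and (2, 2010, 7)
```

Replacing row_number() with rank() and dropping year from the window's order by would bring the tied rows back.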