Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): Stack Overflow, original question: http://stackoverflow.com/questions/25170215/

Get values from first and last row per group

sql postgresql group-by aggregate greatest-n-per-group

Asked by user3915795

I'm new to Postgres, coming from MySQL and hoping that one of y'all would be able to help me out.

I have a table with three columns: name, week, and value. This table has a record of the names, the week at which they recorded the height, and the value of their height. Something like this:

Name  |  Week  | Value
------+--------+-------
John  |  1     | 9
Cassie|  2     | 5
Luke  |  6     | 3
John  |  8     | 14
Cassie|  5     | 7
Luke  |  9     | 5
John  |  2     | 10
Cassie|  4     | 4
Luke  |  7     | 4

What I want is a list per user of the value at the minimum week and the max week. Something like this:

Name  |minWeek | Value |maxWeek | value
------+--------+-------+--------+-------
John  |  1     | 9     | 8      | 14
Cassie|  2     | 5     | 5      | 7
Luke  |  6     | 3     | 9      | 5

In Postgres, I use this query:

select name, week, value
from table t
inner join(
select name, min(week) as minweek
from table
group by name)
ss on t.name = ss.name and t.week = ss.minweek
group by t.name
;

However, I receive an error:

column "w.week" must appear in the GROUP BY clause or be used in an aggregate function
Position: 20

This worked fine for me in MySQL so I'm wondering what I'm doing wrong here?

Accepted answer by Gordon Linoff

This is a bit of a pain, because Postgres has the nice window functions first_value() and last_value(), but these are not aggregation functions. So, here is one way:

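select t.name, min(t.week) as minWeek, max(firstvalue) as firstvalue,
       max(t.week) as maxWeek, max(lastvalue) as lastValue
from (select t.*,
             first_value(value) over (partition by name order by week) as firstvalue,
             -- explicit frame: the default frame ends at the current row (and its peers),
             -- so last_value() would otherwise just return each row's own value
             last_value(value) over (partition by name order by week
                                     rows between unbounded preceding and unbounded following) as lastvalue
      from table t  -- "table" stands in for the actual table name
     ) t
group by t.name;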

Answer by Erwin Brandstetter

There are various simpler and faster ways.

2x DISTINCT ON

SELECT *
FROM  (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
   FROM   tbl
   ORDER  BY name, week
   ) f
JOIN (
   SELECT DISTINCT ON (name)
          name, week AS last_week, value AS last_val
   FROM   tbl
   ORDER  BY name, week DESC
   ) l USING (name);

Or shorter:

SELECT *
FROM  (SELECT DISTINCT ON (1) name, week AS first_week, value AS first_val
       FROM   tbl ORDER BY 1,2) f
JOIN  (SELECT DISTINCT ON (1) name, week AS last_week, value AS last_val
       FROM   tbl ORDER BY 1,2 DESC) l USING (name);

Simple and easy to understand. Also fastest in my tests. Detailed explanation for DISTINCT ON:
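
As a rough illustration of how DISTINCT ON picks one row per group, the sketch below loads the sample rows from the question into a temporary table and keeps the row with the smallest week for each name. The table name tbl_demo is made up for this example and is not part of the original answer.

```sql
-- Hypothetical demo table holding the sample data from the question.
CREATE TEMP TABLE tbl_demo (name text, week int, value int);

INSERT INTO tbl_demo (name, week, value) VALUES
   ('John', 1, 9),  ('Cassie', 2, 5), ('Luke', 6, 3)
 , ('John', 8, 14), ('Cassie', 5, 7), ('Luke', 9, 5)
 , ('John', 2, 10), ('Cassie', 4, 4), ('Luke', 7, 4);

-- DISTINCT ON (name) keeps only the first row of each group of equal names.
-- "First" is decided by ORDER BY: here, the row with the smallest week.
-- The leading ORDER BY expressions must match the DISTINCT ON expressions.
SELECT DISTINCT ON (name)
       name, week AS first_week, value AS first_val
FROM   tbl_demo
ORDER  BY name, week;
```

With the sample data this should return one row each for Cassie (week 2, value 5), John (week 1, value 9), and Luke (week 6, value 3). Switching the sort to ORDER BY name, week DESC would return the row for the last week instead.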

first_value() of composite type

The aggregate functions min() or max() do not accept composite types as input. You would have to create custom aggregate functions (which is not that hard).
But the window functions first_value() and last_value() do. Building on that, we can devise a very simple solution:

Simple query

SELECT DISTINCT ON (name)
       name, week AS first_week, value AS first_value
     ,(first_value((week, value)) OVER (PARTITION BY name
                                        ORDER BY week DESC))::text AS l
FROM   tbl t
ORDER  BY name, week;

The output has all data, but the values for the last week are stuffed into an anonymous record. You may need decomposed values.

Decomposed result with opportunistic use of table type

For that we need a well-known type that registers the types of contained elements with the system. An adapted table definition would allow for the opportunistic use of the table type itself directly:

CREATE TABLE tbl (week int, value int, name text);  -- note optimized column order

week and value come first.

SELECT (l).name, first_week, first_val
     , (l).week AS last_week, (l).value AS last_val
FROM (
   SELECT DISTINCT ON (name)
          week AS first_week, value AS first_val
         ,first_value(t) OVER (PARTITION BY name ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;

Decomposed result from user-defined row type

However, that's probably not possible in most cases. Just use a user-defined type from CREATE TYPE (permanent) or from CREATE TEMP TABLE (for ad-hoc use):

CREATE TEMP TABLE nv(last_week int, last_val int);  -- register composite type

SELECT name, first_week, first_val, (l).last_week, (l).last_val
FROM (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
         ,first_value((week, value)::nv) OVER (PARTITION BY name
                                               ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;
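
The permanent variant with CREATE TYPE might look like the following sketch. The type name week_val is hypothetical, and the sample rows from the question are supplied inline through a CTE named tbl so the statement can be run on its own; against a real table, drop the WITH clause.

```sql
-- Hypothetical composite type; any unused name works.
CREATE TYPE week_val AS (last_week int, last_val int);

WITH tbl(name, week, value) AS (   -- inline stand-in for the real table
   VALUES ('John', 1, 9),  ('Cassie', 2, 5), ('Luke', 6, 3)
        , ('John', 8, 14), ('Cassie', 5, 7), ('Luke', 9, 5)
        , ('John', 2, 10), ('Cassie', 4, 4), ('Luke', 7, 4)
   )
SELECT name, first_week, first_val, (l).last_week, (l).last_val
FROM (
   SELECT DISTINCT ON (name)
          name, week AS first_week, value AS first_val
          -- cast the anonymous row to the registered type so its fields can be named
         ,first_value((week, value)::week_val) OVER (PARTITION BY name
                                                     ORDER BY week DESC) AS l
   FROM   tbl t
   ORDER  BY name, week
   ) sub;

-- DROP TYPE week_val;  -- clean up once the type is no longer needed
```

Unlike the temporary table, the type persists across sessions until it is dropped, which is the trade-off between the two options.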

In a local test on Postgres 9.3 with a similar table of 50k rows, each of these queries was substantially faster than the currently accepted answer. Test with EXPLAIN ANALYZE.
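
One way to set up such a comparison is sketched below. The table name tbl_bench, the 500 names, and the random values are assumptions chosen for illustration; the original test data is not available, so timings will differ.

```sql
-- Hypothetical benchmark table: 50k generated rows, not the original test data.
CREATE TEMP TABLE tbl_bench AS
SELECT 'user_' || (g % 500)   AS name    -- 500 distinct names
     , (g / 500) + 1          AS week    -- weeks 1..100, unique within each name
     , (random() * 100)::int  AS value
FROM   generate_series(0, 49999) g;

ANALYZE tbl_bench;  -- refresh planner statistics before timing

-- EXPLAIN ANALYZE executes the query and reports the actual plan and timings.
EXPLAIN ANALYZE
SELECT *
FROM  (SELECT DISTINCT ON (1) name, week AS first_week, value AS first_val
       FROM   tbl_bench ORDER BY 1,2) f
JOIN  (SELECT DISTINCT ON (1) name, week AS last_week, value AS last_val
       FROM   tbl_bench ORDER BY 1,2 DESC) l USING (name);
```

Run each candidate query the same way, several times, and compare the reported execution times rather than relying on a single run.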

SQL Fiddle displaying all.
