Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/18939240/

PostgreSQL - "DISTINCT ON" and "GROUP BY" syntax

Tags: sql, postgresql

Asked by Jonathan Vanasco

I realized that a database query was returning unexpected results due to my improper use of "DISTINCT ON" and "GROUP BY".

I'm hoping someone can set me straight on this. The actual query is quite complex, so I'll simplify it:

I have a table/inner query that consists of an object_id and a timestamp:

CREATE TABLE test_select ( object_id INT , event_timestamp timestamp );
COPY test_select (object_id , event_timestamp) FROM stdin (DELIMITER '|');
1           | 2013-01-27 21:01:20
1           | 2012-06-28 14:36:26
1           | 2013-02-21 04:16:48
2           | 2012-06-27 19:53:05
2           | 2013-02-03 17:35:58
3           | 2012-06-14 20:17:00
3           | 2013-02-15 19:03:34
4           | 2012-06-13 13:59:47
4           | 2013-02-23 06:31:16
5           | 2012-07-03 01:45:56
5           | 2012-06-11 21:33:26
\.
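
The COPY ... FROM stdin form above, with inline rows terminated by `\.`, is the way psql scripts feed data. For a client that does not accept that form, a plain INSERT is one way to load the same rows. This is a minimal sketch, assuming the table was created as shown above:

```sql
-- Same eleven rows as the COPY block, written as a multi-row INSERT.
INSERT INTO test_select (object_id, event_timestamp) VALUES
    (1, '2013-01-27 21:01:20'),
    (1, '2012-06-28 14:36:26'),
    (1, '2013-02-21 04:16:48'),
    (2, '2012-06-27 19:53:05'),
    (2, '2013-02-03 17:35:58'),
    (3, '2012-06-14 20:17:00'),
    (3, '2013-02-15 19:03:34'),
    (4, '2012-06-13 13:59:47'),
    (4, '2013-02-23 06:31:16'),
    (5, '2012-07-03 01:45:56'),
    (5, '2012-06-11 21:33:26');
```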

I'm trying to select distinct IDs, deduplicated by object_id and ordered by each object's latest timestamp in reverse chronological order,

so the results should be [ 4, 1, 3, 2, 5 ]

I think this does what I need (it seems to):

SELECT object_id  
FROM test_select 
GROUP BY object_id 
ORDER BY max(event_timestamp) DESC
;

For testing/auditing purposes, I sometimes want to include the timestamp field. I can't seem to figure out how to include another field with that query.

Can anyone point out glaring problems in my SQL above, or suggest how to include the auditing info?

Answered by ypercube

To be able to select all columns, and not only object_id and MAX(event_timestamp), you can use DISTINCT ON:

SELECT DISTINCT ON (object_id) 
    object_id, event_timestamp    ---, more columns
FROM test_select 
ORDER BY object_id, event_timestamp DESC ;
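
DISTINCT ON keeps the first row of each object_id group in ORDER BY order, so the sort direction of the second ORDER BY column decides which row survives. As an illustrative sketch (not part of the original answer), flipping that column to ascending returns the earliest event per object instead of the latest:

```sql
-- Earliest event per object: same DISTINCT ON, opposite sort direction.
SELECT DISTINCT ON (object_id)
    object_id, event_timestamp
FROM test_select
ORDER BY object_id, event_timestamp ASC;

-- With the sample data above this yields:
--   1 | 2012-06-28 14:36:26
--   2 | 2012-06-27 19:53:05
--   3 | 2012-06-14 20:17:00
--   4 | 2012-06-13 13:59:47
--   5 | 2012-06-11 21:33:26
```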

If you want the results ordered by event_timestamp DESC rather than by object_id, you need to wrap the query in a derived table or a CTE, because the DISTINCT ON expression has to lead the inner ORDER BY:

SELECT *
FROM 
  ( SELECT DISTINCT ON (object_id) 
        object_id, event_timestamp    ---, more columns
    FROM test_select 
    ORDER BY object_id, event_timestamp DESC 
  ) AS t
ORDER BY event_timestamp DESC ;
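
The derived-table query above can also be written as a CTE, which is the other option mentioned. A sketch of that form, where the CTE name `latest` is a hypothetical choice:

```sql
-- "latest" is an illustrative name; any identifier works.
WITH latest AS (
    SELECT DISTINCT ON (object_id)
           object_id, event_timestamp    -- , more columns
    FROM test_select
    ORDER BY object_id, event_timestamp DESC
)
SELECT object_id, event_timestamp
FROM latest
ORDER BY event_timestamp DESC;

-- With the sample data, object_id comes back in the order 4, 1, 3, 2, 5.
```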

Alternatively, you can use window functions, like ROW_NUMBER():

WITH cte AS
  ( SELECT ROW_NUMBER() OVER (PARTITION BY object_id 
                              ORDER BY event_timestamp DESC) 
             AS rn, 
           object_id, event_timestamp    ---, more columns
    FROM test_select 
  )
SELECT object_id, event_timestamp    ---, more columns
FROM cte
WHERE rn = 1
ORDER BY event_timestamp DESC ;

Or use the aggregate MAX() with OVER:

WITH cte AS
  ( SELECT MAX(event_timestamp) OVER (PARTITION BY object_id) 
             AS max_event_timestamp, 
           object_id, event_timestamp    ---, more columns
    FROM test_select 
  )
SELECT object_id, event_timestamp    ---, more columns
FROM cte
WHERE event_timestamp = max_event_timestamp
ORDER BY event_timestamp DESC ;
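
The ROW_NUMBER() and MAX() OVER variants agree on the sample data, but they can differ when an object has several rows sharing its latest timestamp: rn = 1 keeps exactly one row per object, while the equality filter keeps every tied row. The sketch below uses hypothetical inline data (not the question's table) to count the rows each filter would keep:

```sql
-- Hypothetical data: object 1 has two rows with the same latest timestamp.
WITH sample (object_id, event_timestamp) AS (
    VALUES (1, TIMESTAMP '2013-02-21 04:16:48'),
           (1, TIMESTAMP '2013-02-21 04:16:48'),
           (2, TIMESTAMP '2013-02-03 17:35:58')
),
cte AS (
    SELECT object_id, event_timestamp,
           ROW_NUMBER() OVER (PARTITION BY object_id
                              ORDER BY event_timestamp DESC) AS rn,
           MAX(event_timestamp) OVER (PARTITION BY object_id)
               AS max_event_timestamp
    FROM sample
)
SELECT
    count(*) FILTER (WHERE rn = 1)                                AS rows_via_row_number,
    count(*) FILTER (WHERE event_timestamp = max_event_timestamp) AS rows_via_max_over
FROM cte;

-- rows_via_row_number = 2, rows_via_max_over = 3
```

If ties are possible and only one row per object is wanted, the ROW_NUMBER() or DISTINCT ON forms are probably the safer choice.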

Answered by zero323

This is probably not the best way of dealing with it, but you can try using a window function:

SELECT DISTINCT object_id, MAX(event_timestamp)
OVER (PARTITION BY object_id)  
FROM test_select ORDER BY max DESC;
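
The query above sorts by `max`, the name PostgreSQL assigns by default to the unaliased MAX(...) output column. A variant with an explicit alias does not depend on that default name. This is a sketch, with `max_event_timestamp` as the illustrative alias:

```sql
-- Same result as the query above, but the sort key is named explicitly.
SELECT DISTINCT
       object_id,
       MAX(event_timestamp) OVER (PARTITION BY object_id) AS max_event_timestamp
FROM test_select
ORDER BY max_event_timestamp DESC;
```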

On the other hand, this works as well:

SELECT object_id, MAX(event_timestamp) as max_event_timestamp
FROM test_select 
GROUP BY object_id 
ORDER BY max_event_timestamp DESC;