postgresql Postgres window function and group by exception

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/8844903/

Time: 2020-10-20 23:33:05  Source: igfitidea

Postgres window function and group by exception

sql postgresql aggregate-functions window-functions

Asked by Martin

I'm trying to put together a query that will retrieve the statistics of a user (profit/loss) as a cumulative result, over a period of time.

Here's the query I have so far:

SELECT p.name, e.date, 
    sum(sp.payout) OVER (ORDER BY e.date)
    - sum(s.buyin) OVER (ORDER BY e.date) AS "Profit/Loss" 
FROM result r 
    JOIN game g ON r.game_id = g.game_id 
    JOIN event e ON g.event_id = e.event_id 
    JOIN structure s ON g.structure_id = s.structure_id 
    JOIN structure_payout sp ON g.structure_id = sp.structure_id
                            AND r.position = sp.position 
    JOIN player p ON r.player_id = p.player_id 
WHERE p.player_id = 17 
GROUP BY p.name, e.date, e.event_id, sp.payout, s.buyin
ORDER BY p.name, e.date ASC

The query runs, but the result is slightly incorrect. The reason is that an event can have multiple games (with different sp.payout values). Therefore, the above returns multiple rows if a user has two results in an event with different payouts (e.g. there are 4 games per event, and a user gets £20 from one and £40 from another).

The obvious solution would be to amend the GROUP BY to:

GROUP BY p.name, e.date, e.event_id

However, Postgres complains about this, as it doesn't appear to recognize that sp.payout and s.buyin are inside an aggregate function. I get the error:

column "sp.payout" must appear in the GROUP BY clause or be used in an aggregate function

I'm running 9.1 on Ubuntu Linux server.
Am I missing something, or could this be a genuine defect in Postgres?

Answered by Erwin Brandstetter

You are not, in fact, using aggregate functions. You are using window functions. That's why PostgreSQL demands that sp.payout and s.buyin be included in the GROUP BY clause.

By appending an OVER clause, the aggregate function sum() is turned into a window function, which aggregates values per partition while keeping all rows.
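
To make the difference concrete, here is a small sketch using an inline VALUES list. The alias t and the columns grp and amount are made up for illustration and are not part of the question's schema. The first query collapses rows with a plain aggregate; the second keeps every row and adds a running total with a window function.

```sql
-- Plain aggregate: rows are collapsed to one row per grp (2 rows).
SELECT grp, sum(amount) AS total
FROM  (VALUES ('a', 10), ('a', 20), ('b', 5)) AS t(grp, amount)
GROUP  BY grp
ORDER  BY grp;

-- Window function: all 3 input rows are kept; each row carries
-- the running sum within its partition.
SELECT grp, amount
     , sum(amount) OVER (PARTITION BY grp ORDER BY amount) AS running_total
FROM  (VALUES ('a', 10), ('a', 20), ('b', 5)) AS t(grp, amount)
ORDER  BY grp, amount;
```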

You can combine window functions and aggregate functions. Aggregations are applied first. I did not understand from your description how you want to handle multiple payouts / buyins per event. As a guess, I calculate a sum of them per event. Now I can remove sp.payout and s.buyin from the GROUP BY clause and get one row per player and event:

SELECT p.name
     , e.event_id
     , e.date
     , sum(sum(sp.payout)) OVER w
     - sum(sum(s.buyin  )) OVER w AS "Profit/Loss" 
FROM   player            p
JOIN   result            r ON r.player_id     = p.player_id  
JOIN   game              g ON g.game_id       = r.game_id 
JOIN   event             e ON e.event_id      = g.event_id 
JOIN   structure         s ON s.structure_id  = g.structure_id 
JOIN   structure_payout sp ON sp.structure_id = g.structure_id
                          AND sp.position     = r.position
WHERE  p.player_id = 17 
GROUP  BY e.event_id
WINDOW w AS (ORDER BY e.date, e.event_id)
ORDER  BY e.date, e.event_id;

In this expression: sum(sum(sp.payout)) OVER w, the outer sum() is a window function, the inner sum() is an aggregate function.
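
The nesting can be tried in isolation. The following is a minimal sketch with hypothetical data (alias t, columns event_id and payout, one row per game result), not the question's actual tables:

```sql
-- Hypothetical data: event 1 has two games, event 2 has one.
SELECT event_id
     , sum(payout)                               AS event_total    -- aggregate: one value per event
     , sum(sum(payout)) OVER (ORDER BY event_id) AS running_total  -- window over the aggregated rows
FROM  (VALUES (1, 20), (1, 40), (2, 10)) AS t(event_id, payout)
GROUP  BY event_id
ORDER  BY event_id;
```

With this data, event 1 yields 60 for both columns, and event 2 yields an event_total of 10 and a running_total of 70.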

This assumes that p.player_id and e.event_id are the PRIMARY KEY in their respective tables.

I added e.event_id to the ORDER BY of the WINDOW clause to arrive at a deterministic sort order. (There could be multiple events on the same date.) I also included event_id in the result to distinguish multiple events per day.

While the query is restricted to a single player (WHERE p.player_id = 17), we don't need to add p.name or p.player_id to GROUP BY and ORDER BY. If one of the joins multiplied rows unduly, the resulting sum would be incorrect (partly or completely multiplied). Grouping by p.name could not repair the query then.

I also removed e.date from the GROUP BY clause. Since PostgreSQL 9.1, the primary key e.event_id covers all columns of the input row.
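
As a rough illustration of that rule, the sketch below uses a hypothetical stand-in table (event_demo, invented for this example) with event_id as its primary key. It assumes PostgreSQL 9.1 or later.

```sql
-- Hypothetical stand-in for the event table; event_id is the primary key.
CREATE TEMP TABLE event_demo (
   event_id int  PRIMARY KEY
 , date     date NOT NULL
);

INSERT INTO event_demo VALUES (1, '2012-01-10'), (2, '2012-01-10');

-- e.date is functionally dependent on the primary key e.event_id,
-- so it may appear in the SELECT list without being listed in GROUP BY.
SELECT e.event_id, e.date, count(*) AS row_count
FROM   event_demo e
GROUP  BY e.event_id;
```

Without the PRIMARY KEY constraint on event_id, the same SELECT should fail with the "must appear in the GROUP BY clause or be used in an aggregate function" error from the question.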

If you change the query to return multiple players at once, adapt:

...
WHERE  p.player_id < 17  -- example - multiple players
GROUP  BY p.name, p.player_id, e.date, e.event_id  -- e.date and p.name redundant
WINDOW w AS (ORDER BY p.name, p.player_id, e.date, e.event_id)
ORDER  BY p.name, p.player_id, e.date, e.event_id;

Unless p.name is defined unique (?), group and order by player_id additionally to get correct results in a deterministic sort order.
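
Note that a window without PARTITION BY runs over all rows in the sort order, so the running total carries over from one player to the next. If the total should restart for each player (an assumption about the desired result, not something stated in the question), one option is to partition the window by player. A minimal sketch with made-up, pre-joined rows:

```sql
-- Hypothetical, pre-joined rows: (player_id, event_id, payout, buyin).
SELECT player_id
     , event_id
     , sum(sum(payout)) OVER w
     - sum(sum(buyin))  OVER w AS "Profit/Loss"
FROM  (VALUES (1, 1, 20, 10)
            , (1, 1, 40, 10)
            , (1, 2,  0, 10)
            , (2, 1,  0, 10)) AS t(player_id, event_id, payout, buyin)
GROUP  BY player_id, event_id
WINDOW w AS (PARTITION BY player_id ORDER BY event_id)  -- running total restarts per player
ORDER  BY player_id, event_id;
```

With this data, player 1 shows 40 and then 30, and player 2 starts over at -10 instead of continuing from player 1's total.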

I only kept e.date and p.name in GROUP BY to have identical sort order in all clauses, hoping for a performance benefit. Otherwise, you can remove the columns there. (The same goes for e.date in the first query.)