postgresql 在同一个分区上应用多个窗口函数

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1896102/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-19 23:56:16  来源:igfitidea点击:

Applying Multiple Window Functions On Same Partition

sqlpostgresqlwindow-functions

提问by Verhogen

Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary)

是否可以将多个窗口函数应用于同一个分区?(如果我没有使用正确的词汇,请纠正我)

For example you can do

例如你可以做

SELECT name, first_value() over (partition by name order by date) from table1

But is there a way to do something like:

但是有没有办法做这样的事情:

SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1

Where we are applying two functions onto the same window?

我们在哪里将两个函数应用到同一个窗口?

Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html

参考:http: //postgresql.ro/docs/8.4/static/tutorial-window.html

回答by Adriaan Stander

Can you not just use the window per selection

你不能只使用每个选择的窗口吗

Something like

就像是

SELECT  name, 
        first_value() OVER (partition by name order by date) as f, 
        last_value() OVER (partition by name order by date) as l 
from table1

Also from your reference you can do it like this

同样从您的参考中,您可以这样做

SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)

回答by Skippy le Grand Gourou

Warning?:I don't delete this answer since it seems technically correct and therefore may be helpful, but bewarethat PARTITION BY bar ORDER BY foois probably not what you want to doanyway. Indeed, aggregate functions won't compute the partition elements as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)is not equivalentto SELECT avg(foo) OVER (PARTITION BY bar)(see proof at the end of the answer).

警告?:因为它似乎技术上是正确的,我不删除这个答案,因此可能会有所帮助,但要注意PARTITION BY bar ORDER BY foo可能不是你想要做什么呢。实际上,聚合函数不会整体计算分区元素。也就是说,SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)不等同SELECT avg(foo) OVER (PARTITION BY bar)(见证据在回答结束)。

Though it doesn't improve performance per se, if you use multiple times the same partition, you probably wantto use the second syntax proposed by astander, and not only because it's cheaper to write. Here is why.

虽然它不会提高性能本身,如果你多次使用同一个分区,您可能希望使用由astander提出的第二个语法,这不仅是因为它更便宜来写。这是为什么。

Consider the following query?:

考虑以下查询?:

SELECT 
  array_agg(foo)
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar ORDER BY foo) 
FROM 
  foobar;

Since in principle the ordering has no effect on the computation of the average, you might be tempted to use the following query instead (no ordering on the second partition)?:

由于原则上排序对平均值的计算没有影响,您可能会想改用以下查询(第二个分区上没有排序)?:

SELECT 
  array_agg(foo) 
    OVER (PARTITION BY bar ORDER BY foo), 
  avg(baz)
    OVER (PARTITION BY bar) 
FROM 
  foobar;

This is a big mistake, as it will take much longer. Proof :

这是一个很大的错误,因为它需要更长的时间。证明 :

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
                                                           QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
   ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
         Sort Key: bar, foo
         Sort Method: quicksort  Memory: 130006kB
         ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
 Total runtime: 2458.969 ms
(6 lignes)

> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
                                                              QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------------
 WindowAgg  (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
   ->  WindowAgg  (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
         ->  Sort  (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
               Sort Key: bar, foo
               Sort Method: quicksort  Memory: 130006kB
               ->  Seq Scan on foobar  (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
 Total runtime: 3060.041 ms
(7 lignes)

Now, if you are aware of this issue, of course you will use the same partition everywhere. But when you have ten times or more the same partition and you are updating it over days, it is quite easy to forget to add the ORDER BYclause on a partition which doesn't need it by itself.

现在,如果您意识到这个问题,您当然会在任何地方使用相同的分区。但是当您有十次或更多相同的分区并且您在几天内更新它时,很容易忘记ORDER BY在一个不需要它的分区上添加子句。

Here comes the WINDOWsyntax, which will prevent you from such careless mistakes (provided, of course, you're aware it's better to minimize the number of different window functions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE) to the first query?:

这里是WINDOW语法,它可以防止你犯这种粗心大意的错误(当然,前提是你知道最好尽量减少不同窗口函数的数量)。以下严格等同于(据我所知EXPLAIN ANALYZE)第一个查询?:

SELECT
  array_agg(foo)
    OVER qux,
  avg(baz)
    OVER qux
FROM
  foobar
WINDOW
  qux AS (PARTITION BY bar ORDER BY bar)

Post-warning update?:

警告后更新?:

I understand the statement that "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)is not equivalentto SELECT avg(foo) OVER (PARTITION BY bar)" seems questionable, so here is an example?:

据我所知,“声明SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)不等同SELECT avg(foo) OVER (PARTITION BY bar)”似乎值得商榷,所以这里有一个例子?:

# SELECT * FROM foobar;
 foo | bar 
-----+-----
   1 |   1
   2 |   2
   3 |   1
   4 |   2
(4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
 array_agg | avg 
-----------+-----
 {1,3}     |   2
 {1,3}     |   2
 {2,4}     |   3
 {2,4}     |   3
 (4 lines)

# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
 array_agg | avg 
-----------+-----
 {1}       |   1
 {1,3}     |   2
 {2}       |   2
 {2,4}     |   3
(4 lines)