postgresql 在同一个分区上应用多个窗口函数
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1896102/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Applying Multiple Window Functions On Same Partition
提问by Verhogen
Is it possible to apply multiple window functions to the same partition? (Correct me if I'm not using the right vocabulary)
是否可以将多个窗口函数应用于同一个分区?(如果我没有使用正确的词汇,请纠正我)
For example you can do
例如你可以做
SELECT name, first_value() over (partition by name order by date) from table1
But is there a way to do something like:
但是有没有办法做这样的事情:
SELECT name, (first_value() as f, last_value() as l (partition by name order by date)) from table1
Where we are applying two functions onto the same window?
我们在哪里将两个函数应用到同一个窗口?
Reference: http://postgresql.ro/docs/8.4/static/tutorial-window.html
参考:http: //postgresql.ro/docs/8.4/static/tutorial-window.html
回答by Adriaan Stander
Can you not just use the window per selection
你不能只使用每个选择的窗口吗
Something like
就像是
SELECT name,
first_value() OVER (partition by name order by date) as f,
last_value() OVER (partition by name order by date) as l
from table1
Also from your reference you can do it like this
同样从您的参考中,您可以这样做
SELECT sum(salary) OVER w, avg(salary) OVER w
FROM empsalary
WINDOW w AS (PARTITION BY depname ORDER BY salary DESC)
回答by Skippy le Grand Gourou
Warning?:I don't delete this answer since it seems technically correct and therefore may be helpful, but bewarethat PARTITION BY bar ORDER BY foo
is probably not what you want to doanyway. Indeed, aggregate functions won't compute the partition elements as a whole. That is, SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
is not equivalentto SELECT avg(foo) OVER (PARTITION BY bar)
(see proof at the end of the answer).
警告?:因为它似乎技术上是正确的,我不删除这个答案,因此可能会有所帮助,但要注意这PARTITION BY bar ORDER BY foo
是可能不是你想要做什么呢。实际上,聚合函数不会整体计算分区元素。也就是说,SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
不等同于SELECT avg(foo) OVER (PARTITION BY bar)
(见证据在回答结束)。
Though it doesn't improve performance per se, if you use multiple times the same partition, you probably wantto use the second syntax proposed by astander, and not only because it's cheaper to write. Here is why.
虽然它不会提高性能本身,如果你多次使用同一个分区,您可能希望使用由astander提出的第二个语法,这不仅是因为它更便宜来写。这是为什么。
Consider the following query?:
考虑以下查询?:
SELECT
array_agg(foo)
OVER (PARTITION BY bar ORDER BY foo),
avg(baz)
OVER (PARTITION BY bar ORDER BY foo)
FROM
foobar;
Since in principle the ordering has no effect on the computation of the average, you might be tempted to use the following query instead (no ordering on the second partition)?:
由于原则上排序对平均值的计算没有影响,您可能会想改用以下查询(第二个分区上没有排序)?:
SELECT
array_agg(foo)
OVER (PARTITION BY bar ORDER BY foo),
avg(baz)
OVER (PARTITION BY bar)
FROM
foobar;
This is a big mistake, as it will take much longer. Proof :
这是一个很大的错误,因为它需要更长的时间。证明 :
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar ORDER BY foo) FROM foobar;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=215781.92..254591.76 rows=1724882 width=12) (actual time=969.659..2353.865 rows=1724882 loops=1)
-> Sort (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=969.640..1083.039 rows=1724882 loops=1)
Sort Key: bar, foo
Sort Method: quicksort Memory: 130006kB
-> Seq Scan on foobar (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.027..393.815 rows=1724882 loops=1)
Total runtime: 2458.969 ms
(6 lignes)
> EXPLAIN ANALYZE SELECT array_agg(foo) OVER (PARTITION BY bar ORDER BY foo), avg(baz) OVER (PARTITION BY bar) FROM foobar;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=215781.92..276152.79 rows=1724882 width=12) (actual time=938.733..2958.811 rows=1724882 loops=1)
-> WindowAgg (cost=215781.92..250279.56 rows=1724882 width=12) (actual time=938.699..2033.172 rows=1724882 loops=1)
-> Sort (cost=215781.92..220094.12 rows=1724882 width=12) (actual time=938.683..1062.568 rows=1724882 loops=1)
Sort Key: bar, foo
Sort Method: quicksort Memory: 130006kB
-> Seq Scan on foobar (cost=0.00..37100.82 rows=1724882 width=12) (actual time=0.028..377.299 rows=1724882 loops=1)
Total runtime: 3060.041 ms
(7 lignes)
Now, if you are aware of this issue, of course you will use the same partition everywhere. But when you have ten times or more the same partition and you are updating it over days, it is quite easy to forget to add the ORDER BY
clause on a partition which doesn't need it by itself.
现在,如果您意识到这个问题,您当然会在任何地方使用相同的分区。但是当您有十次或更多相同的分区并且您在几天内更新它时,很容易忘记ORDER BY
在一个不需要它的分区上添加子句。
Here comes the WINDOW
syntax, which will prevent you from such careless mistakes (provided, of course, you're aware it's better to minimize the number of different window functions). The following is strictly equivalent (as far as I can tell from EXPLAIN ANALYZE
) to the first query?:
这里是WINDOW
语法,它可以防止你犯这种粗心大意的错误(当然,前提是你知道最好尽量减少不同窗口函数的数量)。以下严格等同于(据我所知EXPLAIN ANALYZE
)第一个查询?:
SELECT
array_agg(foo)
OVER qux,
avg(baz)
OVER qux
FROM
foobar
WINDOW
qux AS (PARTITION BY bar ORDER BY bar)
Post-warning update?:
警告后更新?:
I understand the statement that "SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
is not equivalentto SELECT avg(foo) OVER (PARTITION BY bar)
" seems questionable, so here is an example?:
据我所知,“声明SELECT avg(foo) OVER (PARTITION BY bar ORDER BY foo)
不等同于SELECT avg(foo) OVER (PARTITION BY bar)
”似乎值得商榷,所以这里有一个例子?:
# SELECT * FROM foobar;
foo | bar
-----+-----
1 | 1
2 | 2
3 | 1
4 | 2
(4 lines)
# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar);
array_agg | avg
-----------+-----
{1,3} | 2
{1,3} | 2
{2,4} | 3
{2,4} | 3
(4 lines)
# SELECT array_agg(foo) OVER qux, avg(foo) OVER qux FROM foobar WINDOW qux AS (PARTITION BY bar ORDER BY foo);
array_agg | avg
-----------+-----
{1} | 1
{1,3} | 2
{2} | 2
{2,4} | 3
(4 lines)