SQL PostgreSQL - 我应该如何使用 first_value()?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/15721207/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
PostgreSQL - how should I use first_value()?
提问by Brent.Longborough
This answer to shows how to produce High/Low/Open/Close values from a ticker:
Retrieve aggregates for arbitrary time intervals
这个答案显示了如何从代码生成高/低/开盘/收盘值:
检索任意时间间隔的聚合
I am trying to implement a solution based on this (PG 9.2), but am having difficulty in getting the correct value for first_value()
.
我正在尝试基于此 (PG 9.2) 实施解决方案,但很难获得first_value()
.
So far, I have tried two queries:
到目前为止,我已经尝试了两个查询:
SELECT
cstamp,
price,
date_trunc('hour',cstamp) AS h,
floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
min(price) OVER w,
max(price) OVER w,
first_value(price) OVER w,
last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5)
ORDER BY date_trunc('hour',cstamp) ASC, floor(extract(minute FROM cstamp) / 5) ASC
)
ORDER BY cstamp;
Here's a piece of the result:
这是结果的一部分:
cstamp price h m5 min max first last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000
"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;73.99004;77.80000;73.99004;73.99004
As you can see, 77.8 is notwhat I believe is the correct value for first_value()
, which should be 77.0.
如您所见,77.8不是我认为的正确值first_value()
,应该是 77.0。
I though this might be due to the ambiguous ORDER BY
in the WINDOW
, so I changed this to
我虽然这可能是由于不明确ORDER BY
的WINDOW
,所以我改变了这
ORDER BY cstamp ASC
but this appears to upset the PARTITION
as well:
但这似乎也令人不安PARTITION
:
cstamp price h m5 min max first last
"2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000
"2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.00000;77.00000;77.00000
"2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.80000
"2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.00000
"2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;77.00000;77.00000;77.00000;77.00000
since the values for max and last now vary within the partition.
因为 max 和 last 的值现在在分区内有所不同。
What am I doing wrong? Could someone help me better to understand the relation between PARTITION
and ORDER
within a WINDOW
?
我究竟做错了什么?有人可以帮助我更好地理解a之间PARTITION
和ORDER
内部的关系WINDOW
吗?
Although I have an answer, here's a trimmed-down pg_dump which will allow anyone to recreate the table. The only thing that's different is the table name.
虽然我有答案,但这里有一个精简的 pg_dump,它允许任何人重新创建表。唯一不同的是表名。
CREATE TABLE wtest (
cstamp timestamp without time zone,
price numeric(10,5)
);
COPY wtest (cstamp, price) FROM stdin;
2013-03-29 09:04:54 77.80000
2013-03-29 09:04:50 76.98000
2013-03-29 09:29:51 77.00000
2013-03-29 09:29:41 77.80000
2013-03-29 09:26:18 77.00000
2013-03-29 09:19:14 77.00000
2013-03-29 09:19:10 77.00000
2013-03-29 09:33:50 76.00000
2013-03-29 09:33:46 76.10000
2013-03-29 09:33:15 77.79000
2013-03-29 09:30:08 77.80000
2013-03-29 09:30:04 77.00000
\.
回答by Clodoaldo Neto
All the functions you used act on the window frame, not on the partition. If omitted the frame end is the current row. To make the window frame to be the whole partition declare it in the frame clause (range...
):
您使用的所有功能都作用于窗框,而不是分区。如果省略,则帧结束是当前行。要使窗口框架成为整个分区,请在框架子句 ( range...
) 中声明它:
SELECT
cstamp,
price,
date_trunc('hour',cstamp) AS h,
floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
min(price) OVER w,
max(price) OVER w,
first_value(price) OVER w,
last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5)
ORDER BY cstamp
range between unbounded preceding and unbounded following
)
ORDER BY cstamp;
回答by Lukas Eder
Here's a quick query to illustrate the behaviour:
这是一个用于说明行为的快速查询:
select
v,
first_value(v) over w1 f1,
first_value(v) over w2 f2,
first_value(v) over w3 f3,
last_value (v) over w1 l1,
last_value (v) over w2 l2,
last_value (v) over w3 l3,
max (v) over w1 m1,
max (v) over w2 m2,
max (v) over w3 m3,
max (v) over () m4
from (values(1),(2),(3),(4)) t(v)
window
w1 as (order by v),
w2 as (order by v rows between unbounded preceding and current row),
w3 as (order by v rows between unbounded preceding and unbounded following)
The output of the above query can be seen here (SQLFiddle here):
| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 4 | 4 |
| 2 | 1 | 1 | 1 | 2 | 2 | 4 | 2 | 2 | 4 | 4 |
| 3 | 1 | 1 | 1 | 3 | 3 | 4 | 3 | 3 | 4 | 4 |
| 4 | 1 | 1 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Few people think of the implicit frames that are applied to window functions that take an ORDER BY
clause. In this case, windows are defaulting to the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. Think about it this way:
很少有人想到应用于带ORDER BY
子句的窗口函数的隐式框架。在这种情况下, windows 默认为 frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
。这样想:
- On the row with
v = 1
the ordered window's frame spansv IN (1)
- On the row with
v = 2
the ordered window's frame spansv IN (1, 2)
- On the row with
v = 3
the ordered window's frame spansv IN (1, 2, 3)
- On the row with
v = 4
the ordered window's frame spansv IN (1, 2, 3, 4)
- 在具有
v = 1
有序窗口的框架跨度的行上v IN (1)
- 在具有
v = 2
有序窗口的框架跨度的行上v IN (1, 2)
- 在具有
v = 3
有序窗口的框架跨度的行上v IN (1, 2, 3)
- 在具有
v = 4
有序窗口的框架跨度的行上v IN (1, 2, 3, 4)
If you want to prevent that behaviour, you have two options:
如果你想阻止这种行为,你有两个选择:
- Use an explicit
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
clause for orderedwindow functions - Use no
ORDER BY
clause in those window functions that allow for omitting them (asMAX(v) OVER()
)
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
对有序窗口函数使用显式子句ORDER BY
在那些允许省略它们的窗口函数中使用 no子句 (asMAX(v) OVER()
)
More details are explained in this article about LEAD()
, LAG()
, FIRST_VALUE()
and LAST_VALUE()
回答by Erwin Brandstetter
The result of max()
as window function is base on the frame definition.
max()
as 窗函数的结果是基于框架定义的。
The default frame definition (with ORDER BY
) is from the start of the frame up to the last peer of the current row(including the current row and possibly more rows ranking equallyaccording to ORDER BY
). In the absence of ORDER BY
(like in my answer you are referring to), or if ORDER BY
treats every row in the partition as equal (like in your first example), all rows in the partition are peers, and max()
produces the same result for every row in the partition, effectively considering allrows of the partition.
默认的框架定义(with ORDER BY
)是从框架的开始到当前行的最后一个同级(包括当前行和可能更多的行根据排名相同ORDER BY
)。在没有ORDER BY
(就像在我的回答中你所指的),或者如果ORDER BY
将分区中的每一行都视为相等(就像在你的第一个例子中一样),分区中的所有行都是对等的,并且max()
对中的每一行产生相同的结果分区,有效地考虑了分区的所有行。
The default framing option is
RANGE UNBOUNDED PRECEDING
, which is the same asRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. WithORDER BY
, this sets the frame to be all rows from the partition start up through the current row's last peer. WithoutORDER BY
, all rows of the partition are included in the window frame, since all rows become peers of the current row.
默认的成帧选项是
RANGE UNBOUNDED PRECEDING
,与RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. 使用ORDER BY
,这会将框架设置为从分区开始到当前行的最后一个 peer 的所有行 。如果没有ORDER BY
,分区的所有行都包含在窗口框架中,因为所有行都成为当前行的对等方。
Bold emphasis mine.
大胆强调我的。
The simple solution would be to omit the ORDER BY
in the window definition - just like I demonstrated in the example you are referring to.
简单的解决方案是在窗口定义中省略ORDER BY
- 就像我在您所指的示例中演示的那样。
All the gory details about frame specifications in the chapter Window Function Callsin the manual.
手册中窗口函数调用一章中有关框架规范的所有详细信息。