Moving average in PostgreSQL

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/14749450/


postgresql math average postgresql-9.1 moving-average

Asked by Glicious

I have the following table in my PostgreSQL 9.1 database:

select * from ro;
date       |  shop_id | amount 
-----------+----------+--------
2013-02-07 |     1001 |      3
2013-01-31 |     1001 |      2
2013-01-24 |     1001 |      1
2013-01-17 |     1001 |      5
2013-02-10 |     1001 |     10
2013-02-03 |     1001 |      4
2012-12-27 |     1001 |      6
2012-12-20 |     1001 |      8
2012-12-13 |     1001 |      4
2012-12-06 |     1001 |      3
2012-10-29 |     1001 |      3
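
For anyone who wants to reproduce this, a minimal setup sketch follows. The question does not show the table definition, so the column types (date, integer, integer) are an assumption; the rows are copied from the output above.

```sql
-- Assumed schema: the original post does not show the table definition,
-- so these column types are a guess that fits the displayed data.
CREATE TABLE ro (
    date    date,
    shop_id integer,
    amount  integer
);

-- Rows copied from the "select * from ro" output in the question.
INSERT INTO ro (date, shop_id, amount) VALUES
    ('2013-02-07', 1001, 3),
    ('2013-01-31', 1001, 2),
    ('2013-01-24', 1001, 1),
    ('2013-01-17', 1001, 5),
    ('2013-02-10', 1001, 10),
    ('2013-02-03', 1001, 4),
    ('2012-12-27', 1001, 6),
    ('2012-12-20', 1001, 8),
    ('2012-12-13', 1001, 4),
    ('2012-12-06', 1001, 3),
    ('2012-10-29', 1001, 3);
```

Only eight of these rows fall on a Thursday (extract(dow from date) = 4), which is why the result tables below have eight rows.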

I am trying to get a moving average that compares each Thursday against the last 3 Thursdays, without including the current Thursday. Here's my query:

select date, shop_id, amount, extract(dow from date),
avg(amount) OVER (PARTITION BY extract(dow from date) ORDER BY date DESC
                      ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING)                          
from ro
where extract(dow from date) = 4

This is the result I get:

date       |  shop_id | amount | date_part |        avg         
-----------+----------+--------+-----------+--------------------
2013-02-07 |     1001 |      3 |         4 | 2.0000000000000000
2013-01-31 |     1001 |      2 |         4 | 2.6666666666666667
2013-01-24 |     1001 |      1 |         4 | 4.0000000000000000
2013-01-17 |     1001 |      5 |         4 | 6.3333333333333333
2012-12-27 |     1001 |      6 |         4 | 6.0000000000000000
2012-12-20 |     1001 |      8 |         4 | 5.0000000000000000
2012-12-13 |     1001 |      4 |         4 | 3.5000000000000000
2012-12-06 |     1001 |      3 |         4 | 3.0000000000000000

I expect:

date       |  shop_id | amount | date_part |        avg         
-----------+----------+--------+-----------+--------------------
2013-02-07 |     1001 |      3 |         4 | 2.6666666666666667
2013-01-31 |     1001 |      2 |         4 | 4.0000000000000000
2013-01-24 |     1001 |      1 |         4 | 6.3333333333333333
2013-01-17 |     1001 |      5 |         4 | 6.0000000000000000
2012-12-27 |     1001 |      6 |         4 | 5.0000000000000000
2012-12-20 |     1001 |      8 |         4 |
2012-12-13 |     1001 |      4 |         4 |
2012-12-06 |     1001 |      3 |         4 |

Accepted answer by Clodoaldo Neto

SQL Fiddle

select
    date,
    shop_id,
    amount,
    extract(dow from date),
    case when
        -- only emit an average when at least 3 earlier Thursdays exist
        row_number() over (order by date) > 3
        then
            -- with descending order, "following" rows are the earlier dates
            avg(amount) over (
                order by date desc
                rows between 1 following and 3 following
            )
        else null end
from (
    select *
    from ro
    where extract(dow from date) = 4
) s

What is wrong with the OP's query is the frame specification, which includes the current row:

ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING
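
To see why the frame matters, here is a minimal sketch that evaluates both frames side by side. It uses an inline copy of the Thursday rows from the question; the CTE name thursdays and the column aliases op_frame and fixed_frame are illustrative names, not part of the original post. With ORDER BY date DESC, 0 PRECEDING is the current row itself, so the OP's frame averages the current Thursday plus the two before it, while 1 FOLLOWING AND 3 FOLLOWING skips the current row and covers the three earlier Thursdays.

```sql
-- Illustrative only: inline copy of the Thursday rows from the question.
WITH thursdays(date, shop_id, amount) AS (
    VALUES
        (DATE '2013-02-07', 1001, 3),
        (DATE '2013-01-31', 1001, 2),
        (DATE '2013-01-24', 1001, 1),
        (DATE '2013-01-17', 1001, 5),
        (DATE '2012-12-27', 1001, 6),
        (DATE '2012-12-20', 1001, 8),
        (DATE '2012-12-13', 1001, 4),
        (DATE '2012-12-06', 1001, 3)
)
SELECT
    date,
    amount,
    -- OP's frame: the current row plus the next two rows (two earlier Thursdays)
    avg(amount) OVER (ORDER BY date DESC
                      ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING) AS op_frame,
    -- Corrected frame: the next three rows only (three earlier Thursdays)
    avg(amount) OVER (ORDER BY date DESC
                      ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING) AS fixed_frame
FROM thursdays
ORDER BY date DESC;
```

Note that the corrected frame on its own still returns a partial average for 2012-12-20 and 2012-12-13, where fewer than three earlier Thursdays exist, and NULL only for 2012-12-06. That is why the query above wraps avg in a CASE guard.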

Other than that, my query avoids unneeded computing by filtering Thursdays before applying the expensive window functions.

If it is necessary to partition by shop_id, then add partition by shop_id to both window functions, avg and row_number.
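
As a sketch of what that might look like, the query below adds PARTITION BY shop_id to both window functions. It assumes the ro table from the question exists; the alias avg_prev_3 and the final ORDER BY are additions for readability, not part of the original answer.

```sql
-- Sketch: the accepted query with PARTITION BY shop_id in both window
-- functions, so each shop gets its own moving average.
SELECT
    date,
    shop_id,
    amount,
    CASE WHEN
        -- count earlier Thursdays per shop, not across all shops
        row_number() OVER (PARTITION BY shop_id ORDER BY date) > 3
        THEN
            avg(amount) OVER (
                PARTITION BY shop_id
                ORDER BY date DESC
                ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING
            )
        ELSE NULL END AS avg_prev_3   -- hypothetical alias
FROM (
    SELECT *
    FROM ro
    WHERE extract(dow FROM date) = 4
) s
ORDER BY shop_id, date DESC;
```

If only one of the two functions is partitioned, the row_number guard and the avg frame would count different sets of rows, so the guard could let through averages built from fewer than three Thursdays of the same shop.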

Answer by Ian Gow

I think a better answer might be:

SELECT date, shop_id, amount, 
    extract(dow from date) AS dow,
    CASE WHEN count(amount) OVER w = 3 
        THEN avg(amount) OVER w END AS average_amt             
FROM ro
WHERE extract(dow from date) = 4 
WINDOW w AS (ORDER BY date DESC ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING)

I think it is cleaner to use the same window for both checking the number of rows in the window and taking the average. (This also saves two window aggregations, as can be seen in the original answer.)

Regarding the claim in the earlier answer that "my query avoids unneeded computing by filtering Thursdays before applying the expensive window functions", this also applies to the query suggested by the OP and to my query, as prepending EXPLAIN to either shows.
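
A minimal way to check this, assuming the ro table from the question exists, is to put EXPLAIN in front of the OP's query. No plan output is shown here because it depends on the server version and table statistics.

```sql
-- EXPLAIN prints the planner's chosen plan without executing the query.
EXPLAIN
SELECT date, shop_id, amount, extract(dow from date),
    avg(amount) OVER (PARTITION BY extract(dow from date) ORDER BY date DESC
                      ROWS BETWEEN 0 PRECEDING AND 2 FOLLOWING)
FROM ro
WHERE extract(dow from date) = 4;
```

In the resulting plan, the Thursday condition should show up as a filter on the scan node underneath the WindowAgg node, since WHERE is evaluated before window functions. That means the window only ever sees Thursday rows, with or without the subquery.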