PostgreSQL: How do I efficiently select the previous non-null value?

Note: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18987791/

Date: 2020-09-11 00:25:33  Source: igfitidea

How do I efficiently select the previous non-null value?

postgresql

Asked by adamlamar

I have a table in Postgres that looks like this:

# select * from p;
 id | value 
----+-------
  1 |   100
  2 |      
  3 |      
  4 |      
  5 |      
  6 |      
  7 |      
  8 |   200
  9 |          
(9 rows)
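
To reproduce the examples, the sample table can be created with something like the following. This is only a minimal sketch: the integer column types and the use of a temporary table are assumptions, since the question shows the query output but not the table definition.

```sql
-- Assumed schema: the question shows only the output, not the DDL.
CREATE TEMP TABLE p (
    id    integer PRIMARY KEY,
    value integer            -- nullable on purpose
);

-- Same nine rows as in the output above.
INSERT INTO p (id, value) VALUES
    (1, 100), (2, NULL), (3, NULL), (4, NULL), (5, NULL),
    (6, NULL), (7, NULL), (8, 200), (9, NULL);
```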

And I'd like to query to make it look like this:

# select * from p;
 id | value | new_value
----+-------+----------
  1 |   100 |    
  2 |       |    100
  3 |       |    100
  4 |       |    100
  5 |       |    100
  6 |       |    100
  7 |       |    100
  8 |   200 |    100
  9 |       |    200
(9 rows)

I can already do this with a subquery in the select, but in my real data I have 20k or more rows and it gets to be quite slow.

Is this possible to do in a window function? I'd love to use lag(), but it doesn't seem to support the IGNORE NULLS option.

select id, value, lag(value, 1) over (order by id) as new_value from p;
 id | value | new_value
----+-------+-----------
  1 |   100 |      
  2 |       |       100
  3 |       |      
  4 |       |
  5 |       |
  6 |       |
  7 |       |
  8 |   200 |
  9 |       |       200
(9 rows)

Answered by adamlamar

I found this answer for SQL Server that also works in Postgres. Having never done it before, I thought the technique was quite clever. Basically, he creates a custom partition for the window function by using a CASE expression inside a nested query that increments a running sum when the value is not null and leaves it unchanged otherwise. This labels every run of nulls with the same number as the preceding non-null row. Here's the query:

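SELECT
  id,
  value,
  value_partition,
  -- first row of a partition is the non-null row that started it (if any)
  first_value(value) OVER (PARTITION BY value_partition ORDER BY id)
FROM (
  SELECT
    id,
    value,
    -- running count of non-null values: increments only on non-null rows
    sum(CASE WHEN value IS NULL THEN 0 ELSE 1 END) OVER (ORDER BY id) AS value_partition
  FROM p
) AS q
ORDER BY id ASC;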

And the results:

 id | value | value_partition | first_value
----+-------+-----------------+-------------
  1 |   100 |               1 |         100
  2 |       |               1 |         100
  3 |       |               1 |         100
  4 |       |               1 |         100
  5 |       |               1 |         100
  6 |       |               1 |         100
  7 |       |               1 |         100
  8 |   200 |               2 |         200
  9 |       |               2 |         200
(9 rows)
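
Note that this output fills rows 1 and 8 with their own value, whereas the output requested in the question leaves row 1 empty and shows 100 for row 8. One possible way to get the strictly previous value is to shift the filled column down by one row with lag(). The sketch below inlines the sample data so that it runs on its own; the names grouped, carried, and grp are illustrative, and count(value) is used as a shorthand for the sum(case ...) expression because count() skips nulls.

```sql
-- Sample data inlined so the query runs on its own; drop this CTE to use the real table.
WITH p(id, value) AS (
    VALUES (1, 100), (2, NULL), (3, NULL), (4, NULL), (5, NULL),
           (6, NULL), (7, NULL), (8, 200), (9, NULL)
),
grouped AS (
    -- count(value) ignores nulls, so grp only increments on non-null rows
    SELECT id, value, count(value) OVER (ORDER BY id) AS grp
    FROM p
),
carried AS (
    -- carry each group's leading non-null value across the whole group
    SELECT id, value,
           first_value(value) OVER (PARTITION BY grp ORDER BY id) AS filled
    FROM grouped
)
-- shift by one row so that a row never sees its own value
SELECT id, value, lag(filled) OVER (ORDER BY id) AS new_value
FROM carried
ORDER BY id;
```

With the nine sample rows this should leave new_value empty for row 1, give 100 for rows 2 through 8, and 200 for row 9, which matches the output asked for in the question.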

Answered by Slobodan Pejic

You can create a custom aggregate function in Postgres. Here's an example for the int type:

CREATE FUNCTION coalesce_agg_sfunc(state int, value int) RETURNS int AS
$$
    SELECT coalesce(value, state);
$$ LANGUAGE SQL;

CREATE AGGREGATE coalesce_agg(int) (
    SFUNC = coalesce_agg_sfunc,
    STYPE  = int);

Then query as usual.

SELECT *, coalesce_agg(b) over w, sum(b) over w FROM y
  WINDOW w AS (ORDER BY a);

a b coalesce_agg sum 
- - ------------ ---
a 0            0   0
b ?            0   0
c 2            2   2
d 3            3   5
e ?            3   5
f 5            5  10
(6 rows)
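
Applied to the table from the question, the same aggregate can be given a frame that ends one row before the current row, so that a row never picks up its own value. The following is a sketch, assuming the coalesce_agg aggregate defined above has already been created; the sample data is inlined as a CTE for convenience.

```sql
-- Assumes coalesce_agg(int) from the answer above already exists.
-- Sample data inlined; drop the CTE to query the real table p.
WITH p(id, value) AS (
    VALUES (1, 100), (2, NULL), (3, NULL), (4, NULL), (5, NULL),
           (6, NULL), (7, NULL), (8, 200), (9, NULL)
)
SELECT id, value,
       coalesce_agg(value) OVER (
           ORDER BY id
           -- frame stops at the previous row, excluding the current one
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS new_value
FROM p
ORDER BY id;
```

For row 1 the frame is empty, so the aggregate returns its initial state, which is NULL because no INITCOND was given.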

Answered by MatheusOl

Well, I can't guarantee this is the most efficient way, but it works:

SELECT id, value, (
    SELECT p2.value
    FROM p p2
    WHERE p2.value IS NOT NULL AND p2.id <= p1.id
    ORDER BY p2.id DESC
    LIMIT 1
) AS new_value
FROM p p1 ORDER BY id;

The following index can improve the sub-query for large datasets:

CREATE INDEX idx_p_idvalue_nonnull ON p (id, value) WHERE value IS NOT NULL;

Assuming the value column is sparse (e.g. there are a lot of nulls), it will run fine.
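
Whether the planner actually uses the partial index is worth checking with EXPLAIN. The sketch below assumes that the table p and the index idx_p_idvalue_nonnull from above exist. It also uses < instead of <=, a deliberate change from the query above, to match the output requested in the question, where a row does not see its own value.

```sql
-- Assumes table p and the partial index idx_p_idvalue_nonnull already exist.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, value, (
    SELECT p2.value
    FROM p p2
    WHERE p2.value IS NOT NULL AND p2.id < p1.id   -- strictly earlier rows only
    ORDER BY p2.id DESC
    LIMIT 1
) AS new_value
FROM p p1
ORDER BY id;
```

On a very small table the planner may still prefer a sequential scan, so the plan is only meaningful on data of realistic size.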

Answered by Zoran Stipanicev

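You can use a last-value aggregate with FILTER to achieve what you need (FILTER requires PG 9.4 or later). Note that PostgreSQL accepts FILTER only on aggregate functions, so the built-in last_value() window function will not take it; the query below calls last(), which is not built in and must exist as a user-defined aggregate.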

WITH base AS (
SELECT 1 AS id , 100 AS val
UNION ALL
SELECT 2 AS id , null AS val
UNION ALL
SELECT 3 AS id , null AS val
UNION ALL
SELECT 4 AS id , null AS val
UNION ALL
SELECT 5 AS id , 200 AS val
UNION ALL
SELECT 6 AS id , null AS val
UNION ALL
SELECT 7 AS id , null AS val
)
SELECT id, val,
       last(val) FILTER (WHERE val IS NOT NULL)
           OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS new_val
  FROM base;
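
Since last() is not built into PostgreSQL, it has to be created before the query above will run. One commonly used definition, following the first/last aggregate recipe on the PostgreSQL wiki, is sketched below; the names last_agg and last are user-chosen and not part of PostgreSQL itself.

```sql
-- Not built in: a user-defined "last" aggregate (names are illustrative).
CREATE FUNCTION last_agg(anyelement, anyelement) RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT AS
$$
    SELECT $2;   -- keep the most recent input as the new state
$$;

CREATE AGGREGATE last(anyelement) (
    SFUNC = last_agg,
    STYPE = anyelement
);
```

Because the transition function is declared STRICT, the aggregate skips null inputs on its own, so with this particular definition the FILTER clause is redundant, though harmless.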