PostgreSQL: How to efficiently select the previous non-null value?

Question

Asked by adamlamar

I have a table in Postgres that looks like this:

# select * from p;
 id | value 
----+-------
  1 |   100
  2 |      
  3 |      
  4 |      
  5 |      
  6 |      
  7 |      
  8 |   200
  9 |          
(9 rows)

And I'd like to write a query whose output looks like this:

# select * from p;
 id | value | new_value
----+-------+----------
  1 |   100 |    
  2 |       |    100
  3 |       |    100
  4 |       |    100
  5 |       |    100
  6 |       |    100
  7 |       |    100
  8 |   200 |    100
  9 |       |    200
(9 rows)

I can already do this with a subquery in the select, but in my real data I have 20k or more rows and it gets to be quite slow.

Is this possible to do with a window function? I'd love to use lag(), but it doesn't seem to support the IGNORE NULLS option.

select id, value, lag(value, 1) over (order by id) as new_value from p;
 id | value | new_value
----+-------+-----------
  1 |   100 |      
  2 |       |       100
  3 |       |      
  4 |       |
  5 |       |
  6 |       |
  7 |       |
  8 |   200 |
  9 |       |       200
(9 rows)

Answer 1

Answered by adamlamar

I found this answer for SQL Server that also works in Postgres. Having never done it before, I thought the technique was quite clever. Basically, it creates a custom partition for the window function: a case expression inside a nested query increments a running sum when the value is not null and leaves it unchanged otherwise. Every run of nulls therefore gets the same partition number as the non-null value that precedes it. Here's the query:

SELECT
  id,
  value,
  value_partition,
  first_value(value) over (partition by value_partition order by id)
FROM (
  SELECT
    id,
    value,
    sum(case when value is null then 0 else 1 end) over (order by id) as value_partition
  FROM p
  ORDER BY id ASC
) as q;

And the results:

 id | value | value_partition | first_value
----+-------+-----------------+-------------
  1 |   100 |               1 |         100
  2 |       |               1 |         100
  3 |       |               1 |         100
  4 |       |               1 |         100
  5 |       |               1 |         100
  6 |       |               1 |         100
  7 |       |               1 |         100
  8 |   200 |               2 |         200
  9 |       |               2 |         200
(9 rows)
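Note that the result above fills each row with the latest non-null value up to and including the current row, whereas the desired output in the question shows the value strictly before it (row 8 should get 100, not 200). One way to get that, as a sketch only, is to wrap the same partition trick in one more level and shift the filled column down by one row with lag(). The aliases counted and filled are made up for this example, and count(value) is used as a shorthand for the sum(case ...) counter:

```sql
-- Sketch: same table p (id, value) as in the question.
SELECT id,
       value,
       -- shift the filled column by one row so the current row is excluded
       lag(filled) OVER (ORDER BY id) AS new_value
FROM (
  SELECT id,
         value,
         -- the first row of each partition holds its only non-null value
         first_value(value) OVER (PARTITION BY value_partition ORDER BY id) AS filled
  FROM (
    SELECT id,
           value,
           -- count(value) skips nulls, so this is a running count of non-null values
           count(value) OVER (ORDER BY id) AS value_partition
    FROM p
  ) AS counted
) AS q
ORDER BY id;
```

Against the sample data this should give NULL for id 1, 100 for ids 2 through 8, and 200 for id 9.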

Answer 2

Answered by Slobodan Pejic

You can create a custom aggregate function in Postgres. Here's an example for the int type:

CREATE FUNCTION coalesce_agg_sfunc(state int, value int) RETURNS int AS
$$
    SELECT coalesce(value, state);
$$ LANGUAGE SQL;

CREATE AGGREGATE coalesce_agg(int) (
    SFUNC = coalesce_agg_sfunc,
    STYPE  = int);

Then query as usual.

SELECT *, coalesce_agg(b) over w, sum(b) over w FROM y
  WINDOW w AS (ORDER BY a);

a b coalesce_agg sum 
- - ------------ ---
a 0            0   0
b ?            0   0
c 2            2   2
d 3            3   5
e ?            3   5
f 5            5  10
(6 rows)
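Applied to the table p from the question, the same aggregate might be used as below. This is a sketch that assumes coalesce_agg has been created as shown above; the frame ending at 1 PRECEDING is my addition, so that the current row is left out and row 8 gets 100 rather than 200:

```sql
-- Sketch: requires the coalesce_agg aggregate defined earlier in this answer.
SELECT id,
       value,
       -- the frame stops one row before the current row, so only
       -- strictly earlier values can be carried forward
       coalesce_agg(value) OVER (
         ORDER BY id
         ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS new_value
FROM p
ORDER BY id;
```

For id 1 the frame is empty, so the aggregate returns NULL, which matches the desired output.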

Answer 3

Answered by MatheusOl

Well, I can't guarantee this is the most efficient way, but it works:

SELECT id, value, (
    SELECT p2.value
    FROM p p2
    WHERE p2.value IS NOT NULL AND p2.id <= p1.id
    ORDER BY p2.id DESC
    LIMIT 1
) AS new_value
FROM p p1 ORDER BY id;

The following index can improve the sub-query for large datasets:

CREATE INDEX idx_p_idvalue_nonnull ON p (id, value) WHERE value IS NOT NULL;

Assuming the value column is sparse (i.e. it contains a lot of nulls), it will run fine.
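The same lookup can also be written as a LATERAL join (available since PostgreSQL 9.3), which some people find easier to read. This is only a sketch and not necessarily faster. The alias prev is made up here, and I use < instead of <= so the current row is excluded, as in the question's desired output:

```sql
-- Sketch: for each row of p, look up the nearest earlier non-null value.
SELECT p1.id,
       p1.value,
       prev.value AS new_value
FROM p AS p1
LEFT JOIN LATERAL (
    SELECT p2.value
    FROM p AS p2
    WHERE p2.value IS NOT NULL
      AND p2.id < p1.id        -- strictly earlier rows only
    ORDER BY p2.id DESC
    LIMIT 1
) AS prev ON true              -- LEFT JOIN keeps rows that have no earlier value
ORDER BY p1.id;
```

The partial index shown above should still be usable for the inner lookup, since the WHERE clause repeats its value IS NOT NULL predicate.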

Answer 4

Answered by Zoran Stipanicev

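You can use a last-value aggregate with FILTER to achieve what you need (at least in PG 9.4). Note that PostgreSQL accepts FILTER only on aggregates, so the built-in last_value window function will not work here; the query below assumes a user-defined aggregate named last.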

WITH base AS (
SELECT 1 AS id , 100 AS val
UNION ALL
SELECT 2 AS id , null AS val
UNION ALL
SELECT 3 AS id , null AS val
UNION ALL
SELECT 4 AS id , null AS val
UNION ALL
SELECT 5 AS id , 200 AS val
UNION ALL
SELECT 6 AS id , null AS val
UNION ALL
SELECT 7 AS id , null AS val
)
SELECT id, val,
       last(val) FILTER (WHERE val IS NOT NULL)
         OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS new_val
  FROM base;
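The query calls last(val), which is not a built-in PostgreSQL function, so it needs an aggregate of that name to exist. A minimal sketch for the int type follows, using the same CREATE AGGREGATE pattern as coalesce_agg in Answer 2. The names last_sfunc, state and next_val are my own assumptions, not something this answer defines:

```sql
-- Sketch: a "keep the most recent input" aggregate for int.
-- The state function ignores the accumulated state and returns the new input.
CREATE FUNCTION last_sfunc(state int, next_val int) RETURNS int AS
$$
    SELECT next_val;
$$ LANGUAGE SQL;

CREATE AGGREGATE last(int) (
    SFUNC = last_sfunc,
    STYPE = int
);
```

Because the FILTER clause removes null inputs before they reach the aggregate, the state function only ever sees non-null values, and an empty frame (the first row) yields NULL.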
