Conditional lead/lag function in PostgreSQL?

Question

Asked by KolM

I have a table like this:

Name   activity  time
user1  A1        12:00
user1  E3        12:01
user1  A2        12:02
user2  A1        10:05
user2  A2        10:06
user2  A3        10:07
user2  M6        10:07
user2  B1        10:08
user3  A1        14:15
user3  B2        14:20
user3  D1        14:25
user3  D2        14:30

Now, I need a result like this:

Name   activity  next_activity
user1  A2        NULL
user2  A3        B1
user3  A1        B2

For every user, I would like to find the last activity from group A and the type of activity from group B that took place next (activity from group B always takes place after activity from group A). Other types of activity are not of interest to me. I've tried to use the lead() function, but it hasn't worked.

How can I solve my problem?

Answer 1

Answered by Erwin Brandstetter

Test setup:

CREATE TEMP TABLE t (name text, activity text, time time);
INSERT INTO t values
 ('user1', 'A1', '12:00')
,('user1', 'E3', '12:01')
,('user1', 'A2', '12:02')
,('user2', 'A1', '10:05')
,('user2', 'A2', '10:06')
,('user2', 'A3', '10:07')
,('user2', 'M6', '10:07')
,('user2', 'B1', '10:08')
,('user3', 'A1', '14:15')
,('user3', 'B2', '14:20')
,('user3', 'D1', '14:25')
,('user3', 'D2', '14:30');

Your definition:

activity from group B always takes place after activity from group A.

... logically implies that there is, per user, 0 or 1 B activity after 1 or more A activities. There is never more than one B activity in sequence.

You can make it work with a single window function, DISTINCT ON and CASE, which should be the fastest way for few rows per user (also see below):

SELECT name
     , CASE WHEN a2 LIKE 'B%' THEN a1 ELSE a2 END AS activity
     , CASE WHEN a2 LIKE 'B%' THEN a2 END AS next_activity
FROM  (
   SELECT DISTINCT ON (name)
          name
        , lead(activity) OVER (PARTITION BY name ORDER BY time DESC) AS a1
        , activity AS a2
   FROM   t
   WHERE (activity LIKE 'A%' OR activity LIKE 'B%')
   ORDER  BY name, time DESC
   ) sub;

An SQL CASE expression defaults to NULL if no ELSE branch is added, so I kept that short.

This also assumes that time is defined NOT NULL. Otherwise, you might want to add NULLS LAST. Why?

Select first row in each GROUP BY group?
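
As a minimal sketch of that advice: assuming the time column were nullable (it has no NOT NULL constraint in the test setup above), NULLS LAST would go into both ORDER BY clauses. In descending order Postgres sorts NULL first by default, so without it DISTINCT ON could pick a row with an unknown time as the "latest" one.

```sql
-- Sketch only: the inner query from above, made safe for NULL values in "time".
-- Assumption: rows with a NULL time should never count as the latest row.
SELECT DISTINCT ON (name)
       name
     , lead(activity) OVER (PARTITION BY name
                            ORDER BY time DESC NULLS LAST) AS a1
     , activity AS a2
FROM   t
WHERE  activity LIKE 'A%' OR activity LIKE 'B%'
ORDER  BY name, time DESC NULLS LAST;  -- must match the window's sort order
```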

(activity LIKE 'A%' OR activity LIKE 'B%') is more verbose than activity ~ '^[AB]', but typically faster in older versions of Postgres. About pattern matching:

Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
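
For comparison, the shorter regular expression form of the same filter might look like this. It is only a sketch against the test table t from above and should select the same rows as the two LIKE conditions.

```sql
-- Sketch: regular expression variant of the A/B filter.
-- '^[AB]' matches any activity whose first character is A or B.
SELECT name, activity, time
FROM   t
WHERE  activity ~ '^[AB]'
ORDER  BY name, time;
```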

Conditional window functions?

That's actually possible. You can combine the aggregate FILTER clause with the OVER clause of window functions. However:

The FILTER clause itself can only work with values from the current row.
More importantly, FILTER is not implemented for pure window functions like lead() or lag() in Postgres 9.6 (yet), only for aggregate functions.

If you try:

lead(activity) FILTER (WHERE activity LIKE 'A%') OVER () AS activity

Postgres will tell you:

Postgres 会告诉你：

FILTER is not implemented for non-aggregate window functions
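
By contrast, FILTER is accepted when the window function is an aggregate. The following sketch is not a solution to the question; it only illustrates the syntax that does work, using a hypothetical output column a_count_so_far that counts each user's A activities up to the current row.

```sql
-- Sketch: FILTER combined with OVER works for aggregates such as count().
-- "a_count_so_far" is an illustrative name, not part of the original answer.
SELECT name, activity, time
     , count(*) FILTER (WHERE activity LIKE 'A%')
                OVER (PARTITION BY name ORDER BY time) AS a_count_so_far
FROM   t
ORDER  BY name, time;
```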

About FILTER:

Performance

(For few users with few rows per user, pretty much any query is fast, even without an index.)

For many users and few rows per user, the first query above should be fastest. See the linked answer above about index and performance.

For many rows per user, there are (potentially much) faster techniques, depending on other details of your setup:

Optimize GROUP BY query to retrieve latest record per user
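
One such technique could be a LATERAL join driven by a separate table of users. The sketch below rests on assumptions that are not part of the question: a hypothetical users table with one row per user (built here from t so the example runs against the test setup) and a hypothetical index t_name_time_idx. Whether it actually beats DISTINCT ON depends on your data distribution and Postgres version.

```sql
-- Assumption: a table with one row per user exists; derived from t here.
CREATE TEMP TABLE users AS SELECT DISTINCT name FROM t;

-- Assumption: an index like this lets each lookup read only a few rows.
CREATE INDEX t_name_time_idx ON t (name, time DESC);

SELECT u.name, a.activity, b.activity AS next_activity
FROM   users u
CROSS  JOIN LATERAL (           -- latest A activity per user
   SELECT t.activity, t.time
   FROM   t
   WHERE  t.name = u.name
   AND    t.activity LIKE 'A%'
   ORDER  BY t.time DESC
   LIMIT  1
   ) a
LEFT   JOIN LATERAL (           -- first B activity after it, if any
   SELECT t.activity
   FROM   t
   WHERE  t.name = u.name
   AND    t.activity LIKE 'B%'
   AND    t.time > a.time
   ORDER  BY t.time
   LIMIT  1
   ) b ON true;
```

Users without any A activity drop out of the result because of the CROSS JOIN; the LEFT JOIN keeps users who have an A activity but no later B activity, with next_activity set to NULL.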

Answer 2

Answered by David Markovitz

select      distinct on (name) name, activity, next_activity
from       (-- keep only A and B activities, then look one row ahead per user
            select name, activity, time
                  ,lead(activity) over (partition by name order by time) as next_activity
            from   t
            where  left(activity,1) in ('A','B')
            ) t
where       left(activity,1) = 'A'   -- of the A rows, keep the latest per user
order by    name, time desc;
