Postgresql 递归自连接

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/17261792/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-21 00:58:55  来源:igfitidea点击:

Postgresql recursive self join

sqldatabasepostgresql

提问by cpz

My table in postgres looks like below, Table stores a chain sort of relation between IDs and I want to have a query which can produce the result like "vc1" -> "rc7" or "vc3"->"rc7", I will only query on the IDs in first column ID1

我在 postgres 中的表如下所示,表存储了 ID 之间的一种链式关系,我想要一个查询,它可以产生像“vc1”->“rc7”或“vc3”->“rc7”这样的结果,我会仅查询第一列 ID1 中的 ID

ID1     ID2
"vc1"   "vc2"
"vc2"   "vc3"
"vc3"   "vc4"
"vc4"   "rc7"

So I want to supply some "head" id here for which I have to fetch the tail(last in the chain) id.

所以我想在这里提供一些“头部”ID,我必须为它获取尾部(链中的最后一个)ID。

回答by Craig Ringer

This is a classic use of a simple recursive common table expression (WITH RECURSIVE), available in PostgreSQL 8.4 and later.

这是简单递归公用表表达式 ( WITH RECURSIVE)的经典用法,可在 PostgreSQL 8.4 及更高版本中使用。

Demonstrated here: http://sqlfiddle.com/#!12/78e15/9

在这里演示:http: //sqlfiddle.com/#!12/78e15/9

Given the sample data as SQL:

将示例数据作为 SQL:

CREATE TABLE Table1
    ("ID1" text, "ID2" text)
;

INSERT INTO Table1
    ("ID1", "ID2")
VALUES
    ('vc1', 'vc2'),
    ('vc2', 'vc3'),
    ('vc3', 'vc4'),
    ('vc4', 'rc7')
;

You could write:

你可以写:

WITH RECURSIVE chain(from_id, to_id) AS (
  SELECT NULL, 'vc2'
  UNION
  SELECT c.to_id, t."ID2"
  FROM chain c
  LEFT OUTER JOIN Table1 t ON (t."ID1" = to_id)
  WHERE c.to_id IS NOT NULL
)
SELECT from_id FROM chain WHERE to_id IS NULL;

What this does is iteratively walk the chain, adding each row to the chaintable as from- and to-pointers. When it encounters a row for which the 'to' reference doesn't exist it will add a null 'to' reference for that row. The next iteration will notice that the 'to' reference is null and produce zero rows, which causes the iteration to end.

这样做是迭代地遍历链,将每一行chain作为起始和终止指针添加到表中。当它遇到“to”引用不存在的行时,它将为该行添加一个空“to”引用。下一次迭代会注意到“to”引用为空并产生零行,这会导致迭代结束。

The outer query then picks up rows that've been determined to be the end of the chain by having a non-existent to_id.

然后,外部查询通过具有不存在的 to_id 来选取已被确定为链末端的行。

It takes a bit of effort to get your head around recursive CTEs. They key things to understand are:

了解递归 CTE 需要一些努力。他们需要理解的关键是:

  • They start with the output of an initial query, which they repeatedly union with the output of the "recursive part" (the query after the UNIONor UNION ALL) until the recursive part adds no rows. That stops iteration.

  • They aren't really recursive, more iterative, though they're good for the sorts of things you might use recursion for.

  • 它们从初始查询的输出开始,它们反复与“递归部分”(UNION或之后的查询UNION ALL)的输出合并,直到递归部分不添加任何行。这将停止迭代。

  • 它们不是真正的递归,更迭代,尽管它们适用于您可能使用递归的类型。

So you're basically building a table in a loop. You can't delete rows or change them, only add new ones, so you generally need an outer query that filters the results to get the result rows you want. You'll often add extra columns containing intermediate data that you use to track the state of the iteration, control stop-conditions, etc.

所以你基本上是在循环中构建一个表。您无法删除或更改行,只能添加新行,因此您通常需要一个外部查询来过滤结果以获取您想要的结果行。您通常会添加包含中间数据的额外列,用于跟踪迭代状态、控制停止条件等。

It can help to look at the unfiltered result. If I replace the final summary query with a simple SELECT * FROM chainI can see the table that's been generated:

它可以帮助查看未过滤的结果。如果我用一个简单的替换最终的汇总查询,SELECT * FROM chain我可以看到生成的表:

 from_id | to_id 
---------+-------
         | vc2
 vc2     | vc3
 vc3     | vc4
 vc4     | rc7
 rc7     | 
(5 rows)

The first row is the manually added starting point row, where you specify what you want to look up - in this case that was vc2. Each subsequent row was added by the UNIONed recursive term, which does a LEFT OUTER JOINon the previous result and returns a new set of rows that pair up the previous to_id(now in the from_idcolumn) to the next to_id. If the LEFT OUTER JOINdoesn't match then the to_idwill be null, causing the next invocation to return now rows and end iteration.

第一行是手动添加的起点行,您可以在其中指定要查找的内容 - 在本例中为vc2. 每个后续行都由UNIONed 递归项添加,它对LEFT OUTER JOIN前一个结果执行 a并返回一组新的行,这些行将前一个to_id(现在在from_id列中)与下一个配对to_id。如果LEFT OUTER JOIN不匹配,则to_id将为空,导致下一次调用现在返回行并结束迭代。

Because this query doesn't attempt to add only the lastrow each time, it's actually repeating a fair bit of work each iteration. To avoid that you would need to use an approach more like Gordon's, but additionally filter on the previous depth field when you scanned input table, so you joined only the most recent row. In practice this usually isn't necessary, but it can be a concern for very big data sets or where you can't create appropriate indexes.

因为这个查询不会每次都尝试只添加最后一行,它实际上每次迭代都会重复相当多的工作。为避免这种情况,您需要使用更像 Gordon 的方法,但在扫描输入表时额外过滤前一个深度字段,因此您只加入了最近的行。在实践中,这通常不是必需的,但对于非常大的数据集或无法创建适当索引的情况,这可能是一个问题。

More can be learned in the PostgreSQL documentation on CTEs.

可以在有关 CTE 的 PostgreSQL 文档中了解更多信息。

回答by Gordon Linoff

Here is the SQL using a recursive CTE:

这是使用递归 CTE 的 SQL:

with recursive tr(id1, id2, level) as (
      select t.id1, t.id2, 1 as level
      from t union all
      select t.id1, tr.id2, tr.level + 1
      from t join
           tr
           on t.id2 = tr.id1
     )
select *
from (select tr.*,
             max(level) over (partition by id1) as maxlevel
      from tr
     ) tr
where level = maxlevel;

Hereis the SQLFiddle

是 SQLFiddle