oracle 使用递归子查询分解的循环检测
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1731889/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Cycle detection with recursive subquery factoring
提问by Rob van Wijk
Oracle SQL can do hierarchical queries since v2, using their proprietary CONNECT BY syntax. In their latest 11g release 2, they added recursive subquery factoring, also known as the recursive with clause. This is the ANSI standard, and if I understand correctly, this one has been implemented by other RDBMS vendors as well.
Oracle SQL 从 v2 开始就可以使用其专有的 CONNECT BY 语法进行分层查询。在他们最新的 11g 第 2 版中,他们添加了递归子查询分解,也称为递归 with 子句。这是 ANSI 标准,如果我理解正确的话,其他 RDBMS 供应商也已经实现了这一标准。
When comparing the connect-by with the recursive with, I noticed a difference in the result set when using cycle detection. The connect by results are more intuitive to me, so I'm wondering if Oracle's implementation contains a bug, or if this is standard ANSI and expected behaviour. Therefore my question is if you can check the recursive with query using other databases like MySQL, DB2, SQL Server and others. Provided those databases support the recursive with clause of course.
在比较 connect-by 和递归 with 时,我注意到使用循环检测时结果集存在差异。结果的连接对我来说更直观,所以我想知道 Oracle 的实现是否包含错误,或者这是否是标准的 ANSI 和预期行为。因此,我的问题是您是否可以使用其他数据库(如 MySQL、DB2、SQL Server 等)检查递归查询。当然,前提是那些数据库支持递归 with 子句。
Here is how it works on Oracle 11.2.0.1.0
这是它在 Oracle 11.2.0.1.0 上的工作原理
SQL> select *
2 from t
3 /
ID PARENT_ID
---------- ----------
1 2
2 1
2 rows selected.
The query using CONNECT BY syntax:
使用 CONNECT BY 语法的查询:
SQL> select id
2 , parent_id
3 , connect_by_iscycle
4 from t
5 connect by nocycle parent_id = prior id
6 start with id = 1
7 /
ID PARENT_ID CONNECT_BY_ISCYCLE
---------- ---------- ------------------
1 2 0
2 1 1
2 rows selected.
Which looks intuitive to me. However, using the new ANSI syntax it returns one more row:
这对我来说看起来很直观。但是,使用新的 ANSI 语法,它会再返回一行:
SQL> with tr (id,parent_id) as
2 ( select id
3 , parent_id
4 from t
5 where id = 1
6 union all
7 select t.id
8 , t.parent_id
9 from t
10 join tr on t.parent_id = tr.id
11 ) cycle id set is_cycle to '1' default '0'
12 select id
13 , parent_id
14 , is_cycle
15 from tr
16 /
ID PARENT_ID I
---------- ---------- -
1 2 0
2 1 0
1 2 1
3 rows selected.
This is the script you can use to check:
这是您可以用来检查的脚本:
create table t
( id number
, parent_id number
);
insert into t values (1, 2);
insert into t values (2, 1);
commit;
with tr (id,parent_id) as
( select id
, parent_id
from t
where id = 1
union all
select t.id
, t.parent_id
from t
join tr on t.parent_id = tr.id
) cycle id set is_cycle to '1' default '0'
select id
, parent_id
, is_cycle
from tr;
采纳答案by Quassnoi
From documentation on CONNECT_BY_ISCYCLE
:
从文档中CONNECT_BY_ISCYCLE
:
The
CONNECT_BY_ISCYCLE
pseudocolumn returns1
if the current row has a child which is also its ancestor
该
CONNECT_BY_ISCYCLE
虚列返回1
如果当前行有一个孩子,这也是它的祖先
and that on CYCLE
:
以及CYCLE
:
A row is considered to form a cycle if one of its ancestor rows has the same values for the cycle columns.
如果某一行的祖先行的循环列具有相同的值,则该行被视为形成一个循环。
In your example, row 2
does have a child which is also its ancestor, but its id
has not been returned yet.
在您的示例中, row2
确实有一个孩子,这也是其祖先,但id
尚未返回。
In other words, CONNECT_BY_ISCYCLE
checks the children(which are yet to be returned), while CYCLE
checks the current row(which is already returned).
换句话说,CONNECT_BY_ISCYCLE
检查子行(尚未返回),同时CYCLE
检查当前行(已返回)。
CONNECT BY
is row based, while recursive CTE
's are set-based.
CONNECT BY
是基于行的,而递归是基于CTE
集合的。
Note that Oracle's documentation on CYCLE
mentions an "ancestor row". However, generally speaking, there is no concept of an "ancestor row" in a recursive CTE
. It's a set based operation which can yield results completely out of the tree. Generally speaking, the anchor part and the recursive part can even use the different tables.
请注意,Oracle 的文档中CYCLE
提到了“祖先行”。但是,一般来说,递归中没有“祖先行”的概念CTE
。这是一个基于集合的操作,可以完全从树中产生结果。一般来说,锚部分和递归部分甚至可以使用不同的表。
Since recursive CTE
's are usuallyused to build hierarchy trees, Oracle
decided to add a cycle check. But due the set-based way the recursive CTE
's operate, it's generally impossible to tell will the next step generate a cycle or not, because without a clear definition of the "ancestor row" cycle condition cannot be defined either.
由于递归CTE
的是通常用于建立层次结构树,Oracle
决定增加一个周期的检查。但是由于递归CTE
操作的基于集合的方式,通常无法判断下一步是否会生成循环,因为如果没有明确定义“祖先行”循环条件,也无法定义。
To perform the "next" step, the whole "current" set needs to be available, but to generate each row of the current set (which includes the cycle column) we just need to have the results of the "next" operation.
要执行“下一步”,整个“当前”集合需要可用,但要生成当前集合的每一行(包括循环列),我们只需要“下一步”操作的结果。
It's not a problem if the current set always consists of a single row (like in CONNECT BY
), but it is a problem if the recursive operation defined on a set as a whole.
如果当前集合总是由单行组成(如 in CONNECT BY
),这不是问题,但如果将递归操作定义为一个整体,则是一个问题。
Didn't look into Oracle 11
yet, but SQL Server
implements recursive CTE
's by just hiding a CONNECT BY
behind them, which requires placing numerous restrictions (all of which effectively forbid all set-based operations).
还没有研究Oracle 11
,但是通过将 a 隐藏在它们后面来SQL Server
实现递归,这需要设置许多限制(所有这些限制都有效地禁止了所有基于集合的操作)。CTE
CONNECT BY
PostgreSQL
's implementation, on the other hand, is truly set-based: you can do any operation with the anchor part in the recursive part. It does not have any means to detect cycles, though, because cycles are not defined in the first place.
PostgreSQL
另一方面, 的实现是真正基于集合的:您可以对递归部分中的锚点部分进行任何操作。但是,它没有任何检测周期的方法,因为首先没有定义周期。
As was mentioned before, MySQL
does not implement CTE
's at all (it does not implement HASH JOIN
's or MERGE JOIN
s as well, only the nested loops, so don't be surprised much).
如前所述,MySQL
根本不实现CTE
's (它也没有实现HASH JOIN
's 或MERGE JOIN
s,只有嵌套循环,所以不要感到惊讶)。
Ironically, I received a letter today on this very subject, which I will cover in my blog.
具有讽刺意味的是,我今天收到了一封关于这个主题的信,我将在我的博客中介绍。
Update:
更新:
Recursive CTE
's in SQL Server
are no more than CONNECT BY
in disguise. See this article in my blog for shocking details:
递归CTE
的 inSQL Server
只不过CONNECT BY
是伪装。有关令人震惊的详细信息,请参阅我博客中的这篇文章:
回答by Aleksander Kmetec
PostgreSQL supports WITH-style hierarchical queries, but doesn't have any automatic cycle detection. This means that you need to write your own and the number of rows returned depends on the way you specify join conditions in the recursive part of the query.
PostgreSQL 支持 WITH 样式的分层查询,但没有任何自动循环检测。这意味着您需要自己编写,返回的行数取决于您在查询的递归部分指定连接条件的方式。
Both examples use an array if IDs (called all_ids) to detect loops:
两个示例都使用数组 if IDs(称为 all_ids)来检测循环:
WITH recursive tr (id, parent_id, all_ids, cycle) AS (
SELECT id, parent_id, ARRAY[id], false
FROM t
WHERE id = 1
UNION ALL
SELECT t.id, t.parent_id, all_ids || t.id, t.id = ANY(all_ids)
FROM t
JOIN tr ON t.parent_id = tr.id AND NOT cycle)
SELECT id, parent_id, cycle
FROM tr;
id | parent_id | cycle
----+-----------+-------
1 | 2 | f
2 | 1 | f
1 | 2 | t
WITH recursive tr (id, parent_id, all_ids, cycle) AS (
SELECT id, parent_id, ARRAY[id], false
FROM t
WHERE id = 1
UNION ALL
SELECT t.id, t.parent_id, all_ids || t.id, (EXISTS(SELECT 1 FROM t AS x WHERE x.id = t.parent_id))
FROM t
JOIN tr ON t.parent_id = tr.id
WHERE NOT t.id = ANY(all_ids))
SELECT id, parent_id, cycle
FROM tr;
id | parent_id | cycle
----+-----------+-------
1 | 2 | f
2 | 1 | t
回答by Andomar
AFAIK:
AFAIK:
- MySQL doesn't support recursive CTE's
- SQL Sever does not support cycle detection in recursive CTE's
- MySQL 不支持递归 CTE
- SQL Sever 不支持递归 CTE 中的循环检测
回答by wallyk
MySQL Server version 5.0.45 didn't like with
:
MySQL Server 5.0.45 版不喜欢with
:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'with tr (id, parent_id) as (select id, parent_id from t where id = 1 union all s' at line 1.
ERROR 1064 (42000):您的 SQL 语法有错误;检查与您的 MySQL 服务器版本相对应的手册,了解在第 1 行使用“with tr (id, parent_id) as (select id, parent_id from t where id = 1 union all s”)附近的正确语法。
回答by Dr Y Wit
Results of the connect by may not always be intuitive.
connect by 的结果可能并不总是直观的。
Below queries demonstrate different approaches to detect cycles starting with id = 3
for the graph on the picture.
下面的查询演示id = 3
了从图片上的图形开始检测循环的不同方法。
create table graph (id, id_parent) as
(select 2, 1 from dual
union all select 3, 1 from dual
union all select 4, 3 from dual
union all select 5, 4 from dual
union all select 3, 5 from dual)
SQL> select level lvl, graph.*, connect_by_iscycle cycle
2 from graph
3 start with id = 3
4 connect by nocycle prior id = id_parent;
LVL ID ID_PARENT CYCLE
---------- ---------- ---------- ----------
1 3 1 0
2 4 3 0
3 5 4 1
1 3 5 0
2 4 3 0
3 5 4 1
6 rows selected.
SQL> select level lvl, graph.*, connect_by_iscycle cycle
2 from graph
3 start with id = 3
4 connect by nocycle prior id = id_parent
5 and prior id_parent is not null;
LVL ID ID_PARENT CYCLE
---------- ---------- ---------- ----------
1 3 1 0
2 4 3 0
3 5 4 0
4 3 5 1
1 3 5 0
2 4 3 0
3 5 4 1
7 rows selected.
SQL> with t(id, id_parent) as
2 (select *
3 from graph
4 where id = 3
5 union all
6 select g.id, g.id_parent
7 from t
8 join graph g
9 on t.id = g.id_parent)
10 search depth first by id set ord
11 cycle id set cycle to 1 default 0
12 select * from t;
ID ID_PARENT ORD C
---------- ---------- ---------- -
3 1 1 0
4 3 2 0
5 4 3 0
3 5 4 1
3 5 5 0
4 3 6 0
5 4 7 0
3 5 8 1
8 rows selected.
Node with id = 3
has two parents so Oracle traverses two cycles in this example.
节点 withid = 3
有两个父节点,因此 Oracle 在此示例中遍历两个循环。
(1, 3) -> (3, 4) -> (4, 5) -> (5, 3)
and
和
(5, 3) -> (3, 4) -> (4, 5)
Edge (5, 3) is missing from the result of the first query and first cycle. At the same time edge (5, 3) appears in the result for the third query and second cycle twice.
第一次查询和第一次循环的结果中缺少边 (5, 3)。同时edge (5, 3) 出现在第三次查询和第二次循环的结果中。
Why so? You can check description of the cycle detection logic in the answer provided by Quassnoi. In plain English it means that
为什么这样?您可以在 Quassnoi 提供的答案中查看循环检测逻辑的描述。用简单的英语来说,这意味着
(1) connect by detects a cycle if child ID for current rowis part of IDs visited so far
(2) rec with detects a cycle if ID for current rowis part of IDs visited so far
(1) 如果当前行的子 ID是迄今为止访问过的 ID 的一部分,connect by 会检测到一个循环
(2) 如果当前行的 ID 是迄今为止访问过的 ID 的一部分,则 rec with 检测到一个循环
Result of the second query looks the most natural although there is additional predicate and prior id_parent is not null
. In this case
尽管有额外的谓词,但第二个查询的结果看起来最自然and prior id_parent is not null
。在这种情况下
(3) it detects a cycle if ID for current rowis part of parent IDsvisited so far
(3) 如果当前行的ID是迄今为止访问过的父 ID 的一部分,它会检测到一个循环
All this conditions are implemented in columns cnt1, cnt2, cnt3 in below query.
所有这些条件都在以下查询的 cnt1、cnt2、cnt3 列中实现。
SQL> with t(id, id_parent, path_id, path_id_parent, cnt1, cnt2, cnt3) as
2 (select g.*,
3 cast('->' || g.id as varchar2(4000)),
4 cast('->' || g.id_parent as varchar2(4000)),
5 0,
6 0,
7 0
8 from graph g
9 where id = 3
10 union all
11 select g.id,
12 g.id_parent,
13 t.path_id || '->' || g.id,
14 t.path_id_parent || '->' || g.id_parent,
15 regexp_count(t.path_id || '->', '->' ||
16 (select id from graph c where c.id_parent = g.id) || '->'),
17 regexp_count(t.path_id || '->', '->' || g.id || '->'),
18 regexp_count(t.path_id_parent || '->', '->' || g.id || '->')
19 from t
20 join graph g
21 on t.id = g.id_parent
22 -- and t.cnt1 = 0
23 -- and t.cnt2 = 0
24 -- and t.cnt3 = 0
25 )
26 search depth first by id set ord
27 cycle id set cycle to 1 default 0
28 select * from t;
ID ID_PARENT PATH_ID PATH_ID_PARENT CNT1 CNT2 CNT3 ORD C
---------- ---------- --------------- --------------- ---- ---- ---- ---------- -
3 1 ->3 ->1 0 0 0 1 0
4 3 ->3->4 ->1->3 0 0 0 2 0
5 4 ->3->4->5 ->1->3->4 1 0 0 3 0
3 5 ->3->4->5->3 ->1->3->4->5 1 1 1 4 1
3 5 ->3 ->5 0 0 0 5 0
4 3 ->3->4 ->5->3 0 0 0 6 0
5 4 ->3->4->5 ->5->3->4 1 0 1 7 0
3 5 ->3->4->5->3 ->5->3->4->5 1 1 1 8 1
8 rows selected.
If you uncomment filter by cnt1/cnt2/cnt3 and remove "cycle id set cycle to 1 default 0" then query will return result as corresponding query above. In other words you can avoid cycle clause
and implement whatever cycle detection logic you find more intuitive.
如果您取消对 cnt1/cnt2/cnt3 过滤器的注释并删除“cycle id set cycle to 1 default 0”,则查询将返回结果作为上述相应的查询。换句话说,您可以避免cycle clause
和实施任何您觉得更直观的循环检测逻辑。
Additional details about traversing hierarchies and cycle detection can be found in the book Oracle SQL Revealed.
可以在Oracle SQL Revealed一书中找到有关遍历层次结构和循环检测的其他详细信息。
回答by user3237729
WITH RECURSIVE s (master, slave, all_ids, cycle) AS
(
SELECT master, slave, ARRAY[master], false FROM binding WHERE master=3477
UNION ALL
SELECT d.master, d.slave, all_ids || d.master, d.slave = ANY(all_ids)
FROM
binding AS d
JOIN
s
ON (d.master = s.slave)
WHERE NOT d.master = ANY(all_ids)
)
SELECT *
FROM s;
I think better is this condition d.slave = ANY(all_ids)
我认为这种情况更好 d.slave = ANY(all_ids)