SQL 将平面表解析为树的最有效/最优雅的方法是什么?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/192220/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 23:53:31  来源:igfitidea点击:

What is the most efficient/elegant way to parse a flat table into a tree?

sqlalgorithmrecursiontreehierarchical-data

提问by Tomalak

Assume you have a flat table that stores an ordered tree hierarchy:

假设您有一个存储有序树层次结构的平面表:

Id   Name         ParentId   Order
 1   'Node 1'            0      10
 2   'Node 1.1'          1      10
 3   'Node 2'            0      20
 4   'Node 1.1.1'        2      10
 5   'Node 2.1'          3      10
 6   'Node 1.2'          1      20

Here's a diagram, where we have [id] Name. Root node 0 is fictional.

这是一个图表,我们有[id] Name. 根节点 0 是虚构的。

                       [0] ROOT
                          /    \ 
              [1] Node 1          [3] Node 2
              /       \                   \
    [2] Node 1.1     [6] Node 1.2      [5] Node 2.1
          /          
 [4] Node 1.1.1

What minimalistic approach would you use to output that to HTML (or text, for that matter) as a correctly ordered, correctly indented tree?

您将使用什么简约方法将其作为正确排序、正确缩进的树输出到 HTML(或文本,就此而言)?

Assume further you only have basic data structures (arrays and hashmaps), no fancy objects with parent/children references, no ORM, no framework, just your two hands. The table is represented as a result set, which can be accessed randomly.

进一步假设你只有基本的数据结构(数组和哈希图),没有带有父/子引用的花哨对象,没有 ORM,没有框架,只有你的两只手。该表表示为一个结果集,可以随机访问。

Pseudo code or plain English is okay, this is purely a conceptional question.

伪代码或简单的英文都可以,这纯粹是一个概念问题。

Bonus question: Is there a fundamentally better way to store a tree structure like this in a RDBMS?

额外问题:有没有一种从根本上更好的方法来在 RDBMS 中存储这样的树结构?



EDITS AND ADDITIONS

编辑和补充

To answer one commenter's (Mark Bessey's) question: A root node is not necessary, because it is never going to be displayed anyway. ParentId = 0 is the convention to express "these are top level". The Order column defines how nodes with the same parent are going to be sorted.

回答一位评论者(Mark Bessey的)的问题:不需要根节点,因为它永远不会被显示。ParentId = 0 是表达“这些是顶级”的约定。Order 列定义了如何对具有相同父节点的节点进行排序。

The "result set" I spoke of can be pictured as an array of hashmaps (to stay in that terminology). For my example was meant to be already there. Some answers go the extra mile and construct it first, but thats okay.

我所说的“结果集”可以被描绘成一个哈希图数组(保持在那个术语中)。对于我的例子,本来就已经存在了。有些答案会加倍努力并首先构建它,但这没关系。

The tree can be arbitrarily deep. Each node can have N children. I did not exactly have a "millions of entries" tree in mind, though.

树可以任意深。每个节点可以有 N 个子节点。不过,我并没有完全想到“数百万个条目”树。

Don't mistake my choice of node naming ('Node 1.1.1') for something to rely on. The nodes could equally well be called 'Frank' or 'Bob', no naming structure is implied, this was merely to make it readable.

不要误以为我选择的节点命名('Node 1.1.1')是可以依赖的。节点同样可以称为“Frank”或“Bob”,没有暗示命名结构,这只是为了使其可读。

I have posted my own solution so you guys can pull it to pieces.

我已经发布了我自己的解决方案,所以你们可以把它撕成碎片。

采纳答案by Bill Karwin

Now that MySQL 8.0 supports recursive queries, we can say that all popular SQL databases support recursive queriesin standard syntax.

现在MySQL 8.0 支持递归查询,我们可以说所有流行的 SQL 数据库都支持标准语法的递归查询

WITH RECURSIVE MyTree AS (
    SELECT * FROM MyTable WHERE ParentId IS NULL
    UNION ALL
    SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

I tested recursive queries in MySQL 8.0 in my presentation Recursive Query Throwdownin 2017.

我在 2017 年的Recursive Query Throwdown演示文稿中测试了 MySQL 8.0 中的递归查询

Below is my original answer from 2008:

以下是我 2008 年的原始答案:



There are several ways to store tree-structured data in a relational database. What you show in your example uses two methods:

有几种方法可以在关系数据库中存储树结构数据。您在示例中显示的内容使用两种方法:

  • Adjacency List(the "parent" column) and
  • Path Enumeration(the dotted-numbers in your name column).
  • 邻接表(“父”列)和
  • 路径枚举(姓名列中的点号)。

Another solution is called Nested Sets, and it can be stored in the same table too. Read "Trees and Hierarchies in SQL for Smarties" by Joe Celko for a lot more information on these designs.

另一种解决方案称为Nested Sets,它也可以存储在同一个表中。有关这些设计的更多信息,请阅读Joe Celko撰写的Smarties 中 SQL 中的树和层次结构”。

I usually prefer a design called Closure Table(aka "Adjacency Relation") for storing tree-structured data. It requires another table, but then querying trees is pretty easy.

我通常更喜欢一种称为闭包表(又名“邻接关系”)的设计来存储树结构数据。它需要另一个表,但是查询树非常容易。

I cover Closure Table in my presentation Models for Hierarchical Data with SQL and PHPand in my book SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

我在我的SQL 和 PHP 分层数据演示模型和我的书SQL 反模式:避免数据库编程的陷阱中介绍了闭包表。

CREATE TABLE ClosureTable (
  ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
  descendant_id INT NOT NULL REFERENCES FlatTable(id),
  PRIMARY KEY (ancestor_id, descendant_id)
);

Store all paths in the Closure Table, where there is a direct ancestry from one node to another. Include a row for each node to reference itself. For example, using the data set you showed in your question:

将所有路径存储在闭合表中,其中存在从一个节点到另一个节点的直接祖先。为每个节点包含一行以引用自身。例如,使用您在问题中显示的数据集:

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
  (1,1), (1,2), (1,4), (1,6),
  (2,2), (2,4),
  (3,3), (3,5),
  (4,4),
  (5,5),
  (6,6);

Now you can get a tree starting at node 1 like this:

现在你可以得到一个从节点 1 开始的树,如下所示:

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

The output (in MySQL client) looks like the following:

输出(在 MySQL 客户端中)如下所示:

+----+
| id |
+----+
|  1 | 
|  2 | 
|  4 | 
|  6 | 
+----+

In other words, nodes 3 and 5 are excluded, because they're part of a separate hierarchy, not descending from node 1.

换句话说,节点 3 和 5 被排除在外,因为它们是单独层次结构的一部分,而不是从节点 1 下降。



Re: comment from e-satis about immediate children (or immediate parent). You can add a "path_length" column to the ClosureTableto make it easier to query specifically for an immediate child or parent (or any other distance).

回复:来自 e-satis 的关于直系子女(或直系父母)的评论。您可以向 中添加“ path_length” 列,ClosureTable以便更轻松地专门查询直系子代或父代(或任何其他距离)。

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
  (1,1,0), (1,2,1), (1,4,2), (1,6,1),
  (2,2,0), (2,4,1),
  (3,3,0), (3,5,1),
  (4,4,0),
  (5,5,0),
  (6,6,0);

Then you can add a term in your search for querying the immediate children of a given node. These are descendants whose path_lengthis 1.

然后,您可以在搜索中添加一个术语来查询给定节点的直接子节点。这些path_length是 1 的后代。

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
  AND path_length = 1;

+----+
| id |
+----+
|  2 | 
|  6 | 
+----+


Re comment from @ashraf: "How about sorting the whole tree [by name]?"

来自@ashraf 的评论:“[按名称] 对整棵树进行排序怎么样?”

Here's an example query to return all nodes that are descendants of node 1, join them to the FlatTable that contains other node attributes such as name, and sort by the name.

这是一个示例查询,用于返回节点 1 的所有后代节点,将它们连接到包含其他节点属性(如 )的 FlatTable name,并按名称排序。

SELECT f.name
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;


Re comment from @Nate:

来自@Nate 的评论:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id) 
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id) 
WHERE a.ancestor_id = 1 
GROUP BY a.descendant_id 
ORDER BY f.name

+------------+-------------+
| name       | breadcrumbs |
+------------+-------------+
| Node 1     | 1           |
| Node 1.1   | 1,2         |
| Node 1.1.1 | 1,2,4       |
| Node 1.2   | 1,6         |
+------------+-------------+


A user suggested an edit today. SO moderators approved the edit, but I am reversing it.

一位用户今天建议进行编辑。SO 版主批准了编辑,但我正在撤销它。

The edit suggested that the ORDER BY in the last query above should be ORDER BY b.path_length, f.name, presumably to make sure the ordering matches the hierarchy. But this doesn't work, because it would order "Node 1.1.1" after "Node 1.2".

编辑建议上面最后一个查询中的 ORDER BY 应该是ORDER BY b.path_length, f.name,大概是为了确保排序与层次结构相匹配。但这不起作用,因为它会在“Node 1.2”之后订购“Node 1.1.1”。

If you want the ordering to match the hierarchy in a sensible way, that is possible, but not simply by ordering by the path length. For example, see my answer to MySQL Closure Table hierarchical database - How to pull information out in the correct order.

如果您希望排序以合理的方式匹配层次结构,这是可能的,但不能简单地按路径长度排序。例如,请参阅我对MySQL Closure Table 分层数据库的回答- 如何以正确的顺序提取信息

回答by Jonny Buchanan

If you use nested sets (sometimes referred to as Modified Pre-order Tree Traversal) you can extract the entire tree structure or any subtree within it in tree order with a single query, at the cost of inserts being more expensive, as you need to manage columns which describe an in-order path through thee tree structure.

如果您使用嵌套集(有时称为修改前序树遍历),您可以使用单个查询以树顺序提取整个树结构或其中的任何子树,但插入成本更高,因为您需要管理通过树结构描述有序路径的列。

For django-mptt, I used a structure like this:

对于django-mptt,我使用了这样的结构:

id  parent_id  tree_id  level  lft  rght
--  ---------  -------  -----  ---  ----
 1       null        1      0    1    14
 2          1        1      1    2     7
 3          2        1      2    3     4
 4          2        1      2    5     6
 5          1        1      1    8    13
 6          5        1      2    9    10
 7          5        1      2    11   12

Which describes a tree which looks like this (with idrepresenting each item):

它描述了一个看起来像这样的树(id代表每个项目):

 1
 +-- 2
 |   +-- 3
 |   +-- 4
 |
 +-- 5
     +-- 6
     +-- 7

Or, as a nested set diagram which makes it more obvious how the lftand rghtvalues work:

或者,作为嵌套集图,它使lftrght值的工作方式更加明显:

 __________________________________________________________________________
|  Root 1                                                                  |
|   ________________________________    ________________________________   |
|  |  Child 1.1                     |  |  Child 1.2                     |  |
|  |   ___________    ___________   |  |   ___________    ___________   |  |
|  |  |  C 1.1.1  |  |  C 1.1.2  |  |  |  |  C 1.2.1  |  |  C 1.2.2  |  |  |
1  2  3___________4  5___________6  7  8  9___________10 11__________12 13 14
|  |________________________________|  |________________________________|  |
|__________________________________________________________________________|

As you can see, to get the entire subtree for a given node, in tree order, you simply have to select all rows which have lftand rghtvalues between its lftand rghtvalues. It's also simple to retrieve the tree of ancestors for a given node.

正如你看到的,让整个子树给定节点,在树的顺序,你只需要选择哪个都行lftrght其之间的值lftrght值。检索给定节点的祖先树也很简单。

The levelcolumn is a bit of denormalisation for convenience more than anything and the tree_idcolumn allows you to restart the lftand rghtnumbering for each top-level node, which reduces the number of columns affected by inserts, moves and deletions, as the lftand rghtcolumns have to be adjusted accordingly when these operations take place in order to create or close gaps. I made some development notesat the time when I was trying to wrap my head around the queries required for each operation.

level为了方便起见,该列有点非规范化,并且该tree_id列允许您重新启动每个顶级节点的lftrght编号,这减少了受插入、移动和删除影响的列数,因为lftrght列必须是当这些操作发生时进行相应的调整,以创建或缩小差距。当我试图围绕每个操作所需的查询时,我做了一些开发笔记

In terms of actually working with this data to display a tree, I created a tree_item_iteratorutility function which, for each node, should give you sufficient information to generate whatever kind of display you want.

就实际使用这些数据来显示树而言,我创建了一个tree_item_iterator实用函数,对于每个节点,它应该为您提供足够的信息来生成您想要的任何类型的显示。

More info about MPTT:

有关 MPTT 的更多信息:

回答by Micha? Ko?odziejski

It's a quite old question, but as it's got many views I think it's worth to present an alternative, and in my opinion very elegant, solution.

这是一个相当古老的问题,但由于它有很多观点,我认为值得提出一个替代方案,在我看来非常优雅的解决方案。

In order to read a tree structure you can use recursive Common Table Expressions(CTEs). It gives a possibility to fetch whole tree structure at once, have the information about the level of the node, its parent node and order within children of the parent node.

为了读取树结构,您可以使用递归公用表表达式(CTE)。它提供了一次获取整个树结构的可能性,具有关于节点级别、其父节点和父节点子节点中的顺序的信息。

Let me show you how this would work in PostgreSQL 9.1.

让我向您展示这在 PostgreSQL 9.1 中是如何工作的。

  1. Create a structure

    CREATE TABLE tree (
        id int  NOT NULL,
        name varchar(32)  NOT NULL,
        parent_id int  NULL,
        node_order int  NOT NULL,
        CONSTRAINT tree_pk PRIMARY KEY (id),
        CONSTRAINT tree_tree_fk FOREIGN KEY (parent_id) 
          REFERENCES tree (id) NOT DEFERRABLE
    );
    
    
    insert into tree values
      (0, 'ROOT', NULL, 0),
      (1, 'Node 1', 0, 10),
      (2, 'Node 1.1', 1, 10),
      (3, 'Node 2', 0, 20),
      (4, 'Node 1.1.1', 2, 10),
      (5, 'Node 2.1', 3, 10),
      (6, 'Node 1.2', 1, 20);
    
  2. Write a query

    WITH RECURSIVE 
    tree_search (id, name, level, parent_id, node_order) AS (
      SELECT 
        id, 
        name,
        0,
        parent_id, 
        1 
      FROM tree
      WHERE parent_id is NULL
    
      UNION ALL 
      SELECT 
        t.id, 
        t.name,
        ts.level + 1, 
        ts.id, 
        t.node_order 
      FROM tree t, tree_search ts 
      WHERE t.parent_id = ts.id 
    ) 
    SELECT * FROM tree_search 
    WHERE level > 0 
    ORDER BY level, parent_id, node_order;
    

    Here are the results:

     id |    name    | level | parent_id | node_order 
    ----+------------+-------+-----------+------------
      1 | Node 1     |     1 |         0 |         10
      3 | Node 2     |     1 |         0 |         20
      2 | Node 1.1   |     2 |         1 |         10
      6 | Node 1.2   |     2 |         1 |         20
      5 | Node 2.1   |     2 |         3 |         10
      4 | Node 1.1.1 |     3 |         2 |         10
    (6 rows)
    

    The tree nodes are ordered by a level of depth. In the final output we would present them in the subsequent lines.

    For each level, they are ordered by parent_id and node_order within the parent. This tells us how to present them in the output - link node to the parent in this order.

    Having such a structure it wouldn't be difficult to make a really nice presentation in HTML.

    Recursive CTEs are available in PostgreSQL, IBM DB2, MS SQL Server and Oracle.

    If you'd like to read more on recursive SQL queries, you can either check the documentation of your favourite DBMS or read my two articles covering this topic:

  1. 创建一个结构

    CREATE TABLE tree (
        id int  NOT NULL,
        name varchar(32)  NOT NULL,
        parent_id int  NULL,
        node_order int  NOT NULL,
        CONSTRAINT tree_pk PRIMARY KEY (id),
        CONSTRAINT tree_tree_fk FOREIGN KEY (parent_id) 
          REFERENCES tree (id) NOT DEFERRABLE
    );
    
    
    insert into tree values
      (0, 'ROOT', NULL, 0),
      (1, 'Node 1', 0, 10),
      (2, 'Node 1.1', 1, 10),
      (3, 'Node 2', 0, 20),
      (4, 'Node 1.1.1', 2, 10),
      (5, 'Node 2.1', 3, 10),
      (6, 'Node 1.2', 1, 20);
    
  2. 写一个查询

    WITH RECURSIVE 
    tree_search (id, name, level, parent_id, node_order) AS (
      SELECT 
        id, 
        name,
        0,
        parent_id, 
        1 
      FROM tree
      WHERE parent_id is NULL
    
      UNION ALL 
      SELECT 
        t.id, 
        t.name,
        ts.level + 1, 
        ts.id, 
        t.node_order 
      FROM tree t, tree_search ts 
      WHERE t.parent_id = ts.id 
    ) 
    SELECT * FROM tree_search 
    WHERE level > 0 
    ORDER BY level, parent_id, node_order;
    

    结果如下:

     id |    name    | level | parent_id | node_order 
    ----+------------+-------+-----------+------------
      1 | Node 1     |     1 |         0 |         10
      3 | Node 2     |     1 |         0 |         20
      2 | Node 1.1   |     2 |         1 |         10
      6 | Node 1.2   |     2 |         1 |         20
      5 | Node 2.1   |     2 |         3 |         10
      4 | Node 1.1.1 |     3 |         2 |         10
    (6 rows)
    

    树节点按深度级别排序。在最终输出中,我们将在后续行中呈现它们。

    对于每个级别,它们在父级中按 parent_id 和 node_order 排序。这告诉我们如何在输出中将它们呈现给父节点的链接节点。

    有了这样的结构,用 HTML 制作一个非常好的演示文稿并不困难。

    递归 CTE 在PostgreSQL、IBM DB2、MS SQL Server 和 Oracle中可用。

    如果您想阅读有关递归 SQL 查询的更多信息,您可以查看您最喜欢的 DBMS 的文档或阅读我的两篇涵盖该主题的文章:

回答by Eric Weilnau

As of Oracle 9i, you can use CONNECT BY.

从 Oracle 9i 开始,您可以使用 CONNECT BY。

SELECT LPAD(' ', (LEVEL - 1) * 4) || "Name" AS "Name"
FROM (SELECT * FROM TMP_NODE ORDER BY "Order")
CONNECT BY PRIOR "Id" = "ParentId"
START WITH "Id" IN (SELECT "Id" FROM TMP_NODE WHERE "ParentId" = 0)

As of SQL Server 2005, you can use a recursive common table expression (CTE).

从 SQL Server 2005 开始,您可以使用递归公用表表达式 (CTE)。

WITH [NodeList] (
  [Id]
  , [ParentId]
  , [Level]
  , [Order]
) AS (
  SELECT [Node].[Id]
    , [Node].[ParentId]
    , 0 AS [Level]
    , CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
  FROM [Node]
  WHERE [Node].[ParentId] = 0
  UNION ALL
  SELECT [Node].[Id]
    , [Node].[ParentId]
    , [NodeList].[Level] + 1 AS [Level]
    , [NodeList].[Order] + '|'
      + CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
  FROM [Node]
    INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[ParentId]
) SELECT REPLICATE(' ', [NodeList].[Level] * 4) + [Node].[Name] AS [Name]
FROM [Node]
  INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[Id]
ORDER BY [NodeList].[Order]

Both will output the following results.

两者都会输出以下结果。

Name
'Node 1'
'    Node 1.1'
'        Node 1.1.1'
'    Node 1.2'
'Node 2'
'    Node 2.1'

回答by bobobobo

Bill's answer is pretty gosh-darned good, this answer adds some things to it which makes me wish SO supported threaded answers.

比尔的答案真是太棒了,这个答案增加了一些东西,这让我希望SO支持线程答案。

Anyway I wanted to support the tree structure and the Order property. I included a single property in each Node called leftSiblingthat does the same thing Orderis meant to do in the original question (maintain left-to-right order).

无论如何,我想支持树结构和 Order 属性。我在每个 Node 中都包含了一个属性,该属性与原始问题中要做的leftSibling事情相同Order(保持从左到右的顺序)。

mysql> desc nodes ;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| name        | varchar(255) | YES  |     | NULL    |                |
| leftSibling | int(11)      | NO   |     | 0       |                |
+-------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> desc adjacencies;
+------------+---------+------+-----+---------+----------------+
| Field      | Type    | Null | Key | Default | Extra          |
+------------+---------+------+-----+---------+----------------+
| relationId | int(11) | NO   | PRI | NULL    | auto_increment |
| parent     | int(11) | NO   |     | NULL    |                |
| child      | int(11) | NO   |     | NULL    |                |
| pathLen    | int(11) | NO   |     | NULL    |                |
+------------+---------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

More detail and SQL code on my blog.

更多细节和 SQL 代码在我的博客上

Thanks Bill your answer was helpful in getting started!

谢谢比尔,您的回答对入门很有帮助!

回答by Oli

Well given the choice, I'd be using objects. I'd create an object for each record where each object has a childrencollection and store them all in an assoc array (/hashtable) where the Id is the key. And blitz through the collection once, adding the children to the relevant children fields. Simple.

如果可以选择,我会使用对象。我会为每个记录创建一个对象,其中每个对象都有一个children集合,并将它们全部存储在一个关联数组 (/hashtable) 中,其中 Id 是键。并快速浏览一次集合,将子项添加到相关的子项字段。简单的。

But because you're being no fun by restricting use of some good OOP, I'd probably iterate based on:

但是因为你通过限制使用一些好的 OOP 没有乐趣,我可能会基于以下内容进行迭代:

function PrintLine(int pID, int level)
    foreach record where ParentID == pID
        print level*tabs + record-data
        PrintLine(record.ID, level + 1)

PrintLine(0, 0)

Edit: this is similar to a couple of other entries, but I think it's slightly cleaner. One thing I'll add: this is extremely SQL-intensive. It's nasty. If you have the choice, go the OOP route.

编辑:这与其他几个条目类似,但我认为它更简洁一些。我要补充的一件事是:这是非常 SQL 密集型的。这是肮脏的如果可以选择,请走 OOP 路线。

回答by matt b

This was written quickly, and is neither pretty nor efficient (plus it autoboxes alot, converting between intand Integeris annoying!), but it works.

这写得很快,既不漂亮也不高效(加上它自动装箱很多,在int和之间转换Integer很烦人!),但它有效。

It probably breaks the rules since I'm creating my own objects but hey I'm doing this as a diversion from real work :)

它可能违反了规则,因为我正在创建自己的对象,但是嘿,我这样做是为了转移对实际工作的注意力:)

This also assumes that the resultSet/table is completely read into some sort of structure before you start building Nodes, which wouldn't be the best solution if you have hundreds of thousands of rows.

这还假设在您开始构建节点之前,结果集/表已完全读入某种结构,如果您有数十万行,这将不是最佳解决方案。

public class Node {

    private Node parent = null;

    private List<Node> children;

    private String name;

    private int id = -1;

    public Node(Node parent, int id, String name) {
        this.parent = parent;
        this.children = new ArrayList<Node>();
        this.name = name;
        this.id = id;
    }

    public int getId() {
        return this.id;
    }

    public String getName() {
        return this.name;
    }

    public void addChild(Node child) {
        children.add(child);
    }

    public List<Node> getChildren() {
        return children;
    }

    public boolean isRoot() {
        return (this.parent == null);
    }

    @Override
    public String toString() {
        return "id=" + id + ", name=" + name + ", parent=" + parent;
    }
}

public class NodeBuilder {

    public static Node build(List<Map<String, String>> input) {

        // maps id of a node to it's Node object
        Map<Integer, Node> nodeMap = new HashMap<Integer, Node>();

        // maps id of a node to the id of it's parent
        Map<Integer, Integer> childParentMap = new HashMap<Integer, Integer>();

        // create special 'root' Node with id=0
        Node root = new Node(null, 0, "root");
        nodeMap.put(root.getId(), root);

        // iterate thru the input
        for (Map<String, String> map : input) {

            // expect each Map to have keys for "id", "name", "parent" ... a
            // real implementation would read from a SQL object or resultset
            int id = Integer.parseInt(map.get("id"));
            String name = map.get("name");
            int parent = Integer.parseInt(map.get("parent"));

            Node node = new Node(null, id, name);
            nodeMap.put(id, node);

            childParentMap.put(id, parent);
        }

        // now that each Node is created, setup the child-parent relationships
        for (Map.Entry<Integer, Integer> entry : childParentMap.entrySet()) {
            int nodeId = entry.getKey();
            int parentId = entry.getValue();

            Node child = nodeMap.get(nodeId);
            Node parent = nodeMap.get(parentId);
            parent.addChild(child);
        }

        return root;
    }
}

public class NodePrinter {

    static void printRootNode(Node root) {
        printNodes(root, 0);
    }

    static void printNodes(Node node, int indentLevel) {

        printNode(node, indentLevel);
        // recurse
        for (Node child : node.getChildren()) {
            printNodes(child, indentLevel + 1);
        }
    }

    static void printNode(Node node, int indentLevel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indentLevel; i++) {
            sb.append("\t");
        }
        sb.append(node);

        System.out.println(sb.toString());
    }

    public static void main(String[] args) {

        // setup dummy data
        List<Map<String, String>> resultSet = new ArrayList<Map<String, String>>();
        resultSet.add(newMap("1", "Node 1", "0"));
        resultSet.add(newMap("2", "Node 1.1", "1"));
        resultSet.add(newMap("3", "Node 2", "0"));
        resultSet.add(newMap("4", "Node 1.1.1", "2"));
        resultSet.add(newMap("5", "Node 2.1", "3"));
        resultSet.add(newMap("6", "Node 1.2", "1"));

        Node root = NodeBuilder.build(resultSet);
        printRootNode(root);

    }

    //convenience method for creating our dummy data
    private static Map<String, String> newMap(String id, String name, String parentId) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("id", id);
        row.put("name", name);
        row.put("parent", parentId);
        return row;
    }
}

回答by Konchog

There are really good solutions which exploit the internal btree representation of sql indices. This is based on some great research done back around 1998.

确实有很好的解决方案可以利用 sql 索引的内部 btree 表示。这是基于 1998 年左右完成的一些伟大研究。

Here is an example table (in mysql).

这是一个示例表(在 mysql 中)。

CREATE TABLE `node` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `tw` int(10) unsigned NOT NULL,
  `pa` int(10) unsigned DEFAULT NULL,
  `sz` int(10) unsigned DEFAULT NULL,
  `nc` int(11) GENERATED ALWAYS AS (tw+sz) STORED,
  PRIMARY KEY (`id`),
  KEY `node_tw_index` (`tw`),
  KEY `node_pa_index` (`pa`),
  KEY `node_nc_index` (`nc`),
  CONSTRAINT `node_pa_fk` FOREIGN KEY (`pa`) REFERENCES `node` (`tw`) ON DELETE CASCADE
)

The only fields necessary for the tree representation are:

树表示所需的唯一字段是:

  • tw: The Left to Right DFS Pre-order index, where root = 1.
  • pa: The reference (using tw) to the parent node, root has null.
  • sz: The size of the node's branch including itself.
  • nc: is used as syntactic sugar. it is tw+nc and represents the tw of the node's "next child".
  • tw:从左到右的 DFS 预排序索引,其中 root = 1。
  • pa:对父节点的引用(使用 tw),root 为 null。
  • sz:包括自身在内的节点分支的大小。
  • nc:用作语法糖。它是 tw+nc 并且代表节点的“下一个子节点”的 tw。

Here is an example 24 node population, ordered by tw:

这是一个示例 24 个节点填充,按 tw 排序:

+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|   2 | A       |  2 |    1 |   14 |   16 |
|   3 | AA      |  3 |    2 |    1 |    4 |
|   4 | AB      |  4 |    2 |    7 |   11 |
|   5 | ABA     |  5 |    4 |    1 |    6 |
|   6 | ABB     |  6 |    4 |    3 |    9 |
|   7 | ABBA    |  7 |    6 |    1 |    8 |
|   8 | ABBB    |  8 |    6 |    1 |    9 |
|   9 | ABC     |  9 |    4 |    2 |   11 |
|  10 | ABCD    | 10 |    9 |    1 |   11 |
|  11 | AC      | 11 |    2 |    4 |   15 |
|  12 | ACA     | 12 |   11 |    2 |   14 |
|  13 | ACAA    | 13 |   12 |    1 |   14 |
|  14 | ACB     | 14 |   11 |    1 |   15 |
|  15 | AD      | 15 |    2 |    1 |   16 |
|  16 | B       | 16 |    1 |    1 |   17 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
|  18 | D       | 23 |    1 |    1 |   24 |
|  19 | E       | 24 |    1 |    1 |   25 |
+-----+---------+----+------+------+------+

Every tree result can be done non-recursively. For instance, to get a list of ancestors of node at tw='22'

每个树结果都可以非递归地完成。例如,要获取 tw='22' 节点的祖先列表

Ancestors

祖先

select anc.* from node me,node anc 
where me.tw=22 and anc.nc >= me.tw and anc.tw <= me.tw 
order by anc.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

Siblings and children are trivial - just use pa field ordering by tw.

兄弟姐妹和孩子是微不足道的 - 只需使用 pa 字段按 tw 排序。

Descendants

后代

For example the set (branch) of nodes that are rooted at tw = 17.

例如,以 tw = 17 为根的节点集(分支)。

select des.* from node me,node des 
where me.tw=17 and des.tw < me.nc and des.tw >= me.tw 
order by des.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

Additional Notes

补充说明

This methodology is extremely useful for when there are a far greater number of reads than there are inserts or updates.

当读取次数远多于插入或更新次数时,此方法非常有用。

Because the insertion, movement, or updating of a node in the tree requires the tree to be adjusted, it is necessary to lock the table before commencing with the action.

由于树中节点的插入、移动或更新需要对树进行调整,因此在开始操作之前需要锁定表。

The insertion/deletion cost is high because the tw index and sz (branch size) values will need to be updated on all the nodes after the insertion point, and for all ancestors respectively.

插入/删除成本很高,因为需要在插入点之后的所有节点上更新 tw 索引和 sz(分支大小)值,并分别为所有祖先更新。

Branch moving involves moving the tw value of the branch out of range, so it is also necessary to disable foreign key constraints when moving a branch. There are, essentially four queries required to move a branch:

分支移动涉及到将分支的 tw 值移出范围,因此在移动分支时也需要禁用外键约束。移动分支基本上需要四个查询:

  • Move the branch out of range.
  • Close the gap that it left. (the remaining tree is now normalised).
  • Open the gap where it will go to.
  • Move the branch into it's new position.
  • 将分支移出范围。
  • 关闭它留下的间隙。(剩下的树现在已经标准化了)。
  • 打开它要去的地方。
  • 将分支移动到新位置。

Adjust Tree Queries

调整树查询

The opening/closing of gaps in the tree is an important sub-function used by create/update/delete methods, so I include it here.

树中间隙的打开/关闭是 create/update/delete 方法使用的一个重要子功能,因此我将其包含在此处。

We need two parameters - a flag representing whether or not we are downsizing or upsizing, and the node's tw index. So, for example tw=18 (which has a branch size of 5). Let's assume that we are downsizing (removing tw) - this means that we are using '-' instead of '+' in the updates of the following example.

我们需要两个参数 - 一个表示我们是缩小规模还是扩大规模的标志,以及节点的 tw 索引。因此,例如 tw=18(其分支大小为 5)。让我们假设我们正在缩小规模(删除 tw) - 这意味着我们在以下示例的更新中使用“-”而不是“+”。

We first use a (slightly altered) ancestor function to update the sz value.

我们首先使用(稍微改变的)祖先函数来更新 sz 值。

update node me, node anc set anc.sz = anc.sz - me.sz from 
node me, node anc where me.tw=18 
and ((anc.nc >= me.tw and anc.tw < me.pa) or (anc.tw=me.pa));

Then we need to adjust the tw for those whose tw is higher than the branch to be removed.

然后我们需要为那些tw高于要删除的分支的人调整tw。

update node me, node anc set anc.tw = anc.tw - me.sz from 
node me, node anc where me.tw=18 and anc.tw >= me.tw;

Then we need to adjust the parent for those whose pa's tw is higher than the branch to be removed.

然后我们需要为那些 pa 的 tw 高于要删除的分支的人调整父级。

update node me, node anc set anc.pa = anc.pa - me.sz from 
node me, node anc where me.tw=18 and anc.pa >= me.tw;

回答by Mark Bessey

You can emulate any other data structure with a hashmap, so that's not a terrible limitation. Scanning from the top to the bottom, you create a hashmap for each row of the database, with an entry for each column. Add each of these hashmaps to a "master" hashmap, keyed on the id. If any node has a "parent" that you haven't seen yet, create an placeholder entry for it in the master hashmap, and fill it in when you see the actual node.

您可以使用哈希图模拟任何其他数据结构,因此这不是一个可怕的限制。从上到下扫描,为数据库的每一行创建一个哈希图,每列都有一个条目。将这些哈希映射中的每一个添加到以 id 为键的“主”哈希映射中。如果任何节点具有您尚未看到的“父”,请在主哈希图中为它创建一个占位符条目,并在您看到实际节点时填写它。

To print it out, do a simple depth-first pass through the data, keeping track of indent level along the way. You can make this easier by keeping a "children" entry for each row, and populating it as you scan the data.

要打印出来,请执行简单的深度优先遍历数据,并在此过程中跟踪缩进级别。您可以通过为每一行保留一个“子”条目并在扫描数据时填充它来使这更容易。

As for whether there's a "better" way to store a tree in a database, that depends on how you're going to use the data. I've seen systems that had a known maximum depth that used a different table for each level in the hierarchy. That makes a lot of sense if the levels in the tree aren't quite equivalent after all (top level categories being different than the leaves).

至于在数据库中存储树是否有“更好”的方法,这取决于您将如何使用数据。我见过具有已知最大深度的系统,这些系统对层次结构中的每个级别使用不同的表。如果树中的级别毕竟不完全相同(顶级类别与叶子不同),这很有意义。

回答by wcm

Assuming that you know that the root elements are zero, here's the pseudocode to output to text:

假设您知道根元素为零,以下是输出到文本的伪代码:

function PrintLevel (int curr, int level)
    //print the indents
    for (i=1; i<=level; i++)
        print a tab
    print curr \n;
    for each child in the table with a parent of curr
        PrintLevel (child, level+1)


for each elementID where the parentid is zero
    PrintLevel(elementID, 0)