如何在 SQL 中表示数据树?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2175882/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to represent a data tree in SQL?
提问by Avi Harush
I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data. I'm using a UI library to present the tree in a windows form where I can bind the tree to the TreeView.
我正在编写一个由 Tree 和 TreeNode 组合而成的数据树结构。树将包含对数据的根和顶级操作。我正在使用 UI 库以 Windows 形式呈现树,我可以在其中将树绑定到 TreeView。
I will need to save this tree and nodes in the DB. What will be the best way to save the tree and to get the following features:
我需要在数据库中保存这棵树和节点。保存树并获得以下功能的最佳方法是什么:
- Intuitive implementation.
- Easy binding. Will be easy to move from the tree to the DB structure and back (if any)
- 直观的实施。
- 容易绑定。将很容易从树移动到数据库结构并返回(如果有)
I had 2 ideas. The first is to serialize the data into a one liner in a table. The second is to save in tables but then, when moving to data entities I will loose the row states on the table on changed nodes.
我有 2 个想法。第一种是将数据序列化为表中的一行。第二个是保存在表中,但是,当移动到数据实体时,我会在更改的节点上丢失表上的行状态。
Any ideas?
有任何想法吗?
回答by Quassnoi
The easiest implementation is adjacency liststructure:
最简单的实现是邻接表结构:
id parent_id data
However, some databases, particularly MySQL
, have some issues in handling this model, because it requires an ability to run recursive queries which MySQL
lacks.
然而,一些数据库,特别是MySQL
,在处理这个模型时有一些问题,因为它需要运行递归查询的能力,而这正是它所MySQL
缺乏的。
Another model is nested sets:
另一个模型是嵌套集:
id lft rgt data
where lft
and rgt
are arbitrary values that define the hierarchy (any child's lft
, rgt
should be within any parent's lft
, rgt
)
其中lft
和rgt
是定义层次结构的任意值(任何孩子的lft
,rgt
应该在任何父母的lft
, 内rgt
)
This does not require recursive queries, but it slower and harder to maintain.
这不需要递归查询,但速度较慢且难以维护。
However, in MySQL
this can be improved using SPATIAL
abitilies.
但是,在MySQL
这方面可以使用SPATIAL
能力进行改进。
See these articles in my blog:
在我的博客中查看这些文章:
- Adjacency list vs. nested sets: PostgreSQL
- Adjacency list vs. nested sets: SQL Server
- Adjacency list vs. nested sets: Oracle
- Adjacency list vs. nested sets: MySQL
for more detailed explanations.
更详细的解释。
回答by Bj?rn
I've bookmarked this slidshare about SQL-Antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
我已将此关于 SQL-Antipatterns 的 slidshare 加入书签,其中讨论了几种替代方案:http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
The recommendation from there is to use a Closure Table (it's explained in the slides).
那里的建议是使用闭包表(幻灯片中对此进行了解释)。
Here is the summary (slide 77):
这是摘要(幻灯片 77):
| Query Child | Query Subtree | Modify Tree | Ref. Integrity
Adjacency List | Easy | Hard | Easy | Yes
Path Enumeration | Easy | Easy | Hard | No
Nested Sets | Hard | Easy | Hard | No
Closure Table | Easy | Easy | Easy | Yes
回答by niutech
I'm suprised that nobody mentioned the materialized pathsolution, which is probably the fastest way of working with trees in standard SQL.
我很惊讶没有人提到物化路径解决方案,这可能是在标准 SQL 中处理树的最快方法。
In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.
在这种方法中,树中的每个节点都有一个列path,其中存储了从根到节点的完整路径。这涉及非常简单和快速的查询。
Have a look at the example table node:
看看示例表节点:
+---------+-------+
| node_id | path |
+---------+-------+
| 0 | |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1.4 |
| 5 | 2.5 |
| 6 | 2.6 |
| 7 | 2.6.7 |
| 8 | 2.6.8 |
| 9 | 2.6.9 |
+---------+-------+
In order to get the children of node x, you can write the following query:
为了获取节点x的子节点,您可以编写以下查询:
SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')
Keep in mind, that the column pathshould be indexed, in order to perform fast with the LIKEclause.
请记住,应该对列路径进行索引,以便使用LIKE子句快速执行。
回答by Gabriel Furstenheim
If you are using PostgreSQL you can use ltree
, a package in the contrib extension (comes by default) which implements the tree data structure.
如果您使用的是 PostgreSQL,则可以使用ltree
contrib 扩展(默认提供)中的一个包,它实现了树数据结构。
From the docs:
从文档:
CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);
You can do queries like:
您可以执行以下查询:
ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
path
------------------------------------
Top.Science
Top.Science.Astronomy
Top.Science.Astronomy.Astrophysics
Top.Science.Astronomy.Cosmology
(4 rows)
回答by Mark Byers
It depends on how you will be querying and updating the data. If you store all the data in one row, it's basically a single unit that you can't query into or partially update without rewriting all the data.
这取决于您将如何查询和更新数据。如果将所有数据存储在一行中,则它基本上是一个单元,您无法在不重写所有数据的情况下对其进行查询或部分更新。
If you want to store each element as a row, you should first read Managing Hierarchical Data in MySQL(MySQL specific, but the advice holds for many other databases too).
如果您想将每个元素存储为一行,您应该首先阅读在 MySQL 中管理分层数据(特定于 MySQL,但该建议也适用于许多其他数据库)。
If you're only ever accessing an entire tree, the adjacency list model makes it difficult to retrieve all nodes under the root without using a recursive query. If you add an extra column that links back to the head then you can do SELECT * WHERE head_id = @id
and get the whole tree in one non-recursive query, but it denormalizes the database.
如果您只访问整个树,邻接表模型很难在不使用递归查询的情况下检索根下的所有节点。如果您添加一个链接回头部的额外列,那么您可以SELECT * WHERE head_id = @id
在一个非递归查询中执行并获取整个树,但它会使数据库非规范化。
Some databases have custom extensions that make storing and retrieving heirarchical data easier, for example Oracle has CONNECT BY.
某些数据库具有自定义扩展,可以更轻松地存储和检索分层数据,例如 Oracle 具有CONNECT BY。
回答by Holger Waldmann
As this is the top answer when asking "sql trees" in a google search, I will try to update this from the perspective of today (december 2018).
由于这是在 google 搜索中询问“sql 树”时的最佳答案,因此我将尝试从今天(2018 年 12 月)的角度更新它。
Most answers imply that using an adjacency list is both simple and slow and therefore recommend other methods.
大多数答案暗示使用邻接列表既简单又缓慢,因此推荐其他方法。
Since version 8 (published april 2018) MySQL supports recursive common table expressions (CTE). MySQL is a bit late to the show but this opens up a new option.
从版本 8(2018 年 4 月发布)开始,MySQL 支持递归公用表表达式 (CTE)。MySQL 有点晚了,但这开辟了一个新的选择。
There is a tutorial herethat explains the use of recursive queries to manage an adjacency list.
有一个教程在这里,说明使用递归查询的管理邻接表。
As the recursion now runs completely within the database engine, it is way much faster than in the past (when it had to run in the script engine).
由于递归现在完全在数据库引擎中运行,因此它比过去(当它必须在脚本引擎中运行时)快得多。
The blog heregives some measurements (which are both biased and for postgres instead of MySQL) but nevertheless it shows that adjacency lists do not have to be slow.
此处的博客给出了一些测量值(这些测量值都是有偏差的,并且针对 postgres 而不是 MySQL),但它表明邻接列表不一定很慢。
So my conclusion today is:
所以我今天的结论是:
- The simple adjacency list may be fast enough if the database engine supports recursion.
- Do a benchmark with your own data and your own engine.
- Do not trust outdated recommendations to point out the "best" method.
- 如果数据库引擎支持递归,简单的邻接表可能足够快。
- 使用您自己的数据和您自己的引擎进行基准测试。
- 不要相信过时的建议会指出“最佳”方法。
回答by bigblind
The best way, I think indeed is to give each node an id and a parent_id, where the parent id is the id of the parent node. This has a couple of benefits
最好的方法,我认为确实是给每个节点一个 id 和一个 parent_id,其中父 id 是父节点的 id。这有几个好处
- When you want to update a node, you only have to rewrite the data of that node.
- When you want to query only a certain node, you can get exactly the information you want, thus having less overhead on the database connection
- A lot of programming languages have functionality to transform mysql data into XML or json, which will make it easier to open up your application using an api.
- 当你想更新一个节点时,你只需要重写那个节点的数据。
- 当您只想查询某个节点时,可以准确获取您想要的信息,从而减少数据库连接的开销
- 许多编程语言都具有将 mysql 数据转换为 XML 或 json 的功能,这将使使用 api 打开应用程序变得更加容易。
回答by Kimvais
Something like table "nodes" where each node row contains parent id (in addition to the ordinary node data). For root, the parent is NULL.
类似于“节点”表,其中每个节点行都包含父 ID(除了普通节点数据)。对于 root,父项为 NULL。
Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.
当然,这会使查找子项更加耗时,但这样实际的数据库将非常简单。