Original question: http://stackoverflow.com/questions/13046442/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Comparison of Relational Databases and Graph Databases
Asked by user782220
Can someone explain to me the advantages and disadvantages of a relational database such as MySQL compared to a graph database such as Neo4j?
In SQL you have multiple tables with various ids linking them. Then you have to join to connect the tables. From the perspective of a newbie, why would you design the database to require a join rather than having the connections explicit as edges from the start, as with a graph database? Conceptually, it makes no sense to a newbie. Presumably there is a very technical but non-conceptual reason for this?
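To make the contrast in the question concrete, here is a minimal sketch in Python. The table names (`person`, `friendship`), the sample rows, and the `edges` dictionary are all hypothetical, and the dictionary is only a stand-in for the idea of explicit edges, not a model of how Neo4j stores data.

```python
import sqlite3

# Relational style: people and friendships live in separate tables,
# linked only by ids. (Hypothetical schema and data.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friendship (
        person_id INTEGER NOT NULL REFERENCES person(id),
        friend_id INTEGER NOT NULL REFERENCES person(id)
    );
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO friendship VALUES (1, 2), (1, 3);
""")


def friends_via_join(name):
    """Reconnect the tables at query time by joining on the ids."""
    rows = conn.execute(
        """
        SELECT f.name
        FROM person AS p
        JOIN friendship AS fr ON fr.person_id = p.id
        JOIN person AS f ON f.id = fr.friend_id
        WHERE p.name = ?
        ORDER BY f.name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


# Graph style: each record carries its own outgoing edges.
edges = {"Ann": ["Bob", "Cy"], "Bob": [], "Cy": []}


def friends_via_edges(name):
    """Follow the edges stored with the record; no join is needed."""
    return sorted(edges[name])


print(friends_via_join("Ann"))
print(friends_via_edges("Ann"))
```

Both functions answer the same question; the difference is whether the connection is rebuilt from ids at query time or read directly from the record.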
Answered by dan1111
There actually is conceptual reasoning behind both styles. The Wikipedia articles on the relational model and on graph databases give good overviews of this.
The primary difference is that in a graph database, the relationships are stored at the individual record level, while in a relational database, the structure is defined at a higher level (the table definitions).
This has important ramifications:
- A relational database is much faster when operating on huge numbers of records. In a graph database, each record has to be examined individually during a query in order to determine the structure of the data, while this is known ahead of time in a relational database.
- Relational databases use less storage space, because they don't have to store all of those relationships.
Storing all of the relationships at the individual-record level only makes sense if there is going to be a lot of variation in the relationships; otherwise you are just duplicating the same things over and over. This means that graph databases are well-suited to irregular, complex structures. But in the real world, most databases require regular, relatively simple structures. This is why relational databases predominate.
Answered by Jim Webber
The key difference between a graph and relational database is that relational databases work with sets while graph databases work with paths.
This manifests itself in unexpected and unhelpful ways for an RDBMS user. For example, when trying to emulate path operations (e.g. friends of friends) by recursively joining in a relational database, query latency grows unpredictably and massively, as does memory usage, not to mention that it tortures SQL to express those kinds of operations. More data means slower queries in a set-based database, even if you can delay the pain through judicious indexing.
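As an illustration of emulating a path operation with joins, the following sketch runs a friends-of-friends query against an in-memory SQLite database. The `friend` table, its `src`/`dst` columns, and the sample rows are assumptions made up for this example.

```python
import sqlite3

# Hypothetical edge table: one row per directed friendship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friend (src TEXT NOT NULL, dst TEXT NOT NULL);
    CREATE INDEX friend_src ON friend (src);
    INSERT INTO friend VALUES
        ('ann', 'bob'), ('bob', 'cy'), ('bob', 'dee'), ('cy', 'eve');
""")

# Two hops means joining the table to itself once; every extra hop
# would add another self-join.
FOAF_SQL = """
    SELECT DISTINCT f2.dst
    FROM friend AS f1
    JOIN friend AS f2 ON f2.src = f1.dst
    WHERE f1.src = ?
    ORDER BY f2.dst
"""


def friends_of_friends(person):
    """Return the people exactly two FRIEND hops away from `person`."""
    return [row[0] for row in conn.execute(FOAF_SQL, (person,))]


print(friends_of_friends("ann"))
```

The index on `src` is the kind of judicious indexing mentioned above: it helps each join step, but the query still has to be rewritten whenever the path length changes.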
As Dan1111 hinted at, most graph databases don't suffer this kind of join pain because they express relationships at a fundamental level. That is, relationships physically exist on disk and they are named, directed, and can be themselves decorated with properties (this is called the property graph model, see: https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model). This means if you chose to, you could look at the relationships on disk and see how they "join" entities. Relationships are therefore first-class entities in a graph database and are semantically far stronger than those implied relationships reified at runtime in a relational store.
So why should you care? For two reasons:
- Graph databases are much faster than relational databases for connected data - a strength of the underlying model. A consequence of this is that query latency in a graph database is proportional to how much of the graph you choose to explore in a query, and is not proportional to the amount of data stored, thus defusing the join bomb.
- Graph databases make modelling and querying much more pleasant, meaning faster development and fewer WTF moments. For example, expressing friend-of-friend for a typical social network in Neo4j's Cypher query language is just:
  MATCH (me)-[:FRIEND]->()-[:FRIEND]->(foaf) RETURN foaf
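The first point above, that the work depends on how much of the graph a query explores rather than on how much data is stored, can be sketched with a plain adjacency dictionary. This is only a toy stand-in for a graph store; the graphs `small` and `big` and the edge counter are hypothetical and say nothing about Neo4j's internals.

```python
def friends_of_friends(graph, me):
    """Follow FRIEND edges two hops out from `me`.

    Returns the set of people reached and the number of edges followed,
    so the amount of work done is visible.
    """
    followed = 0
    foaf = set()
    for friend in graph.get(me, ()):
        followed += 1
        for candidate in graph.get(friend, ()):
            followed += 1
            foaf.add(candidate)
    return foaf, followed


# A small neighbourhood around "me" (hypothetical data).
small = {"me": ["a", "b"], "a": ["c"], "b": ["c", "d"]}

# The same neighbourhood embedded in a much larger, unrelated graph.
big = dict(small)
big.update({f"n{i}": [f"n{i + 1}"] for i in range(100_000)})

print(friends_of_friends(small, "me"))
print(friends_of_friends(big, "me"))
```

In this sketch the traversal follows the same number of edges in both graphs, because the extra nodes in `big` are never reachable from `me` within two hops.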
Answered by Walter Mitty
Dan1111 has already given an answer flagged as correct. A couple of additional points are worth noting in passing.
First, in almost every implementation of graph databases, the records are "pinned" because there are an unknown number of pointers pointing at the record in its current location. This means that a record cannot be shuffled to a new location without either leaving a forwarding address at the old location or breaking an unknown number of pointers.
Theoretically, one could shuffle all the records at once and figure out a way to locate and repair all the pointers. In practice this is an operation that could take weeks on a large graph database, during which time the database would have to be off the air. It's just not feasible.
By contrast, in a relational database, records can be reshuffled on a fairly large scale, and the only thing that has to be done is to rebuild any indexes that have been affected. This is a fairly large operation, but nowhere near as large as the equivalent for a graph database.
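The "forwarding address" idea can be illustrated with a toy record store. Everything here (the `Store` class, the integer addresses, the method names) is hypothetical and is not a description of any real graph database's storage engine; it only shows why direct pointers pin a record unless something is left behind at the old location.

```python
class Store:
    """Toy storage: records live at integer addresses, and other
    records refer to them by address (a direct pointer)."""

    def __init__(self):
        # address -> ("record", value) or ("forward", new_address)
        self.slots = {}

    def put(self, address, value):
        self.slots[address] = ("record", value)

    def move(self, old, new):
        """Relocate a record, leaving a forwarding address behind so
        that pointers to the old location keep working."""
        kind, value = self.slots[old]
        if kind != "record":
            raise ValueError("no record at this address")
        self.slots[new] = ("record", value)
        self.slots[old] = ("forward", new)

    def read(self, address):
        """Follow forwarding addresses; return (value, extra hops)."""
        hops = 0
        kind, payload = self.slots[address]
        while kind == "forward":
            hops += 1
            kind, payload = self.slots[payload]
        return payload, hops


store = Store()
store.put(10, "alice")
pointer_held_elsewhere = 10  # some other record points at address 10
store.move(10, 20)
print(store.read(pointer_held_elsewhere))
```

The old pointer still resolves after the move, at the price of an extra hop on every read through it; deleting the stub at address 10 instead would break every pointer that still refers to it.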
The second point worth noting in passing is that the world wide web can be seen as a gigantic graph database. Web pages contain hyperlinks, and hyperlinks reference, among other things, other web pages. The reference is via URLs, which function like pointers.
When a web page is moved to a different URL without leaving a forwarding address at the old URL, an unknown number of hyperlinks will become broken. These broken links then give rise to the dreaded, "Error 404: page not found" message that interrupts the pleasure of so many surfers.
Answered by Uli Bethke
With a relational database we can model and query a graph by using foreign keys and self-joins. Just because the name RDBMS contains the word relational does not mean that these systems are good at handling relationships. The word relational in RDBMS stems from relational algebra and not from relationship. In an RDBMS, the relationship itself does not exist as an object in its own right. It either needs to be represented explicitly as a foreign key or implicitly as a value in a link table (when using a generic/universal modelling approach). Links between data sets are stored in the data itself.
The more we increase the search depth in a relational database, the more self-joins we need to perform and the more our query performance suffers. The deeper we go in our hierarchy, the more tables we need to join and the slower our query gets. Mathematically, the cost grows exponentially in a relational database. In other words, the more complex our queries and relationships get, the more we benefit from a graph database over a relational one. We don't have performance problems in a graph database when navigating the graph. This is because a graph database stores the relationships as separate objects. However, the superior read performance comes at the cost of slower writes.
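The link between search depth and the number of self-joins can be made visible by generating the SQL for a given depth. This is a sketch only: the `friend` table, its columns, and the helper names `build_depth_query` and `people_at_depth` are hypothetical, and it shows how the query text grows rather than measuring performance.

```python
import sqlite3


def build_depth_query(depth):
    """Build SQL that walks exactly `depth` hops along the friend table.

    Each hop beyond the first adds one more self-join.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    joins = [
        f"JOIN friend AS f{i} ON f{i}.src = f{i - 1}.dst"
        for i in range(2, depth + 1)
    ]
    lines = [
        f"SELECT DISTINCT f{depth}.dst",
        "FROM friend AS f1",
        *joins,
        "WHERE f1.src = ?",
        f"ORDER BY f{depth}.dst",
    ]
    return "\n".join(lines)


# Hypothetical data: a simple chain ann -> bob -> cy -> dee.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friend (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO friend VALUES ('ann', 'bob'), ('bob', 'cy'), ('cy', 'dee');
""")


def people_at_depth(person, depth):
    """Run the generated query and return the people `depth` hops away."""
    sql = build_depth_query(depth)
    return [row[0] for row in conn.execute(sql, (person,))]


print(build_depth_query(3))
print(people_at_depth("ann", 3))
```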
In certain situations it is easier to change the data model in a graph database than it is in an RDBMS, e.g. in an RDBMS if I change a table relationship from 1:n to m:n I need to apply DDL with potential downtime.
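To show what "applying DDL" means for the 1:n to m:n case, here is a small sketch using SQLite. The `author`, `book`, and `book_author` tables and the sample rows are hypothetical; a production migration would also have to deal with locking, application changes, and removing the old column, which this sketch leaves in place.

```python
import sqlite3

# Starting point: each book has at most one author (1:n), expressed
# as a foreign-key column on book. (Hypothetical schema and data.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        author_id INTEGER REFERENCES author(id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES (1, 'Graphs', 1);
""")


def migrate_to_many_to_many(connection):
    """DDL plus a data copy: introduce a link table for m:n."""
    connection.executescript("""
        CREATE TABLE book_author (
            book_id INTEGER NOT NULL REFERENCES book(id),
            author_id INTEGER NOT NULL REFERENCES author(id),
            PRIMARY KEY (book_id, author_id)
        );
        INSERT INTO book_author (book_id, author_id)
            SELECT id, author_id FROM book WHERE author_id IS NOT NULL;
    """)


def authors_of(title):
    """Read authors through the link table after the migration."""
    rows = conn.execute(
        """
        SELECT a.name
        FROM book AS b
        JOIN book_author AS ba ON ba.book_id = b.id
        JOIN author AS a ON a.id = ba.author_id
        WHERE b.title = ?
        ORDER BY a.name
        """,
        (title,),
    )
    return [row[0] for row in rows]


migrate_to_many_to_many(conn)
# Only now can a second author be attached to the same book.
conn.execute("INSERT INTO book_author VALUES (1, 2)")
print(authors_of("Graphs"))
```

In a property graph, by contrast, attaching a second author would simply mean adding another edge to the existing book node, with no schema change.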
On the other hand, an RDBMS has advantages in other areas, e.g. aggregating data or doing timestamped version control on data.
I discuss some of the other pros and cons in my blog post on graph databases for data warehousing.
Answered by Mohammad Akbari
While the relational model can easily represent the data that is contained in a graph model, we face two significant problems in practice:
- SQL lacks the syntax to easily perform graph traversal, especially traversals where the depth is unknown or unbounded. For instance, using SQL to determine friends of your friends is easy enough, but it is hard to solve the “degrees of separation” problem.
- Performance degrades quickly as we traverse the graph. Each level of traversal adds significantly to query response time.
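For the "degrees of separation" problem, where the depth is not known in advance, SQL's recursive common table expressions are one way to express the traversal, though the result is noticeably more involved than a fixed two-hop join. The sketch below uses SQLite's `WITH RECURSIVE`; the `friend` table, the sample rows, and the `max_depth` cap (added so that cycles cannot recurse forever) are assumptions for illustration.

```python
import sqlite3

# Hypothetical directed friendships.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friend (src TEXT NOT NULL, dst TEXT NOT NULL);
    INSERT INTO friend VALUES
        ('ann', 'bob'), ('bob', 'cy'), ('cy', 'dee'), ('ann', 'eve');
""")

# Start from one person at depth 0 and repeatedly follow friend rows,
# recording the depth at which each person is reached.
DEGREES_SQL = """
    WITH RECURSIVE reach(person, depth) AS (
        SELECT ?, 0
        UNION
        SELECT f.dst, r.depth + 1
        FROM reach AS r
        JOIN friend AS f ON f.src = r.person
        WHERE r.depth < ?
    )
    SELECT MIN(depth) FROM reach WHERE person = ?
"""


def degrees_of_separation(start, target, max_depth=6):
    """Return the fewest hops from start to target, or None if the
    target is not reachable within max_depth hops."""
    row = conn.execute(DEGREES_SQL, (start, max_depth, target)).fetchone()
    return row[0]


print(degrees_of_separation("ann", "dee"))
print(degrees_of_separation("dee", "ann"))
```

Each level of recursion is another join against `friend`, which is consistent with the second point above: response time grows with every level traversed.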
Reference: Next Generation Databases