What are the use cases of graph-based databases (http://neo4j.org/)?

Disclaimer: This page is a bilingual (Chinese/English) translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA: cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/1000162/

Date: 2020-09-08 07:21:38. Source: igfitidea

Tags: database, neo4j, graph-databases

Asked by Khangharoth

I have used relational DBs a lot and decided to venture out into the other types available.

This particular product looks good and promising: http://neo4j.org/

Has anyone used graph-based databases? What are the pros and cons from a usability perspective?

Have you used these in a production environment? What was the requirement that prompted you to use them?

Answer by Will Harris

I used a graph database in a previous job. We weren't using neo4j, it was an in-house thing built on top of Berkeley DB, but it was similar. It was used in production (it still is).

The reason we used a graph database was that the data being stored by the system and the operations the system was doing with the data were exactly the weak spot of relational databases and were exactly the strong spot of graph databases. The system needed to store collections of objects that lack a fixed schema and are linked together by relationships. To reason about the data, the system needed to do a lot of operations that would be a couple of traversals in a graph database, but that would be quite complex queries in SQL.
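
A minimal sketch of that contrast, assuming nothing about the in-house Berkeley DB system or Neo4j's own API: plain Python nodes with arbitrary properties (no fixed schema) linked by typed relationships, plus a `follow` helper (a hypothetical name) that walks a chain of relationship types. In SQL, each hop would usually be another self-join against a links table.

```python
from collections import defaultdict


class Graph:
    """Toy property graph: nodes hold arbitrary properties, edges are typed."""

    def __init__(self):
        self.props = {}               # node id -> dict of properties
        self.out = defaultdict(list)  # node id -> [(rel_type, target id), ...]

    def add_node(self, node_id, **props):
        # No schema: each node can carry a different set of properties.
        self.props[node_id] = dict(props)

    def add_edge(self, src, rel_type, dst):
        self.out[src].append((rel_type, dst))

    def follow(self, start, *rel_types):
        """Return the nodes reached by following rel_types in order.

        Each loop iteration is one hop; the SQL equivalent adds one join per hop.
        """
        frontier = {start}
        for rel in rel_types:
            frontier = {dst for node in frontier
                        for (r, dst) in self.out[node] if r == rel}
        return frontier


# Hypothetical example data: who works alongside alice?
g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person", title="CTO")
g.add_node("acme", kind="company", country="SE")
g.add_edge("alice", "WORKS_AT", "acme")
g.add_edge("acme", "EMPLOYS", "alice")
g.add_edge("acme", "EMPLOYS", "bob")
colleagues = g.follow("alice", "WORKS_AT", "EMPLOYS") - {"alice"}
```

Adding a third hop is one more argument to `follow`, rather than another join clause.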

The main advantages of the graph model were rapid development time and flexibility. We could quickly add new functionality without impacting existing deployments. If a potential customer wanted to import some of their own data and graft it on top of our model, it could usually be done on site by the sales rep. Flexibility also helped when we were designing a new feature, saving us from trying to squeeze new data into a rigid data model.

Having a weird database let us build a lot of our other weird technologies, giving us lots of secret-sauce to distinguish our product from those of our competitors.

The main disadvantage was that we weren't using the standard relational database technology, which can be a problem when your customers are enterprisey. Our customers would ask why we couldn't just host our data on their giant Oracle clusters (our customers usually had large datacenters). One of the team actually rewrote the database layer to use Oracle (or PostgreSQL, or MySQL), but it was slightly slower than the original. At least one large enterprise even had an Oracle-only policy, but luckily Oracle bought Berkeley DB. We also had to write a lot of extra tools - we couldn't just use Crystal Reports for example.

The other disadvantage of our graph database was that we built it ourselves, which meant when we hit a problem (usually with scalability) we had to solve it ourselves. If we'd used a relational database, the vendor would have already solved the problem ten years ago.

If you're building a product for enterprisey customers and your data fits into the relational model, use a relational database if you can. If your application doesn't fit the relational model but it does fit the graph model, use a graph database. If it only fits something else, use that.

If your application doesn't need to fit into the current blub architecture, use a graph database, or CouchDB, or BigTable, or whatever fits your app and you think is cool. It might give you an advantage, and it's fun to try new things.

Whatever you choose, try not to build the database engine yourself unless you really like building database engines.

Answer by DataRiot

We've been working with the Neo team for over a year now and have been very happy. We model scholarly artifacts and their relationships, which is spot on for a graph db, and run recommendation algorithms over the network.
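
The answer doesn't describe the recommendation algorithms themselves. As one hedged illustration of the kind of query a citation network makes easy, here is co-citation ranking ("papers that cite X also cite Y") in plain Python; the function name and the `(citing, cited)` edge format are assumptions, not DataRiot's actual algorithm or data model.

```python
from collections import Counter, defaultdict


def co_cited(citations, paper, top_n=3):
    """Rank papers by how often they are cited together with `paper`.

    citations: iterable of (citing, cited) edges in a citation graph.
    """
    refs_of = defaultdict(set)  # citing paper -> set of papers it cites
    for citing, cited in citations:
        refs_of[citing].add(cited)
    counts = Counter()
    for refs in refs_of.values():
        if paper in refs:
            # Every other reference in the same paper co-occurs with `paper` once.
            counts.update(refs - {paper})
    return [p for p, _ in counts.most_common(top_n)]


# Hypothetical citation edges.
citations = [("a", "X"), ("a", "Y"), ("b", "X"), ("b", "Y"), ("b", "Z"), ("c", "Z")]
recommended = co_cited(citations, "X")
```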

If you are already working in Java, I think that modeling with Neo4j is very straightforward, and it has the flattest / fastest read/write performance of any of the solutions we tried.

To be honest, I have a hard time not thinking in terms of a graph/network, because it's so much easier than designing convoluted table structures to hold object properties and relationships.

That being said, we do store some information in MySQL simply because it's easier for the Business side to run quick SQL queries against. To perform the same functions with Neo we would need to write code that we simply don't have the bandwidth for right now. As soon as we do though, I'm moving all that data to Neo!

Good luck.

Answer by Turbo

Two points:

First, on the data I've been working with for the past 5 years in SQL Server, I've recently hit the scalability wall with SQL for the type of queries we need to run (nested relationships... you know... graphs). I've been playing around with neo4j, and my lookup times are several orders of magnitude faster when I need this kind of lookup.
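
A hedged sketch of why nested relationships strain SQL: with Python's built-in `sqlite3` and a hypothetical `edges(src, dst)` table, "everything reachable from a node" needs a recursive CTE (SQL Server also offers recursive CTEs, with minor syntax differences), whereas the graph-style version is a plain breadth-first walk over an adjacency list. This only illustrates the two query shapes; it does not reproduce the author's workload or their measured speed-up.

```python
import sqlite3
from collections import defaultdict, deque


def reachable_sql(edges, start):
    """All nodes reachable from start, using a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION  -- UNION (not UNION ALL) drops repeats, so cycles terminate
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    conn.close()
    return {row[0] for row in rows}


def reachable_bfs(edges, start):
    """The same question as a breadth-first traversal of an adjacency list."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
```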

Second, to the point that graph databases are outdated. Um... no. Early on, as people were trying to figure out how to store and look up data efficiently, they created and played with graph and network style database models. These were designed so the physical model reflected the logical model, so their efficiency wasn't that great. This type of data structure was good for semi-structured data, but not as good for structured, dense data. So this IBM dude named Codd was researching efficient ways to arrange and store structured data and came up with the idea for the relational database model. And it was good, and people were happy.

What do we have here? Two tools for two different purposes. Graph database models are very good for representing semi-structured data and the relationships between entities (that may or may not exist). Relational databases are good for structured data that has a very static schema, and where join depths do not go very deep. One is good for one kind of data, the other is good for other kinds of data.

To borrow a phrase, there is no silver bullet. It's very short-sighted to say that graph database models are out of date and that using one gives up 40 years of progress. That's like saying that using C is giving up all the technological progress we've gone through to get things like Java and C#. That's not true, though. C is a tool that is needed for certain tasks, and Java is a tool for other tasks.

Answer by Craig Taverner

I've been using MySQL for years to manage engineering data, and it worked well, but one of the problems we had (but didn't realise we had) was that we always had to plan the schema up-front. Another problem we knew we had was mapping the data up to domain objects and back.

Now we've just started trying out neo4j and it looks like it is solving both problems for us. The ability to add different properties to each node (and relation) has allowed us to re-think our entire approach to data. It is like dynamic versus static languages (Ruby versus Java), but for databases. Building the data model in the database can be done in a much more agile and dynamic way, and that is dramatically simplifying our code.

And since the object model in code is generally a graph structure, mapping from the database is also simpler, with less code and consequently fewer bugs.
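
A minimal sketch of that mapping, assuming a hypothetical `Entity` class (not Neo4j's API and not the author's engineering model): each object carries its own free-form property dict, and flattening the object graph into nodes and edges is a direct walk, with no join tables or foreign-key columns to design up front.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality; links may form cycles
class Entity:
    """Hypothetical domain object: a free-form property bag plus typed links."""
    name: str                                  # assumed unique across the graph
    props: dict = field(default_factory=dict)  # any properties, no fixed schema
    links: list = field(default_factory=list)  # [(rel_type, Entity), ...]


def to_graph(root):
    """Flatten the object graph reachable from root into nodes and edges.

    Each object becomes one node and each reference becomes one edge.
    """
    nodes, edges, seen, stack = {}, [], set(), [root]
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue  # shared or cyclic references are emitted only once
        seen.add(id(obj))
        nodes[obj.name] = dict(obj.props)
        for rel, target in obj.links:
            edges.append((obj.name, rel, target.name))
            stack.append(target)
    return nodes, edges


# Hypothetical engineering objects with different property sets.
pump = Entity("pump1", {"rated_kw": 7.5})
sensor = Entity("sensor3", {"unit": "bar", "range": 10})
pump.links.append(("HAS_SENSOR", sensor))
sensor.links.append(("MOUNTED_ON", pump))
nodes, edges = to_graph(pump)
```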

And as an additional bonus, our initial prototype code for loading our data into neo4j is actually performing faster than the previous MySQL version. I have no solid numbers on this (yet), but that was a nice additional feature.

But at the end of the day, the choice probably should be based mostly on the nature of your domain model. Does it map better to tables or graphs? Decide by doing some prototypes, load the data and play with it. Use neoclipse to look at different views of the data. Once you've done that, hopefully you know if you're on to a good thing or not.

Answer by Paul Bock

I am building an intranet at my company.

I am interested in understanding how to take data that is stored in tables (Oracle, MySQL, SQL Server, Excel, Access, various random lists) and load it into Neo4j or some other graph database. Specifically, what happens when common data overlaps data already in the system?
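
One common answer to the overlap question is to match incoming rows on a natural key (a part number, an email address) and merge them into a single existing node rather than creating a duplicate, similar in spirit to Cypher's `MERGE` clause. A minimal plain-Python sketch of that idea; the `merge_rows` name and the node layout are assumptions for illustration only.

```python
def merge_rows(nodes, rows, key, source):
    """Get-or-create one node per value of `key`, merging overlapping rows.

    nodes:  dict of key value -> {"sources": set, "props": {column: {source: value}}}
    rows:   iterable of dicts (e.g. rows read from a table or spreadsheet)
    source: label for where the rows came from, such as "erp" or "excel"
    """
    for row in rows:
        node = nodes.setdefault(row[key], {"sources": set(), "props": {}})
        node["sources"].add(source)
        for column, value in row.items():
            if column != key:
                # Keep each source's value so conflicts stay visible
                # instead of the last import silently overwriting the first.
                node["props"].setdefault(column, {})[source] = value
    return nodes


# Hypothetical rows describing the same part from two systems.
parts = {}
merge_rows(parts, [{"part": "P34", "desc": "relay"}], "part", "erp")
merge_rows(parts, [{"part": "P34", "desc": "Relay 12V", "checked": "yes"}], "part", "excel")
```

Keeping per-source values is one design choice among several; a stricter import could reject conflicting values instead.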

Yes, I know some data is best modeled in an RDBMS, but I have an idea nagging at me: when you need to superimpose several distinct tables, the graph model is better than the table structure.

For instance, I work in a manufacturing environment. There is a major project we are working on, and because of its complexity, each department has created a separate Excel spreadsheet that has a BOM (Bill of Materials) hierarchy in a column on the left, followed by several columns of notes and checks made by the individuals who made these sheets.

So one of the problems is merging all these notes together into one "view" so that someone can see all the issues that need to be addressed in any particular part.

The second problem is that an Excel spreadsheet sucks at representing a hierarchical BOM when a common component is used in more than one subassembly. That is, if someone writes a note about the P34 relay in the ignition subassembly, the same comment should be associated with the P34 relays used in the motor driver subassembly. This doesn't happen in the Excel spreadsheet.
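
In a graph, a shared component is a single node reached from every assembly that uses it, so a note attached to that node is automatically visible from both subassemblies. A minimal sketch with a hypothetical `Bom` class (plain Python, not a Neo4j API); the part and assembly names reuse the example above, and the note text is made up.

```python
from collections import defaultdict


class Bom:
    """Hypothetical BOM stored as a graph: one node per part number,
    with 'uses' edges from an assembly to its components."""

    def __init__(self):
        self.uses = defaultdict(set)    # assembly -> components it uses
        self.notes = defaultdict(list)  # part -> notes attached to that node

    def add_use(self, assembly, component):
        self.uses[assembly].add(component)

    def add_note(self, part, text):
        self.notes[part].append(text)

    def notes_under(self, assembly):
        """Map every part reachable below `assembly` (itself included) to its notes."""
        found, stack, seen = {}, [assembly], set()
        while stack:
            part = stack.pop()
            if part in seen:
                continue
            seen.add(part)
            if self.notes[part]:
                found[part] = list(self.notes[part])
            stack.extend(self.uses[part])
        return found


bom = Bom()
bom.add_use("product", "ignition")
bom.add_use("product", "motor_driver")
bom.add_use("ignition", "P34")
bom.add_use("motor_driver", "P34")  # same P34 node, not a copy
bom.add_note("P34", "check coil rating")
```

The note is written once, yet `bom.notes_under("motor_driver")` and `bom.notes_under("ignition")` both return it.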

For the company intranet, I want to be able to search for anything easily. Such as data related to a part number, a BOM structure, a phone number, an email address, a company policy, or procedure. I want to even extend this to manage computer hardware assets, and installed software.

I envision that once the information network starts to get populated, you can start doing cool traversals such as "I want to write an email to everyone working on the XYZ project". People will have been associated with the project because they will be tagged as creating and modifying the data within the XYZ project. So by using the XYZ project as a search key, a huge set with everything related to the XYZ project will be created, including links to the people who built the XYZ project. The people links will connect to their email addresses. So by virtue of their involvement in the XYZ project, they will be included in my email. This is in stark contrast to some secretary trying to maintain a list of people working on the project. We generate a lot of lists. We spend a lot of time maintaining lists and making sure they are up to date, and most of it doesn't add any value to our products.
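
The "email everyone on project XYZ" traversal can be sketched over a list of `(src, rel, dst)` triples. The relationship names `CONTAINS`, `CREATED`, `MODIFIED` and `HAS_EMAIL` are hypothetical, chosen to mirror the description above, and the code is plain Python rather than any graph database's query language.

```python
from collections import defaultdict


def project_emails(edges, project):
    """Emails of everyone who created or modified anything in `project`.

    edges: iterable of (src, rel, dst) triples (hypothetical schema).
    Path walked: project -CONTAINS-> item <-CREATED|MODIFIED- person -HAS_EMAIL-> email
    """
    out = defaultdict(set)  # (node, rel) -> targets of outgoing edges
    inc = defaultdict(set)  # (node, rel) -> sources of incoming edges
    for src, rel, dst in edges:
        out[(src, rel)].add(dst)
        inc[(dst, rel)].add(src)
    items = out[(project, "CONTAINS")]
    people = {person for item in items
              for rel in ("CREATED", "MODIFIED")
              for person in inc[(item, rel)]}
    return {email for person in people for email in out[(person, "HAS_EMAIL")]}


edges = [
    ("XYZ", "CONTAINS", "doc1"), ("XYZ", "CONTAINS", "bom7"),
    ("ann", "CREATED", "doc1"), ("ben", "MODIFIED", "bom7"),
    ("cy", "CREATED", "other_doc"),
    ("ann", "HAS_EMAIL", "ann@example.com"),
    ("ben", "HAS_EMAIL", "ben@example.com"),
    ("cy", "HAS_EMAIL", "cy@example.com"),
]
recipients = project_emails(edges, "XYZ")
```

Nobody maintains the recipient list; it is derived from who touched the project's data.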

Another cool traversal could report all the computers that have a certain piece of software installed, by version. That report could be used to generate tasks to remove extra copies of old software and to update people who need to have the latest copy. It would also be useful for license tracking.
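
A hedged sketch of that report, assuming hypothetical `(computer, software, version)` installation edges and dotted numeric version strings: group computers by version, then flag anything older than the newest version found. Versions are parsed into integer tuples because plain string comparison would rank "9.1" above "10.2".

```python
from collections import defaultdict


def parse_version(v):
    """'10.2.1' -> (10, 2, 1), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in v.split("."))


def installs_by_version(installed, software):
    """Group computers by the version of `software` they have installed.

    installed: iterable of (computer, software, version) edges.
    """
    report = defaultdict(set)
    for computer, title, version in installed:
        if title == software:
            report[version].add(computer)
    return {v: sorted(c) for v, c in report.items()}


def outdated(installed, software):
    """Computers running an older version than the newest one installed anywhere."""
    report = installs_by_version(installed, software)
    if not report:
        return []
    newest = max(report, key=parse_version)
    return sorted(c for v, comps in report.items() if v != newest for c in comps)


# Hypothetical inventory.
installed = [("pc1", "CAD", "10.2"), ("pc2", "CAD", "9.1"),
             ("pc3", "CAD", "10.2"), ("pc2", "Office", "2007")]
to_update = outdated(installed, "CAD")
```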

Answer by Paul Bock

Here is a good article that talks about the needs that non relational databases fill: http://www.readwriteweb.com/enterprise/2009/02/is-the-relational-database-doomed.php

It does a good job of pointing out (despite its title) that relational databases aren't flawed or wrong; it's just that these days people are starting to process more and more data in mainstream software and web sites, and relational databases just won't scale for these needs.

Answer by Peter Neubauer

Might be a bit late, but there is a growing number of projects using Neo4j; the better-known ones are listed at Neo4j. NeoTechnology, the company behind Neo4j, also has some references on their customers page.

Note: I am part of the Neo4j team