Original source: http://stackoverflow.com/questions/5373198/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
MongoDB relationships: embed or reference?
Asked by Freewind
I'm new to MongoDB, coming from a relational database background. I want to design a question structure with some comments, but I don't know which relationship to use for comments: embed or reference?
A question with some comments, like stackoverflow, would have a structure like this:
Question
    title = 'aaa'
    content = 'bbb'
    comments = ???
At first, I wanted to use embedded comments (I think embed is recommended in MongoDB), like this:
Question
    title = 'aaa'
    content = 'bbb'
    comments = [ { content = 'xxx', createdAt = 'yyy'},
                 { content = 'xxx', createdAt = 'yyy'},
                 { content = 'xxx', createdAt = 'yyy'} ]
It's clear, but I'm worried about this case: if I want to edit a specified comment, how do I get its content and its question? There is no _id to let me find one, nor a question_ref to let me find its question. (I'm such a newbie that I don't know if there's any way to do this without _id and question_ref.)
Do I have to use ref, not embed? Then do I have to create a new collection for comments?
Answered by John F. Miller
This is more an art than a science. The Mongo Documentation on Schemas is a good reference, but here are some things to consider:
Put as much in as possible
The joy of a Document database is that it eliminates lots of Joins. Your first instinct should be to place as much in a single document as you can. Because MongoDB documents have structure, and because you can efficiently query within that structure (this means that you can take the part of the document that you need, so document size shouldn't worry you much) there is no immediate need to normalize data like you would in SQL. In particular any data that is not useful apart from its parent document should be part of the same document.
Separate data that can be referred to from multiple places into its own collection.
This is not so much a "storage space" issue as it is a "data consistency" issue. If many records will refer to the same data it is more efficient and less error prone to update a single record and keep references to it in other places.
Document size considerations
MongoDB imposes a 4MB (16MB with 1.8) size limit on a single document. In a world of GB of data this sounds small, but it is also 30 thousand tweets or 250 typical Stack Overflow answers or 20 Flickr photos. On the other hand, this is far more information than one might want to present at one time on a typical web page. First consider what will make your queries easier. In many cases concern about document sizes will be premature optimization.
Complex data structures:
MongoDB can store arbitrarily deeply nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document. (Note that there are data stores specifically designed for this type of data that one should consider as well.)
It has also been pointed out that it is impossible to return a subset of elements in a document. If you need to pick and choose a few bits of each document, it will be easier to separate them out.
Data Consistency
MongoDB makes a trade-off between efficiency and consistency. The rule is that changes to a single document are always atomic, while updates to multiple documents should never be assumed to be atomic. There is also no way to "lock" a record on the server (you can build this into the client's logic using, for example, a "lock" field). When you design your schema, consider how you will keep your data consistent. Generally, the more that you keep in a document the better.
For what you are describing, I would embed the comments and give each comment an id field with an ObjectID. The ObjectID has a timestamp embedded in it, so you can use that instead of createdAt if you like.
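The timestamp mentioned here occupies the first four bytes of an ObjectID (seconds since the Unix epoch), so a creation time can be recovered without a separate field. A minimal sketch in plain JavaScript, assuming the id is available as a 24-character hex string; the helper name `objectIdToDate` and the sample id are illustrative and not part of any driver API:

```javascript
// Hypothetical helper: recover the creation time from an ObjectID hex string.
// Assumes the standard layout where the first 4 bytes (8 hex characters)
// hold the creation time in seconds since the Unix epoch.
function objectIdToDate(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error('expected a 24-character hex string');
  }
  var seconds = parseInt(hexId.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// Sample id for illustration only.
var createdAt = objectIdToDate('507f1f77bcf86cd799439011');
console.log(createdAt.toISOString());
```

The mongo shell exposes the same value through `ObjectId.getTimestamp()`, so a helper like this is only needed when working with raw id strings.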
Answered by ywang1724
In general, embed is good if you have one-to-one or one-to-many relationships between entities, and reference is good if you have many-to-many relationships.
Answered by Gates VP
If I want to edit a specified comment, how to get its content and its question?
You can query by sub-document: db.question.find({'comments.content' : 'xxx'}).
This will return the whole Question document. To edit the specified comment, you then have to find the comment on the client, make the edit and save that back to the DB.
In general, if your document contains an array of objects, you'll find that those sub-objects will need to be modified client side.
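The read-modify-write cycle described in this answer can be sketched in plain JavaScript. An in-memory object stands in for the document returned by the query; the field values and the helper name `editComment` are assumptions for illustration, and the final save to the database is left out:

```javascript
// Stand-in for the Question document returned by
// db.question.find({'comments.content': 'xxx'}).
var question = {
  title: 'aaa',
  content: 'bbb',
  comments: [
    { content: 'first', createdAt: '2011-03-21' },
    { content: 'xxx', createdAt: '2011-03-22' }
  ]
};

// Hypothetical helper: change the first comment whose content matches.
// Returns true if a comment was edited, false otherwise.
function editComment(doc, oldContent, newContent) {
  for (var i = 0; i < doc.comments.length; i++) {
    if (doc.comments[i].content === oldContent) {
      doc.comments[i].content = newContent;
      return true;
    }
  }
  return false;
}

var edited = editComment(question, 'xxx', 'edited text');
console.log(edited, question.comments[1].content);
// The modified document would then be written back to the database.
```

Matching on content is fragile when two comments share the same text, which is one reason to give each embedded comment its own id.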
Answered by Silom
Well, I'm a bit late but still would like to share my way of schema creation.
I have schemas for everything that can be described by a word, as you would in classical OOP.
E.g.:
- Comment
- Account
- User
- Blogpost
- ...
Every schema can be saved as a Document or Subdocument, so I declare this for each schema.
Document:
- Can be used as a reference. (E.g. the user made a comment -> comment has a "made by" reference to user)
- Is a "Root" in your application. (E.g. the blogpost -> there is a page about the blogpost)
Subdocument:
- Can only be used once / is never a reference. (E.g. Comment is saved in the blogpost)
- Is never a "Root" in your application. (The comment just shows up in the blogpost page but the page is still about the blogpost)
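As a rough illustration of this split, the sketch below models a user and a blog post as root documents, with comments as subdocuments that live only inside their post and point back to a user by id. All field names and ids are made up for the example, and plain objects stand in for collections:

```javascript
// Root documents: stored in their own collections and referenced by id.
var users = {
  u1: { _id: 'u1', name: 'alice' }
};

var blogposts = {
  p1: {
    _id: 'p1',
    title: 'Embed or reference?',
    // Subdocuments: comments exist only inside their blog post,
    // but each one references the user who made it.
    comments: [
      { text: 'Nice post', madeBy: 'u1' }
    ]
  }
};

// Hypothetical helper: resolve the "made by" reference of each comment.
function commentsWithAuthors(post, userCollection) {
  return post.comments.map(function (comment) {
    return {
      text: comment.text,
      author: userCollection[comment.madeBy].name
    };
  });
}

console.log(commentsWithAuthors(blogposts.p1, users));
```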
Answered by Chris Bloom
I came across this small presentation while researching this question on my own. I was surprised at how well it was laid out, both the info and the presentation of it.
http://openmymind.net/Multiple-Collections-Versus-Embedded-Documents
It summarized:
As a general rule, if you have a lot of [child documents] or if they are large, a separate collection might be best.
Smaller and/or fewer documents tend to be a natural fit for embedding.
Answered by finspin
I know this is quite old, but if you are looking for the answer to the OP's question on how to return only the specified comment, you can use the $ (query) operator like this:
db.question.find({'comments.content': 'xxx'}, {'comments.$': true})
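To make the effect of the positional $ projection concrete, here is a small plain-JavaScript imitation of it: only the first array element that matches the query condition is kept. This illustrates the behavior and is not MongoDB code; the helper name `projectFirstMatch` and the sample data are assumptions:

```javascript
var question = {
  _id: 1,
  title: 'aaa',
  comments: [
    { content: 'yyy' },
    { content: 'xxx' },
    { content: 'xxx' }
  ]
};

// Hypothetical helper imitating a query on comments.content combined with
// a {'comments.$': true} projection: the result carries the _id and only
// the first matching comment.
function projectFirstMatch(doc, content) {
  var match = doc.comments.filter(function (c) {
    return c.content === content;
  })[0];
  if (match === undefined) {
    return null; // the document would not match the query at all
  }
  return { _id: doc._id, comments: [match] };
}

console.log(JSON.stringify(projectFirstMatch(question, 'xxx')));
```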
Answered by Narendran
Yes, we can use references in the document and populate the referenced documents, much as SQL joins would. MongoDB does not have joins for mapping one-to-many relationships between documents; instead, we can use populate (here with Mongoose) to fulfill our scenario.
var mongoose = require('mongoose'),
    Schema = mongoose.Schema;

var personSchema = Schema({
  _id: Number,
  name: String,
  age: Number,
  stories: [{ type: Schema.Types.ObjectId, ref: 'Story' }]
});

var storySchema = Schema({
  _creator: { type: Number, ref: 'Person' },
  title: String,
  fans: [{ type: Number, ref: 'Person' }]
});
Population is the process of automatically replacing the specified paths in the document with document(s) from other collection(s). We may populate a single document, multiple documents, plain object, multiple plain objects, or all objects returned from a query. Let's look at some examples.
For more information, please visit: http://mongoosejs.com/docs/populate.html
Answered by Bonjour123
Actually, I'm quite curious why nobody spoke about the UML specifications. A rule of thumb is that if you have an aggregation, then you should use references. But if it is a composition, then the coupling is stronger, and you should use embedded documents.
And you will quickly understand why it is logical. If an object can exist independently of the parent, then you will want to access it even if the parent doesn't exist. As you just can't embed it in a non-existing parent, you have to make it live in its own data structure. And if a parent exists, just link them together by adding a ref of the object in the parent.
Not sure what the difference between the two relationships is? Here is a link explaining them: Aggregation vs Composition in UML
Answered by Emmanuel Orozco
I created this quiz as a reference to help decide whether you should use one or the other
Answered by serv-inc
If I want to edit a specified comment, how do I get its content and its question?
If you had kept track of the number of comments and the index of the comment you wanted to alter, you could use the dot operator (SO example).
For example, you could do:
db.questions.update(
    { "title": "aaa" },
    // $set changes only the addressed field instead of replacing the document
    { "$set": { "comments.0.content": "new text" } }
)
(as another way to edit the comments inside the question)
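The dotted path in that update addresses an array element by its index. The plain-JavaScript sketch below imitates how such a path is resolved and set on an in-memory document; `setByPath` is a hypothetical helper for illustration, not a MongoDB function:

```javascript
// Hypothetical helper: set a value at a dotted path such as
// "comments.0.content", where numeric segments index into arrays.
// Assumes every intermediate segment already exists.
function setByPath(doc, path, value) {
  var parts = path.split('.');
  var target = doc;
  for (var i = 0; i < parts.length - 1; i++) {
    target = target[parts[i]];
    if (target === undefined || target === null) {
      throw new Error('path segment not found: ' + parts[i]);
    }
  }
  target[parts[parts.length - 1]] = value;
  return doc;
}

var question = {
  title: 'aaa',
  comments: [{ content: 'xxx' }, { content: 'yyy' }]
};

setByPath(question, 'comments.0.content', 'new text');
console.log(question.comments[0].content);
```

Index-based paths only stay correct while the array order is stable, so this approach assumes no comments are inserted or removed between reading the index and applying the update.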