Understanding MongoDB BSON Document size limit
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/4667597/
Asked by saint
From MongoDB The Definitive Guide:
Documents larger than 4MB (when converted to BSON) cannot be saved to the database. This is a somewhat arbitrary limit (and may be raised in the future); it is mostly to prevent bad schema design and ensure consistent performance.
I don't understand this limit. Does this mean that a document containing a blog post with a lot of comments, which just so happens to be larger than 4MB, cannot be stored as a single document?
Also, does this count nested documents too?
What if I wanted a document which audits the changes to a value? (It may eventually grow, exceeding the 4MB limit.)
Hope someone explains this correctly.
I have just started reading about MongoDB (the first NoSQL database I'm learning about).
Thank you.
Accepted answer by Justin Jenkins
First off, this actually is being raised in the next version to 8MB or 16MB ... but I think to put this into perspective, Eliot from 10gen (who developed MongoDB) puts it best:
EDIT: The size has been officially 'raised' to 16MB.
So, on your blog example, 4MB is actually a whole lot. For example, the full uncompressed text of "War of the Worlds" is only 364k (html): http://www.gutenberg.org/etext/36
If your blog post is that long with that many comments, I for one am not going to read it :)
For trackbacks, if you dedicated 1MB to them, you could easily have more than 10k (probably closer to 20k)
So except for truly bizarre situations, it'll work great. And in the exception case of spam, I really don't think you'd want a 20mb object anyway. I think capping trackbacks at 15k or so makes a lot of sense no matter what for performance. Or at least special casing if it ever happens.
-Eliot
I think you'd be pretty hard pressed to reach the limit ... and over time, if you upgrade ... you'll have to worry less and less.
The main point of the limit is so you don't use up all the RAM on your server (as you need to load all the MBs of the document into RAM when you query it.)
So the limit is some % of normal usable RAM on a common system ... which will keep growing year on year.
Note on Storing Files in MongoDB
If you need to store documents (or files) larger than 16MB, you can use the GridFS API, which will automatically break up the data into segments and stream them back to you (thus avoiding the issue with size limits/RAM.)
Instead of storing a file in a single document, GridFS divides the file into parts, or chunks, and stores each chunk as a separate document.
GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata.
You can use this method to store images, files, videos, etc. in the database much as you might in a SQL database. I have even used this to store multi-gigabyte video files.
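As a rough sketch of that flow, using PyMongo's gridfs module (the database name and file path here are made up for illustration):

import gridfs
from pymongo import MongoClient

client = MongoClient()                  # assumes a local mongod
db = client["media"]                    # hypothetical database name
fs = gridfs.GridFS(db)                  # backed by the fs.files and fs.chunks collections

# Store a file larger than 16MB; GridFS splits it into chunks automatically.
with open("big_video.mp4", "rb") as f:  # placeholder path
    file_id = fs.put(f, filename="big_video.mp4")

# Read it back; the chunks are reassembled and streamed transparently.
data = fs.get(file_id).read()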
Answered by marr75
Many in the community would prefer no limit, with warnings about performance instead; see this comment for a well-reasoned argument: https://jira.mongodb.org/browse/SERVER-431?focusedCommentId=22283&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-22283
My take: the lead developers are stubborn about this issue because they decided it was an important "feature" early on. They're not going to change it anytime soon because their feelings are hurt that anyone questioned it. Another example of personality and politics detracting from a product in open source communities, but this is not really a crippling issue.
Answered by Sammaye
To post a clarification answer here for those who get directed here by Google.
The document size includes everything in the document, including subdocuments, nested objects, etc.
So a document of:
{
  _id: {},
  na: [1, 2, 3],
  naa: [
    { w: 1, v: 2, b: [1, 2, 3] },
    { w: 5, b: 2, h: [{ d: 5, g: 7 }, {}] }
  ]
}
has a maximum size of 16MB.
Subdocuments and nested objects are all counted towards the size of the document.
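One way to check this yourself is a sketch using the bson package that ships with PyMongo (bson.encode is available in recent driver versions):

import bson  # the BSON codec bundled with PyMongo

doc = {
    "_id": {},
    "na": [1, 2, 3],
    "naa": [
        {"w": 1, "v": 2, "b": [1, 2, 3]},
        {"w": 5, "b": 2, "h": [{"d": 5, "g": 7}, {}]},
    ],
}

# The encoded length counts everything: keys, values, subdocuments, arrays.
print(len(bson.encode(doc)))  # must stay under 16 * 1024 * 1024 to be insertable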
Answered by user2903536
Nested Depth for BSON Documents: MongoDB supports no more than 100 levels of nesting for BSON documents.
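A quick sketch of what hitting that limit looks like (the collection name is hypothetical, and the exact exception raised depends on the driver and server version):

from pymongo import MongoClient

client = MongoClient()        # assumes a local mongod
db = client["test"]

doc = {"value": 1}
for _ in range(150):          # nest 150 levels deep, well past the 100-level cap
    doc = {"nested": doc}

try:
    db.deep.insert_one(doc)   # hypothetical collection name
except Exception as exc:      # the driver or server rejects over-deep documents
    print("rejected:", exc)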
Answered by Chris Golledge
I have not yet seen a problem with the limit that did not involve large files stored within the document itself. There are already a variety of databases which are very efficient at storing/retrieving large files; they are called operating systems. The database exists as a layer over the operating system. If you are using a NoSQL solution for performance reasons, why would you want to add additional processing overhead to the access of your data by putting the DB layer between your application and your data?
JSON is a text format. So, if you are accessing your data through JSON, this is especially true if you have binary files, because they have to be encoded in uuencode, hexadecimal, or Base64. The conversion path might look like:
binary file <> JSON (encoded) <> BSON (encoded)
It would be more efficient to put the path (URL) to the data file in your document and keep the data itself in binary.
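For example (a sketch; the field names and path are invented for illustration):

from pymongo import MongoClient

db = MongoClient()["media"]  # hypothetical database name

# Keep the binary on disk (or in object storage) and store only a reference.
db.videos.insert_one({
    "title": "launch recording",      # made-up example fields
    "path": "/var/media/launch.mp4",  # hypothetical filesystem path
    "content_type": "video/mp4",
})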
If you really want to keep these files of unknown length in your DB, then you would probably be better off putting these in GridFS and not risking killing your concurrency when the large files are accessed.
Answered by Mchl
Perhaps storing a blog post -> comments relation in a non-relational database is not really the best design.
You should probably store comments in a separate collection from blog posts anyway.
[edit]
See comments below for further discussion.
Answered by mzarrugh
According to https://www.mongodb.com/blog/post/6-rules-of-thumb-for-mongodb-schema-design-part-1
If you expect that a blog post may exceed the 16MB document limit, you should extract the comments into a separate collection, reference the blog post from each comment, and do an application-level join (see the sketch after the example below).
// posts
[
  {
    _id: ObjectID('AAAA'),
    text: 'a post',
    ...
  }
]

// comments
[
  {
    text: 'a comment',
    post: ObjectID('AAAA')
  },
  {
    text: 'another comment',
    post: ObjectID('AAAA')
  }
]
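A sketch of that application-level join in PyMongo, assuming the two collections above (the database name is made up):

from pymongo import MongoClient

db = MongoClient()["blog"]  # hypothetical database name

# Fetch the post, then fetch its comments with a second query;
# that second query is the "application-level join".
post = db.posts.find_one({"text": "a post"})
comments = list(db.comments.find({"post": post["_id"]}))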