Mongodb: multiple collections or one big collection w/ index

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/15314769/

Time: 2020-09-09 13:06:28  Source: igfitidea

mongodb

Asked by lostintranslation

I need help modeling my data in mongo. Most of my experience has been in relational DBs; I am just starting out w/ mongo. I am modeling data for different events.

  1. Each 'event' will have the same fields.
  2. Each 'event' will have hundreds to millions of documents/rows.
  3. Events are dynamic, i.e. new ones will be created as needed, e.g. a new 'Summer Olympics 2016' event.

Probably most important, when dealing with events (CRUD operations) users will have to specify an event name.

I can see a couple of ways to do this so far and I don't want to make a major mistake in setting up my data model the 'wrong' way.

1) One 'events' collection that has data for all events. Index on 'event' name. Query would look something like:

db.events.find({event: 'Summer Olympics 2012'});
{event: 'Summer Olympics 2012', attributes: [{name: 'joe smith', .... }
{event: 'Summer Olympics 2012', attributes: [{name: 'jane doe', .... }
{event: 'Summer Olympics 2012', attributes: [{name: 'john avery', .... }
{event: 'Summer Olympics 2012', attributes: [{name: 'ted williams', .... }

db.events.find({event: 'Summer Olympics 2013'})
{event: 'Summer Olympics 2013', attributes: [{name: 'steve smith', .... }
{event: 'Summer Olympics 2013', attributes: [{name: 'amy jones', .... }
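
Option 1 above could be sketched roughly as follows. This is only an illustration under assumptions: it expects a mongosh-style `db` handle, and the helper names `ensureEventIndex` and `findByEvent` are hypothetical, not part of the question.

```javascript
// Sketch of option 1: one shared 'events' collection, indexed on the event name.
// Assumes a mongosh-style `db` handle; both helper names are hypothetical.

// Create the index on the 'event' field. MongoDB does not rebuild an index
// that already exists with the same specification, so this is safe to repeat.
function ensureEventIndex(db) {
  return db.getCollection('events').createIndex({ event: 1 });
}

// Fetch the documents that belong to one named event.
function findByEvent(db, eventName) {
  return db.getCollection('events').find({ event: eventName });
}
```

In a shell session this might be used as `ensureEventIndex(db)` once, followed by `findByEvent(db, 'Summer Olympics 2012')` for each lookup.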

2) A collection for each new event that comes along, w/ an extra collection to keep track of all event names. No index on event name is needed, as each event is stored in a different collection.

// multiple collections, create new as needed
db.summer2012.find() // get summer 2012 docs

db.summer2016.find() // get summer 2016 docs

//'events' collection
db.events.find() // get all events that I would have collections for
{name: 'summer2012', title: 'Summer Olympics 2012'};
{name: 'summer2016', title: 'Summer Olympics 2016'};
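
Option 2 could be sketched along these lines: an 'events' registry maps each title to the collection that holds its documents. This assumes a mongosh-style `db` handle where `findOne` returns the document or null; the helpers `registerEvent` and `collectionForEvent` are hypothetical names introduced for illustration.

```javascript
// Sketch of option 2: one collection per event, plus an 'events' registry.
// Assumes a mongosh-style `db` handle; both helper names are hypothetical.

// Record a new event in the registry and return its dedicated collection.
// MongoDB creates the per-event collection implicitly on the first insert.
function registerEvent(db, name, title) {
  db.getCollection('events').insertOne({ name: name, title: title });
  return db.getCollection(name);
}

// Resolve a user-supplied event title to the collection that stores it.
function collectionForEvent(db, title) {
  const entry = db.getCollection('events').findOne({ title: title });
  if (entry === null) {
    throw new Error('Unknown event: ' + title);
  }
  return db.getCollection(entry.name);
}
```

Every CRUD operation then pays one extra registry lookup, which is part of the trade-off against the single indexed collection.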

For #1, I am a little worried that once I reach 100 events, each with millions of records, lookups per 'event' will be slow, even if one of the events only has 500 documents.
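
One way to check this worry rather than guess is to look at the query's explain output. The sketch below summarizes the `executionStats` section that `explain('executionStats')` returns in the mongo shell; the helper name `summarizeExplain` and the `examinedOnlyMatches` flag are hypothetical. With a usable index on 'event', the number of documents examined would generally be expected to stay close to the number returned, regardless of how large the other events are.

```javascript
// Summarize the executionStats section of an explain('executionStats') result.
// The helper name and the shape of the returned summary are hypothetical.
function summarizeExplain(explainResult) {
  const stats = explainResult.executionStats;
  return {
    returned: stats.nReturned,
    docsExamined: stats.totalDocsExamined,
    keysExamined: stats.totalKeysExamined,
    // True when the query examined no more documents than it returned,
    // which suggests the index did the filtering instead of a full scan.
    examinedOnlyMatches: stats.totalDocsExamined <= stats.nReturned
  };
}

// Possible use in the mongo shell (not executed here):
// summarizeExplain(
//   db.events.find({ event: 'Summer Olympics 2012' }).explain('executionStats')
// );
```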

For #2, am I 'skirting' the mongo model here by creating a new collection each time an event comes along?

Any comments/ideas are welcome, as I really have no idea which one is going to end up performing better or if one or the other would get me into more trouble down the road. I have looked around (mongo's site included) and I really cannot find a concrete answer.

Answered by lostintranslation

From mongo docs here: data modeling

In certain situations, you might choose to store information in several collections rather than in a single collection.

Consider a sample collection logs that stores log documents for various environment and applications. The logs collection contains documents of the following form:

{ log: "dev", ts: ..., info: ... }
{ log: "debug", ts: ..., info: ... }

If the total number of documents is low you may group documents into collection by type. For logs, consider maintaining distinct log collections, such as logs.dev and logs.debug. The logs.dev collection would contain only the documents related to the dev environment.

Generally, having large number of collections has no significant performance penalty and results in very good performance. Distinct collections are very important for high-throughput batch processing.
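
The logs example from the docs could be sketched like this: each document is routed to a per-type collection named after its `log` field. It assumes a mongosh-style `db` handle, and the helper names `logCollectionName` and `insertLog` are hypothetical.

```javascript
// Route each log document to a per-type collection such as logs.dev or logs.debug.
// Assumes a mongosh-style `db` handle; both helper names are hypothetical.

// Derive the target collection name from the document's 'log' field.
function logCollectionName(doc) {
  return 'logs.' + doc.log;
}

// Insert the document into its per-type collection.
function insertLog(db, doc) {
  return db.getCollection(logCollectionName(doc)).insertOne(doc);
}
```

The same routing idea applies to the events in the question: the event name picks the collection instead of being a filter inside one big collection.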

I also spoke w/ a 10gen guy. For really large collections he listed multiple benefits of separating the data out into smaller, more specific collections. His comment on using one collection for all the data and using an index was:

Just because you can do something does not mean you should. Model your data appropriately. It may be easy to store everything in one large collection and index it, but that is not always the best approach.
