Notice: this page is based on a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/23164417/

Time: 2020-09-09 13:41:22  Source: igfitidea

MongoDB and composite primary keys

mongodb composite-primary-key primary-key-design

Asked by herbrandson

I'm trying to determine the best way to deal with a composite primary key in a mongo db. The main key for interacting with the data in this system is made up of 2 uuids. The combination of uuids is guaranteed to be unique, but neither of the individual uuids is.

I see a couple of ways of managing this:

  1. Use an object for the primary key that is made up of 2 values (as suggested here)

  2. Use a standard auto-generated mongo object id as the primary key, store my key in two separate fields, and then create a composite index on those two fields

  3. Make the primary key a hash of the 2 uuids

  4. Some other awesome solution that I currently am unaware of

What are the performance implications of these approaches?

For option 1, I'm worried about the insert performance due to having non-sequential keys. I know this can kill traditional RDBMS systems and I've seen indications that this could be true in MongoDB as well.

For option 2, it seems a little odd to have a primary key that would never be used by the system. Also, it seems that query performance might not be as good as in option 1. In a traditional RDBMS a clustered index gives the best query results. How relevant is this in MongoDB?

For option 3, this would create one single id field, but again it wouldn't be sequential when inserting. Are there any other pros/cons to this approach?

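
Option 3 could look roughly like the following. This is only a sketch, assuming Node.js and its built-in `crypto` module; the function name `hashedId`, the choice of SHA-256, and the `:` separator are illustrative assumptions, not something specified in the question.

```javascript
const crypto = require("crypto");

// Hypothetical helper for option 3: derive one _id string from the two UUIDs.
function hashedId(uuidA, uuidB) {
  // Lowercase both UUIDs and join them with a separator so that
  // each ordered pair maps to exactly one input string.
  const input = `${uuidA.toLowerCase()}:${uuidB.toLowerCase()}`;
  // SHA-256 is an assumption; digest("hex") returns a 64-character string.
  return crypto.createHash("sha256").update(input).digest("hex");
}

const id = hashedId(
  "123e4567-e89b-12d3-a456-426614174000",
  "123e4567-e89b-12d3-a456-426614174001"
);
console.log(id.length); // 64
```

Note that the pair is ordered here: swapping the two UUIDs produces a different hash, and the original UUIDs cannot be recovered from the hash, so they would still need to be stored as fields if they are ever read back.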

For option 4, well... what is option 4?

Also, there's some discussion of possibly using CouchDB instead of MongoDB at some point in the future. Would using CouchDB suggest a different solution?

MORE INFO: some background about the problem can be found here

Accepted answer by Asya Kamsky

You should go with option 1.

The main reason is that you say you are worried about performance - using the _id index which is always there and already unique will allow you to save having to maintain a second unique index.

"For option 1, I'm worried about the insert performance due to having non-sequential keys. I know this can kill traditional RDBMS systems and I've seen indications that this could be true in MongoDB as well."

Your other options do not avoid this problem; they just shift it from the _id index to the secondary unique index. But now you have two indexes, one that's right-balanced and the other one that's random access.

There is only one reason to question option 1 and that is if you plan to access the documents by just one or just the other UUID value. As long as you are always providing both values and (this part is very important) you always order them the same way in all your queries, then the _id index will be efficiently serving its full purpose.

To elaborate on why you have to make sure you always order the two UUID values the same way: when comparing subdocuments, { a:1, b:2 } is not equal to { b:2, a:1 }, so you could have a collection where two documents had those values for _id. So if you store _id with field a first, then you must always keep that order in all of your documents and queries.

The other caution is that the index on _id:1 will be usable for the query:

db.collection.find({_id:{a:1,b:2}}) 

but it will not be usable for the query:

db.collection.find({"_id.a":1, "_id.b":2})
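
One way to keep the field order consistent is to build every _id and every query filter through the same helper. The sketch below is plain JavaScript under that assumption; the helper names `makeCompoundId` and `byCompoundId` are hypothetical, and the field names `a` and `b` follow the example above.

```javascript
// Hypothetical helpers; not part of any MongoDB API.

// Always build the compound key with the same field order (a first, then b).
function makeCompoundId(uuidA, uuidB) {
  return { a: uuidA, b: uuidB };
}

// Build a filter that matches the whole _id subdocument,
// which is the query shape the answer says the _id index can serve.
function byCompoundId(uuidA, uuidB) {
  return { _id: makeCompoundId(uuidA, uuidB) };
}

const filter = byCompoundId("u1", "u2");
console.log(JSON.stringify(filter)); // {"_id":{"a":"u1","b":"u2"}}

// The same values with the fields swapped serialize differently,
// mirroring the point that { a, b } and { b, a } are distinct _id values.
const swapped = { _id: { b: "u2", a: "u1" } };
console.log(JSON.stringify(filter) === JSON.stringify(swapped)); // false

// In the mongo shell the filter would be passed as: db.collection.find(filter)
```

Routing inserts and lookups through one function like this means the ordering rule lives in a single place instead of being repeated at every call site.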

Answer by i3arnon

I have an option 4 for you:

Use the automatic _id field and add 2 single-field indexes, one for each UUID, instead of a single compound index.

  1. The _id index would be sequential (although that's less important in MongoDB), easily shardable, and you can let MongoDB manage it.
  2. The 2 UUID indexes let you make any kind of query you need (with the first one, with the second, or with both in any order), and they take up less space than 1 compound index.
  3. If you use both indexes (and other ones as well) in the same query, MongoDB will intersect them (new in v2.6) as if you were using a compound index.
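
The layout described in this answer could be sketched as follows. The field names `uuidA` and `uuidB` and the helper `filterFor` are hypothetical; the shell calls appear only as comments because they need a running MongoDB instance.

```javascript
// Option 4 sketch: keep the automatic _id and index each UUID field on its own.
// `uuidA` and `uuidB` are hypothetical field names.
const singleFieldIndexes = [{ uuidA: 1 }, { uuidB: 1 }];

// Build a filter from whichever UUIDs the caller has:
// the first one alone, the second one alone, or both.
function filterFor({ uuidA, uuidB }) {
  const filter = {};
  if (uuidA !== undefined) filter.uuidA = uuidA;
  if (uuidB !== undefined) filter.uuidB = uuidB;
  return filter;
}

console.log(JSON.stringify(filterFor({ uuidB: "u2" }))); // {"uuidB":"u2"}

// In the mongo shell:
//   singleFieldIndexes.forEach(function (keys) { db.collection.createIndex(keys); });
//   db.collection.find(filterFor({ uuidA: "u1", uuidB: "u2" }))
```

One thing to keep in mind with this sketch: two separate single-field indexes do not by themselves enforce that the pair of UUIDs is unique, so that guarantee would have to come from somewhere else.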

Answer by Boris

I'd go for option 2, and here is why:

  1. Having two separate fields, instead of one value built from both UUIDs as suggested in option 1, leaves you the flexibility to create other combinations of indexes to support future query requirements, or to adapt if it turns out that the cardinality of one key is higher than the other.
  2. Having non-sequential keys could help you avoid hotspots while inserting in a sharded environment, so it's not such a bad option. Sharding is, in my opinion, the best way to scale inserts and updates on collections, since write locking is at the database level (through 2.6) or at the collection level (MMAPv1 in 3.0).

Answer by Brent

I would've gone with option 2. You can still make an index that handles both the UUID fields, and performance should be the same as a compound primary key, except it'll be much easier to work with.

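
A rough sketch of option 2, for illustration only. The field names `uuidA` and `uuidB` and the helper `makeDocument` are hypothetical; `createIndex` with the `unique` option is the standard shell call, shown as a comment because it needs a running MongoDB instance.

```javascript
// Option 2 sketch: keep the auto-generated _id and store the two UUIDs as ordinary fields.
// `uuidA` and `uuidB` are hypothetical field names.
function makeDocument(uuidA, uuidB, payload) {
  // _id is left out on purpose so that the driver or server generates an ObjectId.
  return { uuidA, uuidB, ...payload };
}

// Key pattern and options for a unique compound index over both fields.
const compoundIndexKeys = { uuidA: 1, uuidB: 1 };
const compoundIndexOptions = { unique: true };

const doc = makeDocument("u1", "u2", { name: "example" });
console.log(JSON.stringify(doc)); // {"uuidA":"u1","uuidB":"u2","name":"example"}

// In the mongo shell:
//   db.collection.createIndex(compoundIndexKeys, compoundIndexOptions)
//   db.collection.insert(doc)
//   db.collection.find({ uuidA: "u1", uuidB: "u2" })
```

Unlike the subdocument _id in option 1, a query against these top-level fields does not depend on the order in which the fields are written in the filter.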

Also, in my experience, I've never regretted giving something a unique ID, even if it wasn't strictly required. Perhaps that's an unpopular opinion though.
