Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/28895067/

Time: 2020-09-08 20:19:40  Source: igfitidea

Using UUIDs instead of ObjectIDs in MongoDB

mongodb

Asked by Christina

We are migrating a database from MySQL to MongoDB for performance reasons and considering what to use for IDs of the MongoDB documents. We are debating between using ObjectIDs, which is the MongoDB default, or using UUIDs instead (which is what we have been using up until now in MySQL). So far, the arguments we have to support any of these options are the following:

ObjectIDs: ObjectIDs are the MongoDB default, and I assume (although I'm not sure) that this is for a reason: I expect that MongoDB can handle them more efficiently than UUIDs, or has another reason for preferring them. I also found this stackoverflow answer that mentions that using ObjectIDs makes indexing more efficient; it would be nice, however, to have some metrics on how much "more efficient" this is.

UUIDs: Our basic argument in favour of using UUIDs (and it is quite an important one) is that they are supported, one way or another, by virtually any database. This means that if, some way down the road, we decide to switch from MongoDB to something else for whatever reason, and we already have an API that retrieves documents from the DB based on their IDs, nothing changes for the clients of this API, since the IDs can continue to be exactly the same. If we were to use ObjectIDs, I'm not really sure how we would go about migrating them to another DB.

Does anyone have any insights on whether one of these options may be better than the other and why? Have you ever used UUIDs in MongoDB instead of ObjectIDs and if yes what were the advantages / problems you came across?

Accepted answer by Molomby

I think this is a great idea and so does Mongo; they list UUIDs as one of the common options for the _id field.

Considerations:

  • Performance -- As other answers mention, benchmarks show UUIDs cause a performance drop for inserts. In the worst case measured (going from 10M to 20M docs in a collection) they're about 2-3x slower -- the difference between inserting 2,000 (UUID) and 7,500 (ObjectID) docs per second. This is a large difference, but its significance depends entirely on your use case. Will you be batch inserting millions of docs at a time? For most apps I've built, the common case is inserting individual documents. In that test the difference is much smaller (6,250 vs. 7,500; ~20%). The ID type is simply not the limiting factor.
  • Portability -- Other DBs certainly do tend to have good UUID support, so portability would be improved. Alternatively, since UUIDs are larger (more bits), it is possible to repack an ObjectID into the "shape" of a UUID. This approach isn't as nice as direct portability, but it does give you a path forward.
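
The repacking mentioned under Portability could look roughly like the sketch below. It assumes the ObjectID is available as its 24-character hex string and pads it with eight zero hex digits to fill 128 bits. The function names and the zero padding are illustrative assumptions, not something the answer or MongoDB prescribes, and the result only has the layout of a UUID; it does not carry valid UUID version or variant bits.

```javascript
// Hypothetical helper: pack a 96-bit ObjectID (24 hex chars) into the
// 8-4-4-4-12 "shape" of a UUID by appending 32 zero bits (an assumed choice).
function objectIdHexToUuidShape(oidHex) {
  if (!/^[0-9a-f]{24}$/i.test(oidHex)) {
    throw new TypeError("expected a 24-character hex ObjectID string");
  }
  const hex = oidHex.toLowerCase() + "00000000";
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

// Hypothetical inverse: recover the ObjectID hex from a repacked value.
function uuidShapeToObjectIdHex(uuid) {
  const hex = uuid.replace(/-/g, "").toLowerCase();
  if (!/^[0-9a-f]{32}$/.test(hex) || !hex.endsWith("00000000")) {
    throw new TypeError("not a value produced by objectIdHexToUuidShape");
  }
  return hex.slice(0, 24);
}

console.log(objectIdHexToUuidShape("5e0e6a804888946fa61a1976"));
// prints "5e0e6a80-4888-946f-a61a-197600000000"
```

Because the padding is fixed, the mapping is reversible, so existing ObjectIDs could be carried into a UUID column elsewhere and mapped back if needed.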

Counter to some of the other answers:

  • UUIDs have native support -- You can use the UUID() function in the Mongo Shell exactly the same way you'd use ObjectID(), to convert a string into the equivalent BSON object.
  • UUIDs are not especially large -- They're 128 bits compared to ObjectIDs, which are 96 bits. (They should be encoded using binary subtype 0x04.)
  • UUIDs can include a timestamp -- Specifically, UUIDv1 encodes a timestamp with 60 bits of precision, compared to 32 bits in ObjectIDs. This is over 6 orders of magnitude more precision: 100-nanosecond intervals instead of seconds. It can actually be a decent way of storing create timestamps with more accuracy than Mongo/JS Date objects support, however...
    • The built-in UUID() function only generates v4 (random) UUIDs so, to leverage this, you'd need to lean on your app or Mongo driver for ID creation.
    • Unlike ObjectIDs, because of the way UUIDs are chunked, the timestamp doesn't give you a natural order. This can be good or bad depending on your use case.
    • Including timestamps in your IDs is often a Bad Idea. You end up leaking the created time of documents anywhere an ID is exposed. To make matters worse, v1 UUIDs also encode a unique identifier for the machine they're generated on, which can expose additional information about your infrastructure (e.g. number of servers). Of course, ObjectIDs also encode a timestamp, so this is partly true for them too.
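
To make the "chunked" timestamp concrete, here is a minimal sketch that reads the 60-bit timestamp back out of a v1 UUID string. It assumes the field layout from the UUID specification (time_low, then time_mid, then the version nibble followed by time_hi) and a count of 100-nanosecond intervals since 1582-10-15. The function name and the sample UUID are made up for illustration.

```javascript
// Number of 100-ns intervals between the UUID epoch (1582-10-15)
// and the Unix epoch (1970-01-01).
const UUID_EPOCH_OFFSET_100NS = 0x01b21dd213814000n;

// Hypothetical helper: extract the timestamp from a version 1 UUID string.
function uuidV1Timestamp(uuid) {
  const parts = uuid.toLowerCase().split("-");
  if (parts.length !== 5 || parts[2].length !== 4 || parts[2][0] !== "1") {
    throw new TypeError("expected a version 1 UUID string");
  }
  const [timeLow, timeMid, timeHiAndVersion] = parts;
  const timeHi = timeHiAndVersion.slice(1); // drop the version nibble
  // The most significant bits are stored last, so reassemble hi + mid + low.
  const ticks = BigInt("0x" + timeHi + timeMid + timeLow);
  const unixMs = Number((ticks - UUID_EPOCH_OFFSET_100NS) / 10000n);
  return { ticks, date: new Date(unixMs) };
}

// A hand-built v1 UUID whose timestamp equals the Unix epoch.
const sample = uuidV1Timestamp("13814000-1dd2-11b2-8000-000000000000");
console.log(sample.date.toISOString()); // prints "1970-01-01T00:00:00.000Z"
```

Note that the low 32 bits of the timestamp come first in the string, which is why sorting v1 UUIDs as-is does not sort them by creation time.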

Answer by Philipp

The _id field of MongoDB can have any value you want as long as you can guarantee that it is unique for the collection. When your data already has a natural key, there is no reason not to use this in place of the auto-generated ObjectIDs.

ObjectIDs are provided as a reasonable default solution to save you the time of generating your own unique key (and to discourage beginners from trying to copy SQL's AUTO INCREMENT, which is a bad idea in a distributed database).

By not using ObjectIDs you also miss out on another convenience feature: an ObjectID includes the Unix timestamp of when it was generated, and many drivers provide a function to extract it and convert it to a date. This can sometimes make a separate create-date field redundant.
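
As a rough, driver-free illustration of that extraction: the first 4 bytes of an ObjectID are the creation time in seconds since the Unix epoch, so they can be read straight from the hex string. The helper name below is an assumption for this sketch; in practice you would normally use whatever function your driver provides.

```javascript
// Hypothetical helper: read the creation time out of an ObjectID hex string.
function objectIdTimestamp(oidHex) {
  if (!/^[0-9a-f]{24}$/i.test(oidHex)) {
    throw new TypeError("expected a 24-character hex ObjectID string");
  }
  // The first 8 hex characters (4 bytes) are seconds since the Unix epoch.
  const seconds = parseInt(oidHex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("5e0e6a804888946fa61a1976");
console.log(created.toISOString());
```

The resolution is whole seconds, so this is a convenience rather than a replacement for a precise create-date field.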

But when neither is a concern for you, you are free to use your UUIDs as the _id field.

Answer by sws

Consider the amount of data you would store in each case.

A MongoDB ObjectID is 12 bytes in size, is packed for storage, and its parts are organized for performance (i.e. the timestamp is stored first, which is a logical ordering criterion).

Conversely, a standard UUID in its string form is 36 bytes, contains dashes, and is typically stored as a string. Further, even if you strip the non-numeric characters and intend to store it numerically, you must still contend with the fact that its "indexy" portion (the part of a UUID v1 that is timestamp-based) sits in the middle of the UUID and doesn't lend itself well to sorting. There are studies which allow for performant UUID storage, and I even wrote a Node.js library to assist in its management.

If you intend to use a UUID, consider reorganizing it for optimal indexing and sorting; otherwise you'll likely hit a performance wall.
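
One possible reading of "reorganizing" is sketched below: move the most significant timestamp fields of a v1 UUID to the front and drop the dashes, so that string order follows creation time. This is an illustrative assumption about the technique, not the API of the library mentioned above; the function name and sample UUIDs are invented for the example.

```javascript
// Hypothetical helper: rearrange a version 1 UUID so it sorts by time.
function reorderV1ForIndexing(uuid) {
  const parts = uuid.toLowerCase().split("-");
  if (parts.length !== 5 || parts[2][0] !== "1") {
    throw new TypeError("expected a version 1 UUID string");
  }
  const [timeLow, timeMid, timeHiAndVersion, clockSeq, node] = parts;
  // time_hi (with version) first, then time_mid, then time_low.
  return timeHiAndVersion + timeMid + timeLow + clockSeq + node;
}

// Two hand-built v1 UUIDs; "later" has a timestamp exactly one tick
// (100 ns) after "earlier".
const earlier = "ffffffff-1dd2-11b2-8000-000000000000";
const later = "00000000-1dd3-11b2-8000-000000000000";

console.log(earlier < later); // prints false: raw order ignores time
console.log(reorderV1ForIndexing(earlier) < reorderV1ForIndexing(later)); // prints true
```

The reordered value is 32 hex characters, which could also be stored as 16 raw bytes instead of a 36-character string.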

Answer by Eli

I found these benchmarks some time ago when I had the same question. They basically show that using a Guid instead of an ObjectId causes a drop in index performance.

In any case, I would recommend that you customize the benchmarks to imitate your specific real-life scenario and see what the numbers look like; one cannot rely 100% on generic benchmarks.

Answer by Buzz Moschetti

We must be careful to distinguish the cost of MongoDB inserting a thing from the cost of generating the thing in the first place, plus that cost relative to the size of the payload. Below is a little matrix that shows the method of generating the _id crossed against the size of an optional payload of extra bytes. The tests use JavaScript only and were conducted on a MacBook Pro against localhost, for 100,000 inserts using insertMany with batches of 100 and no transactions, to try to remove network, chattiness, and other factors. Two runs with batch = 1 were also done just to highlight the dramatic difference.


Method                                                                                         
A  :  Simple int:          _id:0, _id:1, ...                                                   
B  :  ObjectId             _id:ObjectId("5e0e6a804888946fa61a1976"), ...                       
C  :  Simple string:       _id:"A0", _id:"A1", ...                                             

D  :  UUID length string   _id:"9575edcc-cb70-4d63-97ed-ee5d624de87b0", ...
      (but not actually
      generated by UUID())

E  :  Real generated UUID  _id: UUID("35992974-21ea-4f61-b715-2dfaed663b73"), ...              
      (stored UUID() object)                                                                   

F  :  Real generated UUID  _id: "6b16f733-ff24-4172-83f9-e4f96ace6775"
      (stored as string, e.g.
      UUID().toString().substr(6,36))

Time in milliseconds to perform 100,000 inserts on fresh (empty) collection.

Extra                M E T H O D   (Batch = 100)                                                               
Payload   A     B     C     D     E     F       % drop A to F                                  
--------  ----  ----  ----  ----  ----  ----    ------------                                   
None      2379  2386  2418  2492  3472  4267    80%                                            
512       2934  2928  3048  3128  4151  4870    66%                                            
1024      3249  3309  3375  3390  4847  5237    61%                                            
2048      3953  3832  3987  4342  5448  5888    49% 
4096      6299  6343  6199  6449  7634  8640    37%                                            
8192      9716  9292  9397 10816 11212 11321    16% 

Extra              M E T H O D   (Batch = 1)                                          
Payload   A      B      C      D      E      F       % drop A to F              
--------  -----  -----  -----  -----  -----  -----                              
None      48006  48419  49136  48757  50649  51280   6.8%                       
1024      50986  50894  49383  49373  51200  51821   1.2%                       


This was a quick test, but it seems clear that basic strings and ints as _id are roughly the same speed, while actually generating a UUID adds time -- especially if you take the string version of the UUID() object, e.g. UUID().toString().substr(6,36). It is also worth noting that constructing an ObjectId appears to be as quick.
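
For anyone wanting to reproduce a test of this shape, the document-building half might be sketched as follows. This is not the original test script. The names `buildBatches` and `idMakers` are hypothetical, only methods A and C are shown because they need no shell helpers, and the actual `insertMany` call against a collection is deliberately left out so the sketch runs without a server.

```javascript
// Hypothetical generator: yield arrays of documents, batchSize at a time,
// each with an _id from makeId(i) and an optional filler payload.
function* buildBatches(makeId, total, batchSize, payloadBytes = 0) {
  const payload = payloadBytes > 0 ? "x".repeat(payloadBytes) : undefined;
  let batch = [];
  for (let i = 0; i < total; i++) {
    const doc = { _id: makeId(i) };
    if (payload !== undefined) doc.payload = payload;
    batch.push(doc);
    if (batch.length === batchSize) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // final partial batch, if any
}

// ID generators corresponding to methods A and C in the matrix.
const idMakers = {
  A: (i) => i,
  C: (i) => "A" + i,
};

// Time only the generation step; in a real run each batch would be
// passed to insertMany and the insert time measured separately.
const start = Date.now();
let batchCount = 0;
for (const batch of buildBatches(idMakers.C, 100000, 100, 512)) {
  batchCount += 1;
}
console.log(batchCount + " batches built in " + (Date.now() - start) + " ms");
```

Timing generation and insertion separately is what lets you tell whether the ID type or the database is the slower part.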
