Possibility of duplicate Mongo ObjectId's being generated in two different collections?
Source: http://stackoverflow.com/questions/4677237/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Anthony Hyman
Is it possible for the same exact Mongo ObjectId to be generated for a document in two different collections? I realize that it's definitely very unlikely, but is it possible?
Without getting too specific, the reason I ask is that in an application I'm working on, we show public profiles of elected officials whom we hope to convert into full-fledged users of our site. We have separate collections for users and for the elected officials who aren't currently members of our site. Various other documents contain data about the elected officials, and they all map back to the person via their elected official ObjectId.
After creating the account, we still highlight the data that's associated with the elected official, but they are now also part of the users collection, with a corresponding users ObjectId to map their profile to interactions with our application.
We began converting our application from MySQL to Mongo a few months ago, and while we're in transition we store the legacy MySQL id for both of these data types. We're also now starting to store the elected official Mongo ObjectId in the users document to map back to the elected official data.
I was pondering just reusing the previous elected official ObjectId as the new user ObjectId to make things simpler, but wanted to make sure that it couldn't collide with any existing user ObjectId.
Thanks for your insight.
Edit: Shortly after posting this question, I realized that my proposed solution wasn't a very good idea. It would be better to just keep the current schema that we have in place and just link to the elected official '_id' in the users document.
Answered by Raj Advani
Short Answer
Just to add a direct response to your initial question: YES, if you use BSON Object ID generation, then for most drivers the IDs are almost certainly going to be unique across collections. See below for what "almost certainly" means.
Long Answer
The BSON Object IDs generated by MongoDB drivers are highly likely to be unique across collections. This is mainly because of the last 3 bytes of the ID, which for most drivers are generated via a static incrementing counter. That counter is collection-independent; it's global. The Java driver, for example, uses a randomly initialized, static AtomicInteger.
So why, in the Mongo docs, do they say that the IDs are "highly likely" to be unique, instead of outright saying that they WILL be unique? Three possibilities can occur where you won't get a unique ID (please let me know if there are more):
Before this discussion, recall that the BSON Object ID consists of:
[4 bytes seconds since epoch, 3 bytes machine hash, 2 bytes process ID, 3 bytes counter]
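As a rough illustration of the layout above, the following Python sketch splits a 24-character hex ObjectId into the four fields listed. The function name `split_object_id` and the sample ID are made up for this example, and reading every field as big-endian is an assumption; a driver that lays out the bytes differently would not decode meaningfully this way.

```python
from datetime import datetime, timezone


def split_object_id(hex_id: str) -> dict:
    """Split a 24-char hex ObjectId using the 4/3/2/3-byte layout quoted above."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    return {
        "seconds": int.from_bytes(raw[0:4], "big"),   # seconds since epoch
        "machine": raw[4:7].hex(),                    # machine hash
        "pid": int.from_bytes(raw[7:9], "big"),       # process ID
        "counter": int.from_bytes(raw[9:12], "big"),  # counter
    }


# Hypothetical sample ID, used only for illustration.
parts = split_object_id("507f1f77bcf86cd799439011")
created = datetime.fromtimestamp(parts["seconds"], tz=timezone.utc)
print(parts, created.isoformat())
```

Two IDs can only be equal if all four of these fields match, which is what the three scenarios below rely on.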
Here are the three possibilities, so you can judge for yourself how likely it is to get a dupe:
1) Counter overflow: there are 3 bytes in the counter. If you happen to insert over 16,777,216 (2^24) documents in a single second, on the same machine, in the same process, then you may overflow the incrementing counter bytes and end up with two Object IDs that share the same time, machine, process, and counter values.
2) Counter non-incrementing: some Mongo drivers use random numbers instead of incrementing numbers for the counter bytes. In these cases, there is a 1/16,777,216 chance of generating a non-unique ID, but only if those two IDs are generated in the same second (i.e. before the time section of the ID updates to the next second), on the same machine, in the same process.
3) Machine and process hash to the same values. The machine ID and process ID values may, in some highly unlikely scenario, map to the same values for two different machines. If this occurs, and at the same time the two counters on the two different machines, during the same second, generate the same value, then you'll end up with a duplicate ID.
These are the three scenarios to watch out for. Scenarios 1 and 3 seem highly unlikely, and scenario 2 is totally avoidable if you're using the right driver. You'll have to check the source of the driver to know for sure.
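To make scenarios 1 and 2 concrete, here is a small Python sketch. `build_id` and `collision_probability` are hypothetical helpers written for this example, not driver code. `build_id` packs the four fields in the layout described earlier (all big-endian, which is an assumption), and `collision_probability` is the standard birthday-problem calculation for counters drawn uniformly at random.

```python
COUNTER_SPACE = 2 ** 24  # 3 counter bytes -> 16,777,216 distinct values


def build_id(seconds: int, machine: bytes, pid: int, counter: int) -> str:
    """Pack the four fields into 12 bytes; the counter wraps at 2**24."""
    if len(machine) != 3:
        raise ValueError("machine hash must be 3 bytes")
    raw = (
        seconds.to_bytes(4, "big")
        + machine
        + pid.to_bytes(2, "big")
        + (counter % COUNTER_SPACE).to_bytes(3, "big")
    )
    return raw.hex()


def collision_probability(ids_per_second: int) -> float:
    """Chance that at least two of n random counters collide in one second."""
    p_unique = 1.0
    for i in range(ids_per_second):
        p_unique *= 1 - i / COUNTER_SPACE
    return 1 - p_unique


# Scenario 1: same second, machine and process, counter overflowed once.
first = build_id(1_300_000_000, b"\x01\x02\x03", 4242, 7)
wrapped = build_id(1_300_000_000, b"\x01\x02\x03", 4242, 7 + COUNTER_SPACE)
print(first == wrapped)  # True: the two IDs are identical

# Scenario 2: two random counters in the same second.
print(collision_probability(2))  # 1/16,777,216, about 5.96e-08
```

With random counters the risk grows with the number of IDs generated in the same second by the same process, which is why an incrementing counter is the safer choice.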
Answered by mstearn
ObjectIds are generated client-side in a manner similar to UUID but with some nicer properties for storage in a database such as roughly increasing order and encoding their creation time for free. The key thing for your use case is that they are designed to guarantee uniqueness to a high probability even if they are generated on different machines.
Now if you were referring to the _id field in general, we do not require uniqueness across collections so it is safe to reuse the old _id. As a concrete example, if you have two collections, colors and fruits, both could simultaneously have an object like {_id: 'orange'}.
In case you want to know more about how ObjectIds are created, here is the spec: http://www.mongodb.org/display/DOCS/Object+IDs#ObjectIDs-BSONObjectIDSpecification
Answered by DenverMatt
In case anyone is having problems with duplicate Mongo ObjectIDs, you should know that despite the unlikelihood of dups happening in Mongo itself, it is possible to have duplicate _id's generated with PHP in Mongo.
The use-case where this has happened with regularity for me is when I'm looping through a dataset and attempting to inject the data into a collection.
The array that holds the injection data must be explicitly reset on each iteration - even if you aren't specifying the _id value. For some reason, the INSERT process adds the Mongo _id to the array as if it were a global variable (even if the array doesn't have global scope). This can affect you even if you are calling the insertion in a separate function call where you would normally expect the values of the array not to persist back to the calling function.
There are three solutions to this:
- You can unset() the _id field from the array
- You can reinitialize the entire array with array() each time you loop through your dataset
- You can explicitly define the _id value yourself (taking care to define it in such a way that you don't generate dups yourself).
My guess is that this is a bug in the PHP interface, and not so much an issue with Mongo, but if you run into this problem, just unset the _id and you should be fine.
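The pitfall described in this answer can be modeled outside PHP. The Python sketch below is only a simulation: `fake_insert` is a hypothetical stand-in for a driver call that writes the generated _id back into the caller's data, and a plain dict plays the role of the collection. It makes no claim about how the PHP driver is implemented.

```python
import itertools

_next_id = itertools.count(1)  # stand-in for ObjectId generation


def fake_insert(collection: dict, doc: dict) -> None:
    """Hypothetical insert: adds an _id to doc in place if it has none."""
    if "_id" not in doc:
        doc["_id"] = next(_next_id)
    if doc["_id"] in collection:
        raise KeyError(f"duplicate key: {doc['_id']}")
    collection[doc["_id"]] = dict(doc)


def load_reusing_buffer(rows):
    """Buggy pattern: one dict reused across iterations keeps the old _id."""
    collection, buffer = {}, {}
    for row in rows:
        buffer["name"] = row
        fake_insert(collection, buffer)  # second pass raises KeyError
    return collection


def load_with_reset(rows):
    """Fix: start from a fresh dict (or drop _id) on every iteration."""
    collection = {}
    for row in rows:
        doc = {"name": row}
        fake_insert(collection, doc)
    return collection


print(len(load_with_reset(["a", "b", "c"])))  # 3
try:
    load_reusing_buffer(["a", "b"])
except KeyError as exc:
    print("reused buffer failed:", exc)
```

Removing the "_id" key from the reused dict before each insert would fix the buggy loader just as well, which corresponds to the unset() option above.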
Answered by slacy
There's no guarantee whatsoever about ObjectId uniqueness across collections. Even if it's probabilistically very unlikely, it would be a very poor application design that relied on _id uniqueness across collections.
One can easily test this in the mongo shell:
MongoDB shell version: 1.6.5
connecting to: test
> db.foo.insert({_id: 'abc'})
> db.bar.insert({_id: 'abc'})
> db.foo.find({_id: 'abc'})
{ "_id" : "abc" }
> db.bar.find({_id: 'abc'})
{ "_id" : "abc" }
> db.foo.insert({_id: 'abc', data:'xyz'})
E11000 duplicate key error index: test.foo.$_id_ dup key: { : "abc" }
So, absolutely don't rely on _id's being unique across collections, and since you don't control the ObjectId generation function, don't rely on it.
It's possible to create something that's more like a uuid, and if you do that manually, you could have some better guarantee of uniqueness.
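As one possible reading of "something more like a uuid", a random UUID can be generated client-side and used as the _id. This is a minimal Python sketch using the standard library's `uuid` module; the `new_document` helper and the document shape are hypothetical. Note that a version 4 UUID is also unique only with very high probability, not by construction.

```python
import uuid


def new_document(name: str) -> dict:
    """Build a document whose _id is a random (version 4) UUID string."""
    return {"_id": str(uuid.uuid4()), "name": name}


docs = [new_document(f"official-{i}") for i in range(1000)]
print(len({d["_id"] for d in docs}))  # number of distinct _id values
```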
Remember that you can put objects of different "types" in the same collection, so why not just put your two "tables" in the same collection. They would share the same _id space, and thus, would be guaranteed unique. Switching from "prospective" to "registered" would be a simple flipping of a field...
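The single-collection idea can be sketched in a few lines of Python. Here a plain dict keyed by _id stands in for the collection, and the names `people`, `add_person`, `register`, and the `status` field are all hypothetical; a real application would do the equivalent with an update on the stored document.

```python
people = {}  # in-memory stand-in for one collection, keyed by _id


def add_person(_id: str, name: str, status: str = "prospective") -> None:
    """Insert a person; a repeated _id is rejected, as in a real collection."""
    if _id in people:
        raise KeyError(f"duplicate _id: {_id}")
    people[_id] = {"_id": _id, "name": name, "status": status}


def register(_id: str) -> None:
    """Promote an elected official to a full user by flipping one field."""
    people[_id]["status"] = "registered"


add_person("official-1", "Jane Doe")
register("official-1")
print(people["official-1"]["status"])  # registered
```

Because both kinds of record live under one key space, the _id stays the same before and after registration and no cross-collection mapping is needed.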