Creating custom Object ID in MongoDB

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/12211138/

Date: 2020-09-09 12:48:32

mongodb

Asked by Joe

I am creating a service for which I will use MongoDB as a storage backend. The service will produce a hash of the user input and then see if that same hash (+ input) already exists in our dataset.

The hash will be unique yet random (that is, non-incremental and non-sequential), so my questions are:

  1. Is it legitimate to use a random value for an Object ID? Example:

$object_id = new MongoId($hex_of_96bit_hash); // 24-character hex string holding the 96-bit hash

Or will MongoDB treat the ObjectID differently from other server-produced ones, since a "real" ObjectID also contains timestamps, machine_id, etc?
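
As a rough illustration of that difference, here is a Python sketch using only the standard library. The helper names and the sample ID are assumptions made for this example, not part of any driver. It derives a 24-character hex ID from a SHA-1 digest and reads the leading 4 bytes the way the timestamp of a driver-generated ObjectID would be read. For a hash-derived ID those bytes are just hash output, so the "timestamp" extracted from them carries no meaning.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def hash_to_id_hex(user_input: str) -> str:
    """Truncate a SHA-1 digest to 96 bits (24 hex characters). Hypothetical helper."""
    return hashlib.sha1(user_input.encode("utf-8")).hexdigest()[:24]

def embedded_timestamp(id_hex: str) -> datetime:
    """Read the first 4 bytes as big-endian seconds since the Unix epoch."""
    seconds = int(id_hex[:8], 16)
    return EPOCH + timedelta(seconds=seconds)

# Sample value in the shape of a driver-generated ObjectID (illustrative only).
generated = "507f1f77bcf86cd799439011"
print(embedded_timestamp(generated))    # a plausible creation time

# Hash-derived ID: the same 4 bytes decode to an arbitrary date.
custom = hash_to_id_hex("some user input")
print(custom, embedded_timestamp(custom))
```

Code that relies on extracting a creation time from the ID would therefore get arbitrary dates for hash-based values.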

What are the pros and cons of using a 'random' value? I guess it would be statistically slower for the engine to update the index on inserts when the new _id values are not incremental in any way. Am I correct on that?

Answered by DhruvPathak

Yes, it is perfectly fine to use a random value for an object ID. If a value is already present in the _id field of a document being stored, MongoDB uses it as the document's ID instead of generating an ObjectId.

Since the _id field is always indexed and serves as the primary key, you need to make sure that a different ID is generated for each object. There are some guidelines for optimizing user-defined object IDs:

https://docs.mongodb.com/manual/core/document/#the-id-field
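
To make the uniqueness requirement concrete, the sketch below uses a plain Python dict as a stand-in for a collection with a unique _id index. Nothing here talks to MongoDB: `FakeCollection`, `DuplicateKey`, `hash_id`, and `store` are hypothetical names invented for this illustration. It follows the flow described in the question: look up the hash, then compare the stored input to tell an existing entry from a genuine collision.

```python
import hashlib

class DuplicateKey(Exception):
    """Hypothetical stand-in for a server-side duplicate-key error."""

class FakeCollection:
    """Dict-backed stand-in for a collection with a unique _id index."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        # A unique index rejects a second document with the same _id.
        if doc["_id"] in self._docs:
            raise DuplicateKey(doc["_id"])
        self._docs[doc["_id"]] = doc

    def find_by_id(self, _id):
        return self._docs.get(_id)

def hash_id(user_input):
    """Derive a 96-bit (24 hex character) ID from the input."""
    return hashlib.sha1(user_input.encode("utf-8")).hexdigest()[:24]

def store(coll, user_input, make_id=hash_id):
    """Return 'inserted', 'exists', or 'collision' for the given input."""
    _id = make_id(user_input)
    existing = coll.find_by_id(_id)
    if existing is None:
        coll.insert({"_id": _id, "input": user_input})
        return "inserted"
    # Same hash: only identical input counts as already stored.
    if existing["input"] == user_input:
        return "exists"
    return "collision"

coll = FakeCollection()
print(store(coll, "hello"))
print(store(coll, "hello"))
```

The lookup followed by an insert is not atomic in a real deployment, so a production version would also need to handle the duplicate-key error raised by the insert itself.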

Answered by Sim

While any values, including hashes, can be used for the _id field, I would recommend against using random values for two reasons:

  1. You may need to develop a collision-management strategy in case you produce identical random values for two different objects. In the question, you imply that you'll generate IDs using some type of hash algorithm. I would not consider these values "random", as they are based on the content you are digesting with the hash. The probability of a collision is then a function of the diversity of the content and of the hash algorithm. If you are using something like MD5 or SHA-1, I wouldn't worry about the algorithm, just the content you are hashing. If you do need a collision-management strategy, then you definitely should not use random or hash-based IDs, because collision management in a clustered environment is complicated and requires additional queries.

  2. Random values as well as hash values are purposefully meant to be dispersed on the number line. That (a) will require more of the B-tree index to be kept in memory at all times and (b) may cause variable insert performance due to B-tree rebalancing. MongoDB is optimized to handle ObjectIDs, which come in ascending order (with one second time granularity). You're likely better off sticking with them.
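
The dispersion described in point 2 can be illustrated with a small Python sketch. It uses a sorted list as a very rough stand-in for a B-tree index, which is an assumption of this example and not a model of MongoDB's storage engine. It counts how many inserts land at the right-hand end of the index: ascending keys always do, while hash-derived keys are scattered across the whole key range.

```python
import bisect
import hashlib

def fraction_appended(keys):
    """Share of inserts that land at the right-hand end of a sorted index."""
    index = []
    at_end = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            at_end += 1
        index.insert(pos, key)
    return at_end / len(keys)

N = 5000
# Ascending keys, loosely imitating time-ordered ObjectIDs.
ascending = [f"{i:024x}" for i in range(N)]
# Hash-derived keys, truncated to 24 hex characters.
hashed = [hashlib.sha1(str(i).encode()).hexdigest()[:24] for i in range(N)]

print("ascending:", fraction_appended(ascending))
print("hashed:   ", fraction_appended(hashed))
```

When nearly every insert touches the same end of the index, only that part needs to stay hot in memory. Scattered inserts touch pages all over the index.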

Answered by Sammaye

Whether it is good or bad depends upon its uniqueness. Of course, the ObjectId provided by MongoDB is quite unique, so this is a good thing. So long as you can replicate that uniqueness, you should be fine.

There are no inherent risks or performance losses in using your own ID. I guess using it in string form might use up more index, storage, and querying power, but you are using it in MongoId (ObjectId) form, which should preserve the advantages of not storing it as a plain string.
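
A minimal Python sketch of the size difference between the two forms, counting raw payload bytes only. The sample ID and the `raw_sizes` helper are assumptions for illustration, and BSON adds its own type and length overhead on top of these numbers.

```python
def raw_sizes(id_hex):
    """Return (bytes as binary ID, bytes as hex string) for a 24-character hex ID."""
    as_binary = bytes.fromhex(id_hex)    # 96 bits packed into 12 bytes
    as_string = id_hex.encode("ascii")   # one byte per hex character
    return len(as_binary), len(as_string)

binary_len, string_len = raw_sizes("507f1f77bcf86cd799439011")
print(binary_len, string_len)
```

The binary form needs half the bytes of the hex string, and that difference is repeated in every index entry.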

Answered by Joe

I just found out an answer to one of my questions, regarding indexing performance:

If the _id's are in a somewhat well defined order, on inserts the entire b-tree for the _id index need not be loaded. BSON ObjectIds have this property.

Source: http://www.mongodb.org/display/DOCS/Optimizing+Object+IDs