Should I implement auto-incrementing in MongoDB?

Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow

Original URL: http://stackoverflow.com/questions/6645277/

Asked by Josh Smith
I'm making the switch to MongoDB from MySQL. A familiar architecture to me for a very basic users table would have auto-incrementing of the uid. See Mongo's own documentation for this use case.
I'm wondering whether this is the best architectural decision. From a UX standpoint, I like having UIDs as external references, for example in shorter URLs: http://example.com/users/12345
Is there a third way? Someone in IRC Freenode's #mongodb suggested creating a range of IDs and caching them. I'm unsure of how to actually implement that, or whether there's another route I can go. I don't necessarily even need the _id itself to be incremented this way. As long as the users all have a unique numerical uid within the document, I would be happy.
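For reference, one possible reading of that "range of IDs" suggestion is to reserve a whole block of ids with a single atomic $inc and then hand them out from application memory. The collection name "counters" and the block size below are hypothetical choices, not something the IRC suggestion specified:

```javascript
// Reserve a block of ids in one atomic step; hand them out locally afterwards.
// "counters" and blockSize are illustrative choices.
var blockSize = 100;

function reserveUidBlock() {
  var ret = db.counters.findAndModify({
    query:  { _id: "uid" },
    update: { $inc: { seq: blockSize } },   // jump the counter forward by a whole block
    new:    true,
    upsert: true
  });
  // ids (ret.seq - blockSize + 1) .. ret.seq now belong to this process
  return { next: ret.seq - blockSize + 1, max: ret.seq };
}
```

Each process that exhausts its cached block simply reserves another one, so the shared counter is touched once per block rather than once per insert.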
Accepted answer by kheya
Josh, there is no auto-increment id in MongoDB, and there are good reasons for that. I would say go with ObjectIds, which are unique in the cluster.
You can add auto-increment behaviour with a sequence collection, using findAndModify to get the next id to use. This will definitely add complexity to your application and may also affect your ability to shard the database.
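A minimal sketch of that sequence-collection approach in the mongo shell; the collection name "counters" and field name "seq" are illustrative rather than anything MongoDB prescribes:

```javascript
// Atomically bump a named counter and return the new value.
function getNextUid(name) {
  var ret = db.counters.findAndModify({
    query:  { _id: name },
    update: { $inc: { seq: 1 } },
    new:    true,     // return the document *after* the increment
    upsert: true      // create the counter document on first use
  });
  return ret.seq;
}

// Use the generated value as the numeric uid on insert.
db.users.insert({ uid: getNextUid("uid"), name: "Josh" });
```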
As long as you can guarantee that your generated ids will be unique, you will be fine. But the headache will be there.
You can look at this post for more info about this question in the dedicated google group for MongoDB:
Hope this helps.

Thanks
Answered by expert
I strongly disagree with the author of the selected answer that there is no auto-increment id in MongoDB and there are good reasons. We don't know the reasons why 10gen didn't encourage the use of auto-incremented IDs; that's speculation. I think 10gen made this choice because it's simply easier to ensure the uniqueness of 12-byte IDs in a clustered environment. It's a default solution that fits most newcomers, and it therefore increases product adoption, which is good for 10gen's business.
Now let me tell everyone about my experience with ObjectIds in a commercial environment.
I'm building a social network. We have roughly 6M users and each user has roughly 20 friends.
Now imagine we have a collection which stores the relationships between users (who follows whom). It looks like this:
_id : ObjectId
user_id : ObjectId
followee_id : ObjectId
on which we have a unique composite index {user_id, followee_id}. We can estimate the size of this index to be 12*2*6M*20 = 2GB. That's the index for fast look-up of the people I follow. For fast look-up of the people who follow me I need the reverse index. That's another 2GB.
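For concreteness, the unique compound index being described could be created along these lines (the collection name "follows" is a placeholder of mine):

```javascript
// Unique compound index over the two 12-byte ObjectId references.
db.follows.ensureIndex({ user_id: 1, followee_id: 1 }, { unique: true });
// The reverse look-up ("who follows me") needs the mirrored index as well.
db.follows.ensureIndex({ followee_id: 1, user_id: 1 }, { unique: true });
```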
And this is just the beginning. I have to carry these IDs everywhere. We have an activity cluster where we store your news feed. That's every event you or your friends do. Imagine how much space it takes.
And finally, one of our engineers made an unconscious decision and decided to store references as strings that represent the ObjectId, which doubles its size.
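The doubling is easy to see in the mongo shell: the BSON ObjectId is 12 bytes, but its hex representation is a 24-character string:

```javascript
var id = ObjectId();
id.str;          // the 24-character hex form of the id
id.str.length;   // 24, i.e. two characters for every byte of the 12-byte ObjectId
```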
What happens if an index does not fit into RAM? Nothing good, says 10gen:
When an index is too large to fit into RAM, MongoDB must read the index from disk, which is a much slower operation than reading from RAM. Keep in mind an index fits into RAM when your server has RAM available for the index combined with the rest of the working set.
That means reads are slow. Lock contention goes up. Writes get slower as well. Seeing lock contention at around 80% no longer shocks me.
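If you want to know how close you are to that point, the shell can report index sizes directly (the collection name "follows" is again a placeholder):

```javascript
db.follows.totalIndexSize();     // total size of all indexes on the collection, in bytes
db.follows.stats().indexSizes;   // per-index breakdown
db.serverStatus().mem;           // resident/virtual memory used by the mongod process
```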
Before you know it, you end up with a 460GB cluster which you have to split into shards and which is quite hard to manipulate.
Facebook uses a 64-bit long as the user id :) There is a reason for that. You can generate sequential IDs:

- using 10gen's advice
- using MySQL as the storage for counters (if you are concerned about speed, take a look at HandlerSocket)
- using an ID-generating service you built yourself, or something like Snowflake by Twitter (a rough sketch of this style of generator follows below)
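As an illustration of that last option, here is a minimal Snowflake-style generator in Node.js-flavoured JavaScript. The bit layout mirrors Twitter's published design; the epoch and worker id values are placeholders:

```javascript
// 64-bit id = 41 bits of milliseconds since a custom epoch,
//             10 bits of worker id, 12 bits of per-millisecond sequence.
const EPOCH = 1288834974657n;   // Twitter's Snowflake epoch, used here for illustration
const WORKER_ID = 1n;           // must be unique per generator process (0..1023)

let lastMs = -1n;
let sequence = 0n;

function nextId() {
  let now = BigInt(Date.now());
  if (now === lastMs) {
    sequence = (sequence + 1n) & 0xfffn;   // 12-bit sequence, wraps at 4096
    if (sequence === 0n) {                 // sequence exhausted for this millisecond
      while (BigInt(Date.now()) <= lastMs) { /* spin until the next millisecond */ }
      now = BigInt(Date.now());
    }
  } else {
    sequence = 0n;
  }
  lastMs = now;
  return ((now - EPOCH) << 22n) | (WORKER_ID << 12n) | sequence;
}
```

The resulting ids are 8 bytes, roughly time-ordered, and can be generated on many machines at once as long as each machine has a distinct worker id.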
So here is my general advice to everyone: please, please make your data as small as possible. When you grow, it will save you lots of sleepless nights.
Answered by Gates VP
So, there's a fundamental problem with "auto-increment" IDs. When you have 10 different servers (shards in MongoDB), who picks the next ID?
If you want a single set of auto-incrementing IDs, you have to have a single authority for picking those IDs. In MySQL, this is generally pretty easy, as you just have one server accepting writes. But big deployments of MongoDB are running sharding, which doesn't have this "central authority".
MongoDB uses 12-byte ObjectIds so that each server can create new documents uniquely without relying on a single authority.
So here's the big question: "can you afford to have a single authority"?
If so, then you can use findAndModify to keep track of the "last highest ID" and then you can insert with that.
That's the process described in your link. The obvious weakness here is that you technically have to do two writes for each insert. This may not scale very well, so you probably want to avoid it on data with a high insertion rate. It may work for users; it probably won't work for tracking clicks.
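Spelled out, those two writes per insert look something like this (collection names are illustrative):

```javascript
// Write #1: atomically bump the counter and read back the reserved value.
var counter = db.counters.findAndModify({
  query:  { _id: "userId" },
  update: { $inc: { seq: 1 } },
  new:    true,
  upsert: true
});

// Write #2: the actual insert, using the value just reserved.
db.users.insert({ uid: counter.seq, name: "example user" });
```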
Answered by Andreas Jung
There is nothing like an auto-increment in MongoDB, but you may store your own counters in a dedicated collection and $inc the related counter value as needed. Since $inc is an atomic operation, you won't see duplicates.
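One caveat worth sketching: the $inc itself is atomic, but reading the new value back in a separate query can still race with other clients, which is why the increment and the read are usually folded into a single findAndModify (names illustrative):

```javascript
// Racy variant: another client may $inc between our update and our read,
// so two clients can end up reading the same counter value.
db.counters.update({ _id: "uid" }, { $inc: { seq: 1 } }, /* upsert */ true);
var racy = db.counters.findOne({ _id: "uid" }).seq;

// Safe variant: increment and read back in one atomic operation.
var safe = db.counters.findAndModify({
  query:  { _id: "uid" },
  update: { $inc: { seq: 1 } },
  new:    true,
  upsert: true
}).seq;
```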
Answered by Gabe Rainbow
The default Mongo ObjectId -- the one used in the _id field -- is incrementing.
Mongo uses a timestamp (seconds since the Unix epoch) as the first 4-byte portion of its 4-3-2-3 composition, very similar to (if not exactly the same as) the composition of a Version 1 UUID. And that ObjectId is generated at insert time (if no other type of _id is provided by the user/client).
Thus the ObjectId is ordinal in nature; further, the default sort is based on this incrementing timestamp.
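The embedded timestamp is directly accessible in the mongo shell, and it is also why sorting on _id roughly follows insertion order:

```javascript
var id = ObjectId();
id.getTimestamp();                 // ISODate of creation, second precision
db.users.find().sort({ _id: 1 });  // roughly oldest-first, since the leading bytes are the timestamp
```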
One might consider it an updated version of the auto-incrementing (index++) ids used in many DBMSs.