Note: this page is a side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/11571273/
In Mongo what is the difference between sharding and replication?
Asked by nickponline
Replication seems to be a lot simpler than sharding, unless I am missing the benefits of what sharding is actually trying to achieve. Don't they both provide horizontal scaling?
Answered by Stennie
In the context of scaling MongoDB:
Replication creates additional copies of the data and allows for automatic failover to another node. Replication may help with horizontal scaling of reads if you are OK with reading data that potentially isn't the latest.
Sharding allows for horizontal scaling of data writes by partitioning data across multiple servers using a shard key. It's important to choose a good shard key. For example, a poor choice of shard key could lead to "hot spots" where data is only being written to a single shard.
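The "hot spot" effect can be illustrated with a small simulation. This is only a sketch in plain Python, not MongoDB code: the split points, the four-shard layout, and the MD5-based hash are all assumptions made up for the illustration (MongoDB uses its own chunk ranges and hashing).

```python
import hashlib
from bisect import bisect_right
from collections import Counter

# Hypothetical split points for range partitioning over four shards:
# shard 0 holds keys < 1000, shard 1 holds 1000..1999, and so on.
SPLIT_POINTS = [1000, 2000, 3000]
NUM_SHARDS = len(SPLIT_POINTS) + 1

def range_shard(key: int) -> int:
    """Pick a shard by the range the key falls into."""
    return bisect_right(SPLIT_POINTS, key)

def hashed_shard(key: int) -> int:
    """Pick a shard from a hash of the key (illustrative, not MongoDB's hash)."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# New documents get ever-increasing keys, like a counter or timestamp.
new_keys = range(3000, 3400)

range_writes = Counter(range_shard(k) for k in new_keys)
hashed_writes = Counter(hashed_shard(k) for k in new_keys)

print("range-partitioned:", dict(range_writes))
print("hashed:", dict(hashed_writes))
```

With an ever-increasing key and range partitioning, every new write lands in the last range, so one shard takes all the load. Hashing the same key spreads the writes across the shards.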
A sharded environment does add more complexity because MongoDB now has to manage distributing data and requests between shards -- additional configuration and routing processes are added to manage those aspects.
Replication and sharding are typically combined to create a sharded cluster where each shard is supported by a replica set.
From a client application point of view you also have some control in relation to the replication/sharding interaction, in particular:
Answered by Akusi
Suppose you have a great music collection on your hard disk, and you store the music in logical order, in different folders based on year of release. You are concerned that your collection will be lost if the drive fails. So you get a new disk and occasionally copy the entire collection to it, keeping the same folder structure.
Sharding >> Keeping your music files in different folders
Replication >> Syncing your collection to other drives
Answered by MrKurt
Replication is a mostly traditional master/slave setup: data is synced to backup members, and if the primary fails, one of them can take its place. It is a reasonably simple tool. It's primarily meant for redundancy, although you can scale reads by adding replica set members. That's a little complicated, but works very well for some apps.
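To make the idea concrete, here is a toy model of that setup in plain Python. It is a sketch under heavy simplifying assumptions: the class name `ToyReplicaSet`, the explicit `sync()` step, and the "promote the next member" rule are all invented for illustration. A real replica set replicates continuously and elects a new primary through its own protocol.

```python
class ToyReplicaSet:
    """A toy model: one primary takes writes, secondaries hold copies."""

    def __init__(self, member_names):
        self.data = {name: {} for name in member_names}
        self.primary = member_names[0]

    def write(self, key, value):
        # All writes go to the primary only.
        self.data[self.primary][key] = value

    def sync(self):
        # Secondaries copy the primary's data; in this toy it is an explicit step.
        for name in self.data:
            if name != self.primary:
                self.data[name] = dict(self.data[self.primary])

    def read(self, key, member=None):
        # Reading from a secondary may return stale data until the next sync.
        return self.data[member or self.primary].get(key)

    def fail_primary(self):
        # Drop the failed primary and promote a remaining member.
        del self.data[self.primary]
        self.primary = next(iter(self.data))


rs = ToyReplicaSet(["a", "b", "c"])
rs.write("song", "v1")
rs.sync()
rs.write("song", "v2")               # not yet synced to the secondaries
stale = rs.read("song", member="b")  # a secondary still has the old value
rs.sync()
rs.fail_primary()                    # "a" fails, "b" takes over
after_failover = rs.read("song")
print(stale, after_failover, rs.primary)  # prints: v1 v2 b
```

The stale read from member "b" shows why scaling reads through secondaries only suits apps that can tolerate slightly old data.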
Sharding sits on top of replication, usually. "Shards" in MongoDB are just replica sets with something called a "router" in front of them. Your application connects to the router and issues queries, and the router decides which replica set (shard) to forward them to. It's significantly more complex than a single replica set because you have the router and config servers to deal with (the config servers keep track of what data is stored where).
If you want to scale Mongo horizontally, you'd shard. 10gen likes to call the router/config server setup auto-sharding. It's also possible to do a cruder, manual form of sharding where the app itself decides which DB to write to.
Answered by xameeramir
Sharding
Sharding is a technique for splitting up a large collection amongst multiple servers. When we shard, we deploy multiple mongod servers, and in front of them mongos, which is a router. The application talks to this router, and the router then talks to the various mongod servers. The application and the mongos are usually co-located on the same server, and we can have multiple mongos services running on the same machine. It's also recommended to keep a set of multiple mongods (together called a replica set) instead of one single mongod for each shard. A replica set keeps the data in sync across several different instances so that if one of them goes down, we won't lose any data. Logically, each replica set can be seen as a shard. Sharding is transparent to the application; MongoDB decides where data goes based on a shard key that we choose.
Assume that for the student collection we have stdt_id as the shard key (it could also be a compound key). The mongos server uses a range-based system: based on the stdt_id that we send as the shard key, it sends the request to the right mongod instance.
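A range-based lookup like this can be sketched in a few lines of Python. The boundaries and shard names below are hypothetical values chosen for the example; in a real cluster the ranges are tracked by the config servers and change as data is rebalanced.

```python
from bisect import bisect_right

# Hypothetical range boundaries on stdt_id and the shard that owns each range.
BOUNDARIES = [1000, 2000]                  # ranges: <1000, 1000-1999, >=2000
SHARDS = ["shard-a", "shard-b", "shard-c"]

def route(stdt_id: int) -> str:
    """Return the name of the shard whose range contains stdt_id."""
    return SHARDS[bisect_right(BOUNDARIES, stdt_id)]

for sid in (42, 1000, 1999, 2500):
    print(sid, "->", route(sid))
# 42 -> shard-a
# 1000 -> shard-b
# 1999 -> shard-b
# 2500 -> shard-c
```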
So, what do we need to really know as a developer?
- an insert must include the shard key, so if it's a multi-part shard key, we must include the entire shard key
- we have to understand what the shard key is on the collection itself
- for an update, remove, or find: if mongos is not given a shard key, it has to broadcast the request to all the different shards that cover the collection
- for an update: if we don't specify the entire shard key, we have to make it a multi update so that it knows that it needs to broadcast it
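The difference between a targeted request and a broadcast can be sketched with a toy router in Python. Everything here is an assumption made for illustration: the function names `target_shards` and `check_insert`, the range boundaries, and the restriction to simple equality filters on a single-field shard key. It is not how mongos is implemented.

```python
from bisect import bisect_right

SHARD_KEY = "stdt_id"
BOUNDARIES = [1000, 2000]                  # hypothetical range boundaries
SHARDS = ["shard-a", "shard-b", "shard-c"]

def target_shards(query: dict) -> list:
    """Return the shards a toy router would contact for this query filter."""
    if SHARD_KEY in query:
        # Shard key present: only one shard can hold the matching documents.
        return [SHARDS[bisect_right(BOUNDARIES, query[SHARD_KEY])]]
    # No shard key: every shard has to be asked (a broadcast).
    return list(SHARDS)

def check_insert(doc: dict) -> str:
    """Reject documents without the shard key, otherwise return the target shard."""
    if SHARD_KEY not in doc:
        raise ValueError(f"insert must include the shard key {SHARD_KEY!r}")
    return target_shards(doc)[0]

print(target_shards({"stdt_id": 1500}))             # ['shard-b']
print(target_shards({"name": "Ann"}))               # all three shards
print(check_insert({"stdt_id": 7, "name": "Bob"}))  # shard-a
```

Queries that include the shard key touch one shard, while everything else fans out to all of them, which is why it pays to know the shard key of each collection.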
Answered by Sayat Satybald
Whenever you're thinking about sharding or replication, you need to think in the context of write/update operations. If you don't need to scale writes, then replication, as it is considerably simpler, is a good choice for you.
On the other hand, if your workload is mostly updates/writes, then at some point you'll hit a write bottleneck. When a write request comes in, Mongo blocks other write requests, and those requests wait until the first one is done. If you want to scale these writes and parallelize them, then you need to implement sharding.