Pros and Cons of using MongoDB instead of MS SQL Server
Original question: http://stackoverflow.com/questions/13190468/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by theGeekster
I am new to the NoSQL world and am thinking of replacing my MS SQL Server database with MongoDB. My application (written in .NET C#) interacts with IP cameras and records metadata for each image coming from a camera into an MS SQL database. On average, I insert about 86400 records per day for each camera, and in the current database schema I have created a separate table for each camera's images, e.g. Camera_1_Images, Camera_2_Images ... Camera_N_Images. A single image record consists of simple metadata such as AutoId, FilePath, and CreationDate. To add more detail, my application starts a separate process (.exe) for each camera, and each process inserts 1 record per second into the corresponding table in the database.
I need suggestions from (MongoDB) experts on the following concerns:
- Is MongoDB good for holding such data, which will eventually be queried against time ranges (e.g. retrieve all images of a particular camera within a specified hour)? Any suggestions about document-based schema design for my case?
- What should the server specs be (CPU, RAM, disk)? Any suggestions?
- Should I consider sharding/replication for this scenario (while considering the write performance of syncing replica sets)?
- Are there any benefits to using multiple databases on the same machine, so that one database holds the current day's images for all cameras and a second one is used to archive previous days' images? I am thinking of this in terms of splitting reads and writes across separate databases, because all read requests might be served by the second database and all writes go to the first one. Will this help or not? If yes, any ideas on how to ensure that both databases are always in sync?
Any other suggestions are welcome.
Accepted answer by Aravind Yarram
I am a beginner with NoSQL databases myself, so I am answering this at the risk of downvotes, but it will be a great learning experience for me.
Before trying my best to answer your questions, I should say that if MS SQL Server is working well for you, then stick with it. You have not mentioned any valid reason why you want to use MongoDB, except that you learnt about it as a document-oriented db. Moreover, I see that you are capturing almost the same set of metadata for each camera, i.e. your schema is not dynamic.
- Is MongoDB good for holding such data, which will eventually be queried against time ranges (e.g. retrieve all images of a particular camera within a specified hour)? Any suggestions about document-based schema design for my case?
MongoDB, being a document-oriented db, is good at querying within an aggregate (what you call a document). Since you are already storing each camera's data in its own table, in MongoDB you would have a separate collection for each camera. Here is how you perform date range queries.
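As a rough sketch of such a date range query, the helper below builds the filter document for one camera's collection. The field name CreationDate and the collection name Camera_1_Images come from the question; the helper name and the choice of a half-open interval are assumptions for illustration. The resulting dict is the kind of filter you would hand to a driver's find call.

```python
from datetime import datetime, timedelta


def hour_range_filter(start, hours=1, field="CreationDate"):
    """Build a MongoDB filter matching documents whose `field` falls in
    the half-open interval [start, start + hours)."""
    end = start + timedelta(hours=hours)
    # $gte / $lt are MongoDB's standard comparison operators.
    return {field: {"$gte": start, "$lt": end}}


# Example: all images recorded between 14:00 and 15:00 on 2 Nov 2012.
query = hour_range_filter(datetime(2012, 11, 2, 14, 0))
print(query)

# With PyMongo (assumed, not run here) this would be used as:
#   db["Camera_1_Images"].find(query)
```

Using an exclusive upper bound avoids counting an image stamped exactly on the hour in two adjacent windows.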
- What should the server specs be (CPU, RAM, disk)? Any suggestions?
All NoSQL databases are built to scale out on commodity hardware. But from the way you have asked the question, you might be thinking of improving performance by scaling up. You can start with a reasonable machine and, as the load increases, keep adding more servers (scaling out). You don't need to plan for and buy a high-end server.
- Should I consider sharding/replication for this scenario (while considering the write performance of syncing replica sets)?
MongoDB locks the entire db for a single write (but yields for other operations) and is meant for systems that have more reads than writes. So this depends on what your system is like. There are multiple ways of sharding, and the choice should be domain specific; a generic answer is not possible. However, some examples can be given, such as sharding by geography or by branch.
Also read "A plain English introduction to CAP Theorem".
Updated with answer to the comment on sharding
According to their documentation, you should consider deploying a sharded cluster if:
- your data set approaches or exceeds the storage capacity of a single node in your system.
- the size of your system's active working set will soon exceed the capacity of the maximum amount of RAM for your system.
- your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.
So, based on the last point, yes. The auto-sharding feature is built to scale writes. In that case, you have a write lock per shard, not per database. But mine is a theoretical answer. I suggest you consult the 10gen.com group.
Answer by Sammaye
Is MongoDB good for holding such data, which will eventually be queried against time ranges (e.g. retrieve all images of a particular camera within a specified hour)?
This question is too subjective for me to answer. From personal experience with numerous SQL solutions (ironically not MS SQL), I would say they are both equally good, if done right.
Also:
What should the server specs be (CPU, RAM, disk)? Any suggestions?
That depends on too many variables that only you know; however, a small cluster of commodity hardware works quite well. I cannot really give a factual response to this question, and it will come down to your testing.
As for a schema, I would go for a document with this structure:
{
    _id: {},
    camera_name: "my awesome camera",
    images: [
        {
            url: "http://I_like_S3_here.amazons3.com/my_image.png",
            // All your other fields per image
        }
    ]
}
This should be quite easy to maintain and update so long as you are not embedding much deeper, since then it could become a bit of a pain; however, that depends on your queries.
Not only that, but this should be good for sharding, since you have all the data you need in one document; if you were to shard on _id you could probably get the perfect setup here.
Should I consider sharding/replication for this scenario (while considering the write performance of syncing replica sets)?
Possibly. Many people assume they need to shard when in reality they just need to be more intelligent in how they design the database. MongoDB is very free-form, so there are a lot of ways to do it wrong, but that being said, there are also a lot of ways of doing it right. I personally would keep sharding in mind. Replication can be very useful too.
Are there any benefits to using multiple databases on the same machine, so that one database holds the current day's images for all cameras and a second one is used to archive previous days' images?
Even though MongoDB's write lock is at the DB level (currently), I would say: no. The right document structure and the right sharding/replication (if needed) should be able to handle this in a single document-based collection (or collections) under a single DB. Not only that, but you can direct writes and reads within a cluster to certain servers, so as to create a concurrency situation between certain machines in your cluster. I would promote the correct use of MongoDB's concurrency features over DB separation.
Edit
After reading the question again, I realized my solution overlooked that you are inserting 80k+ images per camera per day (an embedded images array would grow without bound at that rate). As such, instead of the embedded option, I would actually store one document per image in a collection called images, plus a camera collection, and query the two like you would in SQL.
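A minimal sketch of that two-collection layout, using plain Python dicts to stand in for documents. The field names (camera_id, file_path, created_at), the sample file path, and the helper names are illustrative assumptions, not part of the original answer.

```python
from datetime import datetime


def make_camera(camera_id, name):
    """One document per camera, stored in the `camera` collection."""
    return {"_id": camera_id, "camera_name": name}


def make_image(camera_id, file_path, created_at):
    """One document per image, stored in the `images` collection.
    `camera_id` references the camera document's _id, like a foreign key."""
    return {"camera_id": camera_id, "file_path": file_path, "created_at": created_at}


camera = make_camera(1, "my awesome camera")
# The file path below is a made-up placeholder.
image = make_image(camera["_id"], "/images/cam1/000001.png", datetime(2012, 11, 2, 14, 0, 0))

# "Querying the two like you would in SQL" then means two steps:
# look up the camera, then filter images by its id.
images_filter = {"camera_id": camera["_id"]}
print(images_filter)
```

Each image document stays small and fixed in size, so a camera that records for years never produces one ever-growing document.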
Sharding the images collection on camera_id should be just as easy.
Also make sure you take your working set into consideration when sizing your server.
Answer by Chien-Wei Huang
Is MongoDB good for holding such data, which will eventually be queried against time ranges (e.g. retrieve all images of a particular camera within a specified hour)? Any suggestions about document-based schema design for my case?
MongoDB can do this. For better performance, create an index on your time field.
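For instance, assuming a single images collection with a camera_id and a created_at field (both names are assumptions), a compound index on camera and time would serve the "one camera, one time window" query. With one collection per camera, an index on the time field alone would do. The sketch below only builds the key specification; in PyMongo it would be passed to create_index.

```python
# Direction constant as used by MongoDB drivers: 1 = ascending.
ASCENDING = 1

# Equality field first, range field second, so that a query for one camera
# within a time window reads a contiguous slice of the index.
IMAGE_INDEX_KEYS = [("camera_id", ASCENDING), ("created_at", ASCENDING)]

# With PyMongo (assumed, not run here):
#   db["images"].create_index(IMAGE_INDEX_KEYS)
print(IMAGE_INDEX_KEYS)
```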
What should the server specs be (CPU, RAM, disk)? Any suggestions?
I think RAM and Disk would be important.
- If you don't want to use sharding to scale out, you should consider a larger disk so that you can store all your data on it.
- Your hot data should fit into RAM. If it does not, consider adding more RAM, because MongoDB's performance depends mainly on RAM.
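To get a feel for disk and RAM needs, a back-of-the-envelope estimate helps. The insert rate (86400 records per camera per day) is from the question; the camera count, the 200-byte average document size, and the 7-day hot window are made-up assumptions that you should replace with measured values. Index size is ignored here.

```python
RECORDS_PER_CAMERA_PER_DAY = 86_400  # from the question: 1 insert per second


def estimate_bytes(cameras, days, avg_doc_bytes):
    """Raw document bytes for `cameras` cameras over `days` days (no indexes)."""
    return cameras * days * RECORDS_PER_CAMERA_PER_DAY * avg_doc_bytes


GIB = 1024 ** 3

# Hypothetical inputs: 10 cameras, ~200 bytes per metadata document.
yearly = estimate_bytes(cameras=10, days=365, avg_doc_bytes=200)
hot_week = estimate_bytes(cameras=10, days=7, avg_doc_bytes=200)

print(f"one year on disk: {yearly / GIB:.1f} GiB")
print(f"7-day hot set:    {hot_week / GIB:.2f} GiB")
```

Comparing the hot-set figure (plus index size) against available RAM gives a first indication of whether a single machine is enough.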
Should I consider sharding/replication for this scenario (while considering the write performance of syncing replica sets)?
I don't know how many cameras you have, but even 1000 inserts/second across a total of 1000 cameras should still be easy for MongoDB. If you are concerned about insert performance, I don't think you need sharding (unless the data size is so large that you have to split it across several machines).
Another consideration is the read frequency of your application. If it is very high, then you can consider sharding or replication here. And you can use (timestamp + camera_id) as your shard key if your queries only target one camera within a time range.
Are there any benefits to using multiple databases on the same machine, so that one database holds the current day's images for all cameras and a second one is used to archive previous days' images?
You can split the table into two collections (archive and current), and create the index only on archive if you only run date queries against archive. Without the overhead of updating an index on each insert, inserts into the current collection should be faster.
And you can write a daily job that moves the current data into archive.
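A daily archive job along those lines could look like the sketch below. It assumes collection objects exposing find, insert_many, and delete_many with PyMongo-like semantics, and a created_at timestamp field; those names, and the copy-then-delete order, are assumptions for illustration rather than the answerer's method. It is not atomic: a crash between the two steps leaves copies in both collections, so a real job would need to handle reruns.

```python
def archive_before(current, archive, cutoff, field="created_at"):
    """Copy documents older than `cutoff` from `current` to `archive`,
    then delete them from `current`. Returns the number of documents moved."""
    old = {field: {"$lt": cutoff}}
    docs = list(current.find(old))
    if docs:  # PyMongo's insert_many rejects an empty list
        archive.insert_many(docs)
        current.delete_many(old)
    return len(docs)


# With PyMongo (assumed, not run here), run once a day just after midnight:
#   from datetime import datetime
#   midnight = datetime.now().replace(hour=0, minute=0, second=0, microsecond=0)
#   archive_before(db["current"], db["archive"], midnight)
```

Copying before deleting means a failure loses no data; the worst case is a document that temporarily exists in both collections.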