使用具有大量 JSON 字段的 MongoDB 与 MySQL?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/12934385/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Using MongoDB vs MySQL with lots of JSON fields?
提问by DhruvPathak
There is a microblogging type of application. Two main basic database stores zeroed upon are: MySQL or MongoDB.
有一种微博类型的应用程序。两个主要的基本数据库存储是:MySQL 或 MongoDB。
I am planning to denormalize lot of data I.e. A vote done on a post is stored in a voting table, also a count is incremented in the main posts table. There are other actions involved with the post too (e.g. Like, vote down).
我计划对大量数据进行非规范化,即对帖子进行的投票存储在投票表中,并且主帖子表中的计数也会增加。帖子还涉及其他操作(例如,点赞、否决)。
If I use MySQL, some of the data better suits as JSON than fixed schema, for faster lookups.
如果我使用 MySQL,某些数据更适合作为 JSON 而不是固定模式,以便更快地查找。
E.g.
例如
POST_ID | activity_data
213423424 | { 'likes': {'count':213,'recent_likers' :
['john','Hyman',..fixed list of recent N users]} , 'smiles' :
{'count':345,'recent_smilers' :
['mary','Hyman',..fixed list of recent N users]} }
There are other components of the application as well, where usage of JSON is being proposed. So, to update a JSON field, the sequence is:
应用程序的其他组件也被提议使用 JSON。因此,要更新 JSON 字段,顺序是:
Read the JSON in python script.
Update the JSON
Store the JSON back into MySQL.
在 python 脚本中读取 JSON。
更新 JSON
将 JSON 存储回 MySQL。
It would have been single operation in MongoDB with atomic operations like $push,$inc,$pulletc. Also
document structure of MongoDB suits my data well.
这本来是MongoDB中单次操作与像原子操作$push,$inc,$pull等等。同样的MongoDB的文档结构适合我的数据很好。
My considerations while choosing the data store.
我在选择数据存储时的考虑。
Regarding MySQL:
关于 MySQL:
- Stable and familiar.
- Backup and restore is easy.
- Some future schema changes can be avoided using some fields as schemaless JSON.
- May have to use layer of memcached early.
- JSON blobs will be static in some tables like main Posts, however will be updated alot in some other tables like Post votes and likes.
- 稳定而熟悉。
- 备份和恢复很容易。
- 使用一些字段作为无模式 JSON 可以避免一些未来的模式更改。
- 可能要早点使用memcached层。
- JSON blob 在某些表(如主帖子)中将是静态的,但会在其他一些表(如投票和喜欢)中大量更新。
Regarding MongoDB:
关于 MongoDB:
- Better suited to store schema less data as documents.
- Caching might be avoided till a later stage.
- Sometimes the app may become write intensive, MongoDB can perform better at those points where unsafe writes are not an issue.
- Not sure about stability and reliability.
- Not sure about how easy is it to backup and restore.
- 更适合将模式较少的数据存储为文档。
- 缓存可能会在稍后阶段避免。
- 有时应用程序可能会变得写入密集,MongoDB 可以在不安全写入不是问题的那些点上表现得更好。
- 不确定稳定性和可靠性。
- 不确定备份和恢复有多容易。
Questions:
问题:
- Shall we chose MongoDB if half of data is schemaless, and is being stored as JSON if using MySQL?
Some of the data like main posts is critical, so it will be saved using safe writes, the counters etc will be saved using unsafe writes. Is this policy based on importance of data, and write intensiveness correct?
How easy is it to monitor, backup and restore MongoDB as compared to MySQL? We need to plan periodic backups ( say daily ), and restore them with ease in case of disaster. What are the best options I have with MongoDB to make it a safe bet for the application.
- 如果一半数据是无模式的,并且如果使用 MySQL,我们将作为 JSON 存储,我们是否应该选择 MongoDB?
一些数据(如主要帖子)至关重要,因此将使用安全写入保存,计数器等将使用不安全写入保存。这个基于数据重要性和写入密集度的策略是否正确?
与 MySQL 相比,监控、备份和恢复 MongoDB 有多容易?我们需要计划定期备份(比如每天),并在发生灾难时轻松恢复它们。我对 MongoDB 的最佳选择是什么,使其成为应用程序的安全选择。
Stability, backup, snapshots, restoring, wider adoption I.e.database durability are the reasons pointing me to use MySQL as RDBMS+NoSql even though a NoSQL document storage could serve my purpose better.
稳定性、备份、快照、恢复、更广泛的采用 Iedatabase 持久性是我将 MySQL 用作 RDBMS+NoSql 的原因,即使 NoSQL 文档存储可以更好地满足我的目的。
Please focus your views on the choice between MySQL and MongoDB considering the database design I have in mind. I know there could be better ways to plan database design with either RDBMS or MongoDB documents. But that is not the current focus of my question.
考虑到我想到的数据库设计,请把你的观点集中在 MySQL 和 MongoDB 之间的选择上。我知道可以有更好的方法来使用 RDBMS 或 MongoDB 文档来规划数据库设计。但这不是我当前问题的重点。
UPDATE: From MySQL 5.7 onwards, MySQL supports a rich native JSON datatypewhich provides data flexibility as well as rich JSON querying.
更新:从 MySQL 5.7 开始,MySQL 支持丰富的原生 JSON 数据类型,它提供了数据灵活性以及丰富的 JSON 查询。
采纳答案by Fuwjax
So, to directly answer the questions...
所以,直接回答问题...
Shall we chose mongodb if half of data is schemaless, and is being stored as JSON if using MySQL?
如果一半数据是无模式的,我们应该选择 mongodb,如果使用 MySQL,我们将作为 JSON 存储吗?
Schemaless storage is certainly a compelling reason to go with MongoDB, but as you've pointed out, it's fairly easy to store JSON in a RDBMS as well. The power behind MongoDB is in the rich queries against schemaless storage.
无模式存储当然是使用 MongoDB 的一个令人信服的理由,但正如您所指出的,在 RDBMS 中存储 JSON 也相当容易。MongoDB 背后的强大功能在于针对无模式存储的丰富查询。
If I might point out a small flaw in the illustration about updating a JSON field, it's not simply a matter of getting the current value, updating the document and then pushing it back to the database. The process must all be wrapped in a transaction. Transactions tend to be fairly straightforward, until you start denormalizing your database. Then something as simple as recording an upvote can lock tables all over your schema.
如果我可以指出关于更新 JSON 字段的插图中的一个小缺陷,这不仅仅是获取当前值、更新文档然后将其推回数据库的问题。该过程必须全部包含在一个事务中。事务往往相当简单,直到您开始对数据库进行非规范化。然后像记录赞成票这样简单的事情就可以锁定整个架构中的表。
With MongoDB, there are no transactions. But operations can almost always be structured in a way that allow for atomic updates. This usually involves some dramatic shifts from the SQL paradigms, but in my opinion they're fairly obvious once you stop trying to force objects into tables. At the very least, lots of other folks have run into the same problems you'll be facing, and the Mongo community tends to be fairly open and vocal about the challenges they've overcome.
使用 MongoDB,没有事务。但是操作几乎总是可以以允许原子更新的方式构建。这通常涉及与 SQL 范式的一些戏剧性转变,但在我看来,一旦您停止尝试将对象强制放入表中,它们就相当明显了。至少,许多其他人遇到了您将面临的相同问题,并且 Mongo 社区往往对他们所克服的挑战持相当开放和直言不讳的态度。
Some of the data like main posts is critical , so it will be saved using safe writes , the counters etc will be saved using unsafe writes. Is this policy based on importance of data, and write intensiveness correct?
一些像主要帖子这样的数据是关键的,所以它将使用安全写入保存,计数器等将使用不安全写入保存。这个基于数据重要性和写入密集度的策略是否正确?
By "safe writes" I assume you mean the option to turn on an automatic "getLastError()" after every write. We have a very thin wrapper over a DBCollection that allows us fine grained control over when getLastError() is called. However, our policy is not based on how "important" data is, but rather whether the code following the query is expecting any modifications to be immediately visible in the following reads.
通过“安全写入”,我假设您的意思是在每次写入后打开自动“getLastError()”的选项。我们在 DBCollection 上有一个非常薄的包装器,它允许我们对何时调用 getLastError() 进行细粒度控制。但是,我们的策略不是基于数据的“重要”程度,而是基于查询后面的代码是否期望任何修改在以下读取中立即可见。
Generally speaking, this is still a poor indicator, and we have instead migrated to findAndModify() for the same behavior. On the occasion where we still explicitly call getLastError() it is when the database is likely to reject a write, such as when we insert() with an _id that may be a duplicate.
一般来说,这仍然是一个糟糕的指标,我们已经迁移到 findAndModify() 以实现相同的行为。在我们仍然显式调用 getLastError() 的情况下,数据库可能会拒绝写入,例如当我们使用可能重复的 _id 的 insert() 时。
How easy is it to monitor,backup and restore Mongodb as compared to mysql? We need to plan periodic backups (say daily), and restore them with ease in case of disaster. What are the best options I have with mongoDb to make it a safe bet for the application?
与 mysql 相比,监控、备份和恢复 Mongodb 有多容易?我们需要计划定期备份(比如每天),并在发生灾难时轻松恢复它们。我对 mongoDb 的最佳选择是什么,使其成为应用程序的安全选择?
I'm afraid I can't speak to whether our backup/restore policy is effective as we have not had to restore yet. We're following the MongoDB recommendations for backing up; @mark-hillick has done a great job of summarizing those. We're using replica sets, and we have migrated MongoDB versions as well as introduced new replica members. So far we've had no downtime, so I'm not sure I can speak well to this point.
恐怕我不能说我们的备份/恢复策略是否有效,因为我们还没有恢复。我们正在遵循 MongoDB 的备份建议;@mark-hillick 在总结这些方面做得很好。我们正在使用副本集,并且我们已经迁移了 MongoDB 版本并引入了新的副本成员。到目前为止,我们还没有停机,所以我不确定我是否能说得好。
Stability,backup,snapshots,restoring,wider adoption i.e.database durability are the reasons pointing me to use MySQL as RDBMS+NoSql even though a NoSQL document storage could serve my purpose better.
稳定性、备份、快照、恢复、更广泛的采用 iedatabase 持久性是我将 MySQL 用作 RDBMS+NoSql 的原因,即使 NoSQL 文档存储可以更好地满足我的目的。
So, in my experience, MongoDB offers storage of schemaless data with a set of query primitives rich enough that transactions can often be replaced by atomic operations. It's been tough to unlearn 10+ years worth of SQL experience, but every problem I've encountered has been addressed by the community or 10gen directly. We have not lost data or had any downtime that I can recall.
因此,根据我的经验,MongoDB 通过一组足够丰富的查询原语提供无模式数据的存储,事务通常可以被原子操作取代。忘记 10 多年的 SQL 经验是很困难的,但我遇到的每个问题都已由社区或 10gen 直接解决。我们没有丢失数据或有任何我记得的停机时间。
To put it simply, MongoDB is hands down the best data storage ecosystem I have ever used in terms of querying, maintenance, scalability, and reliability. Unless I had an application that was so clearly relational that I could not in good conscience use anything other than SQL, I would make every effort to use MongoDB.
简而言之,在查询、维护、可扩展性和可靠性方面,MongoDB 是我用过的最好的数据存储生态系统。除非我有一个应用程序的关系如此明显以至于我不能凭良心使用 SQL 以外的任何东西,否则我会尽一切努力使用 MongoDB。
I don't work for 10gen, but I'm very grateful for the folks who do.
我不为 10gen 工作,但我非常感谢为 10gen 工作的人。
回答by Mark Hillick
I'm not going to comment on the comparisons (I work for 10gen and don't feel it's appropriate for me to do so), however, I will answer the specific MongoDB questions so that you can better make your decision.
我不会对这些比较发表评论(我为 10gen 工作,我觉得这样做不合适),但是,我会回答具体的 MongoDB 问题,以便您可以更好地做出决定。
Back-Up
备份
Documentation hereis very thorough, covering many aspects:
这里的文档非常详尽,涵盖了很多方面:
- Block-Level Methods (LVM makes it very easy and quite a lot of folk do this)
- With/Without Journaling
- EBS Snapshots
- General Snapshots
- Replication (technically not back-up, however, a lot of folk use replica sets for their redundancy and back-up - not recommending this but it is done)
- 块级方法(LVM 使它变得非常简单,很多人都这样做)
- 有/没有日记
- EBS 快照
- 一般快照
- 复制(从技术上讲不是备份,但是,很多人使用副本集来实现冗余和备份 - 不推荐这样做,但已完成)
Until recently, there is no MongoDB equivalent of mylvmbackupbut a nice guy wrote one :) In his words
直到最近,还没有 MongoDB 等价物,mylvmbackup但是一个好人写了一个 :) 用他的话来说
Early days so far: it's just a glorified shell script and needs way more error checking. But already it works for me and I figured I'd share the joy. Bug reports, patches & suggestions welcome.
到目前为止的早期:它只是一个美化的 shell 脚本,需要更多的错误检查。但它已经对我有用,我想我会分享快乐。欢迎错误报告、补丁和建议。
Get yourself a copy from here.
从这里给自己一个副本。
Restores
恢复
mongodumpis completely documented hereand mongorestore is here.
mongodump完整记录在这里,mongorestore 在这里。
mongodumpwill not contain the indexes but does contain the system.indexes collection so mongorestore can rebuild the indexes when you restore the bson file. The bson file is the actual data whereas mongoexport/mongoimportare not type-safe so it could be anything (techically speaking) :)
mongodump将不包含索引,但包含 system.indexes 集合,因此 mongorestore 可以在您恢复 bson 文件时重建索引。bson 文件是实际数据,mongoexport/mongoimport但不是类型安全的,所以它可以是任何东西(从技术上讲):)
Monitoring
监控
Documented here.
记录在这里。
I like Cacti but afaik, the Cacti templates have not kept up with the changes in MongoDB and so rely on old syntax so post 2.0.4, I believe there are issues.
我喜欢 Cacti,但是 afaik,Cacti 模板没有跟上 MongoDB 中的变化,因此依赖旧语法,所以在 2.0.4 之后,我相信存在问题。
Nagios works well but it's Nagios so you either love or hate it. A lot of folk use Nagios and it seems to provide them with great visiblity.
Nagios 运行良好,但它是 Nagios,所以您要么喜欢它,要么讨厌它。许多人使用 Nagios,它似乎为他们提供了很好的可见性。
I've heard of some folk looking at Zappix but I've never used it so can't comment.
我听说有些人在看 Zappix,但我从未使用过它,因此无法发表评论。
Additionally, you can use MMS, which is free and hosted externally. Your MongoDB instances run an agent and one of those agents communicate (using python code) over https to mms.10gen.com. We use MMS to view all performance statistics on the MongoDB instances and it is very beneficial from a high-level wide view as well as offering the ability to drill down. It's simple to install and you don't have to run any hardware for this. Many customers run it and some compliment it with Cacti/Nagios.
此外,您还可以使用免费的外部托管的 MMS。您的 MongoDB 实例运行一个代理,其中一个代理通过 https 与 mms.10gen.com 通信(使用 python 代码)。我们使用 MMS 来查看 MongoDB 实例的所有性能统计信息,从高层次的广泛视图以及提供向下钻取的能力来看,这是非常有益的。安装很简单,您无需为此运行任何硬件。许多客户运行它,有些客户用 Cacti/Nagios 称赞它。
Help information on MMS can be found here(it's a very detailed, inclusive document).
可以在此处找到有关 MMS 的帮助信息(这是一份非常详细的包容性文档)。
回答by DavidA
One of the disadvantages of a mysql solution with stored json is that you will not be able to efficiently search on the json data. If you store it all in mongodb, you can create indexes and/or queries on all of your data including the json.
使用存储的 json 的 mysql 解决方案的缺点之一是您将无法有效地搜索 json 数据。如果您将其全部存储在 mongodb 中,则可以对包括 json 在内的所有数据创建索引和/或查询。
Mongo's writes work very well, and really the only thing you lose vs mysql is transaction support, and thus the ability to rollback multipart saves. However, if you are able to commit your changes in atomic operations, then there isn't a data safety issue. If you are replicated, mongo provides an "eventually consistent" promise such that the slaves will eventually mirror the master.
Mongo 的编写工作非常好,与 mysql 相比,您真正失去的唯一一件事是事务支持,因此回滚多部分保存的能力。但是,如果您能够在原子操作中提交更改,那么就不存在数据安全问题。如果您被复制,mongo 会提供“最终一致”的承诺,以便从服务器最终会镜像主服务器。
Mongodb doesn't provide native enforcement or cascading of certain db constructs such as foreign keys, so you have to manage those yourself (such as either through composition, which is one of mongo's strenghts), or through use of dbrefs.
Mongodb 不提供某些 db 结构(例如外键)的本地实施或级联,因此您必须自己管理这些结构(例如通过组合,这是 mongo 的优势之一),或通过使用 dbrefs。
If you really need transaction support and robust 'safe' writes, yet still desire the flexibility provided by nosql, you might consider a hybrid solution. This would allow you to use mysql as your main post store, and then use mongodb as your 'schemaless' store. Here is a link to a doc discussing hybrid mongo/rdbms solutions: http://www.10gen.com/events/hybrid-applicationsThe article is from 10gen's site, but you can find other examples simply by doing a quick google search.
如果您确实需要事务支持和健壮的“安全”写入,但仍需要 nosql 提供的灵活性,您可以考虑使用混合解决方案。这将允许您使用 mysql 作为您的主要帖子存储,然后使用 mongodb 作为您的“无模式”存储。这是讨论混合 mongo/rdbms 解决方案的文档的链接: http://www.10gen.com/events/hybrid-applications 该文章来自 10gen 的站点,但您可以通过快速 google 搜索找到其他示例。
Update 5/28/2019
2019 年 5 月 28 日更新
The here have been a number of changes to both MySQL and Mongodb since this answer was posted, so the pros/cons between them have become even blurrier. This update doesn't really help with the original question, but I am doing it to make sure any new readers have a bit more recent information.
自从发布这个答案以来,MySQL 和 Mongodb 都发生了许多变化,因此它们之间的优缺点变得更加模糊。此更新对原始问题没有真正帮助,但我这样做是为了确保任何新读者都能获得更多最新信息。
MongoDB now supports transactions: https://docs.mongodb.com/manual/core/transactions/
MongoDB 现在支持事务:https: //docs.mongodb.com/manual/core/transactions/
MySql now supports indexing and searching json fields: https://dev.mysql.com/doc/refman/5.7/en/json.html
MySql 现在支持索引和搜索 json 字段:https: //dev.mysql.com/doc/refman/5.7/en/json.html

