database: MongoDB vs. Cassandra

Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2892729/

Time: 2020-09-08 07:44:36  Source: igfitidea

MongoDB vs. Cassandra

Tags: mongodb, database-design, cassandra, database

Asked by ming yeow

I am evaluating what might be the best migration option.

Currently, I am on a sharded MySQL (horizontal partition), with most of my data stored in JSON blobs. I do not have any complex SQL queries (I already migrated away from them when I partitioned my db).

Right now, it seems like both MongoDB and Cassandra would be likely options. My situation:

  • Lots of reads in every query, fewer regular writes
  • Not worried about "massive" scalability
  • More concerned about simple setup, maintenance and code
  • Minimize hardware/server cost

Accepted answer by Michael

Lots of reads in every query, fewer regular writes

Both databases perform well on reads where the hot data set fits in memory. Both also emphasize join-less data models (and encourage denormalization instead), and both provide indexes on documents or rows, although MongoDB's indexes are currently more flexible.

Cassandra's storage engine provides constant-time writes no matter how big your data set grows. Writes are more problematic in MongoDB, partly because of the b-tree based storage engine, but more because of the multi-granularity locking it does.

For analytics, MongoDB provides a custom map/reduce implementation; Cassandra provides native Hadoop support, including for Hive (a SQL data warehouse built on Hadoop map/reduce) and Pig (a Hadoop-specific analysis language that many think is a better fit for map/reduce workloads than SQL). Cassandra also supports use of Spark.

Not worried about "massive" scalability

If you're looking at a single server, MongoDB is probably a better fit. For those more concerned about scaling, Cassandra's no-single-point-of-failure architecture will be easier to set up and more reliable. (MongoDB's global write lock tends to become more painful, too.) Cassandra also gives a lot more control over how your replication works, including support for multiple data centers.

More concerned about simple setup, maintenance and code

Both are trivial to set up, with reasonable out-of-the-box defaults for a single server. Cassandra is simpler to set up in a multi-server configuration since there are no special-role nodes to worry about.

If you're presently using JSON blobs, MongoDB is an insanely good match for your use case, given that it uses BSON to store the data. You'll be able to have richer and more queryable data than you would in your present database. This would be the most significant win for Mongo.

Answer by Richard K.

I've used MongoDB extensively (for the past 6 months), building a hierarchical data management system, and I can vouch for both the ease of setup (install it, run it, use it!) and the speed. As long as you think about indexes carefully, it can absolutely scream along, speed-wise.

I gather that Cassandra, due to its use with large-scale projects like Twitter, has better scaling functionality, although the MongoDB team is working on parity there. I should point out that I've not used Cassandra beyond the trial-run stage, so I can't speak for the detail.

The real swinger for me, when we were assessing NoSQL databases, was the querying - Cassandra is basically just a giant key/value store, and querying is a bit fiddly (at least compared to MongoDB), so for performance you'd have to duplicate quite a lot of data as a sort of manual index. MongoDB, on the other hand, uses a "query by example" model.

For example, say you've got a Collection (MongoDB parlance for the equivalent of an RDBMS table) containing Users. MongoDB stores records as Documents, which are basically binary JSON objects, e.g.:

{
   FirstName: "John",
   LastName: "Smith",
   Email: "[email protected]",
   Groups: ["Admin", "User", "SuperUser"]
}

If you wanted to find all of the users called Smith who have Admin rights, you'd just create a new document (at the admin console using JavaScript, or in production using the language of your choice):

{
   LastName: "Smith",
   Groups: "Admin"
}

...and then run the query. That's it. There are added operators for comparisons, RegEx filtering etc, but it's all pretty simple, and the Wiki-based documentation is pretty good.
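To make the matching rule concrete, here is a minimal in-memory sketch of how an example document like the one above selects records. It is not MongoDB's implementation: `matchesExample` is a hypothetical helper, the sample users are made up, and only plain equality (plus "array contains value") is covered. In the real console the example document would be passed to something like `db.users.find(...)`, where the collection name `users` is an assumption.

```javascript
// Hypothetical helper, not part of MongoDB: a simplified in-memory
// imitation of equality matching against a "query by example" document.
function matchesExample(doc, example) {
  return Object.entries(example).every(([field, wanted]) => {
    const actual = doc[field];
    // An array field matches when any of its elements equals the wanted value.
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

// Made-up sample data for illustration only.
const users = [
  { FirstName: "John", LastName: "Smith", Groups: ["Admin", "User", "SuperUser"] },
  { FirstName: "Jane", LastName: "Smith", Groups: ["User"] },
  { FirstName: "Alan", LastName: "Jones", Groups: ["Admin"] },
];

// The same example document as in the answer.
const example = { LastName: "Smith", Groups: "Admin" };

const found = users.filter((user) => matchesExample(user, example));
console.log(found.map((user) => user.FirstName));
```

Of the three sample users, only John is both a Smith and in the Admin group, so he is the only match.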

Answer by Jason Grant Taylor

Why choose between a traditional database and a NoSQL data store? Use both! The problem with NoSQL solutions (beyond the initial learning curve) is the lack of transactions. To work around that, do all updates in MySQL and have MySQL populate a NoSQL data store for reads; you then benefit from each technology's strengths. This does add more complexity, but you already have the MySQL side -- just add MongoDB, Cassandra, etc. to the mix.
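A rough sketch of that split, assuming the simplest possible wiring: every write goes to the primary store first, and the committed record is then copied into a read-optimized store. Both stores here are plain `Map` objects standing in for MySQL and the NoSQL store, and `saveUser`, `readUser`, and `DisplayName` are hypothetical names. How MySQL would actually populate the read store (triggers, a queue, a batch job) is not specified in the answer and is left out.

```javascript
// Stand-ins for the two systems: plain Maps, not real database clients.
const primaryStore = new Map(); // plays the role of MySQL (source of truth)
const readStore = new Map();    // plays the role of the NoSQL read copy

// Hypothetical helper: apply the write to the primary first, then copy the
// committed record into the read store in a denormalized, read-ready shape.
function saveUser(id, user) {
  primaryStore.set(id, { ...user });
  const committed = primaryStore.get(id);
  readStore.set(id, {
    ...committed,
    DisplayName: `${committed.FirstName} ${committed.LastName}`,
  });
}

// Reads are served from the read store and never touch the primary.
function readUser(id) {
  return readStore.get(id);
}

saveUser(1, { FirstName: "John", LastName: "Smith" });
console.log(readUser(1).DisplayName);
```

The cost the answer mentions shows up even in this toy version: there are now two copies of each record, and the read copy is only as fresh as the last successful propagation.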

NoSQL datastores generally scale way better than a traditional DB for otherwise identical specs -- there is a reason why Facebook, Twitter, Google, and most start-ups are using NoSQL solutions. It's not just geeks getting high on new tech.

Answer by Kostja

I'm probably going to be an odd man out, but I think you need to stay with MySQL. You haven't described a real problem you need to solve, and MySQL/InnoDB is an excellent storage back-end even for blob/json data.

A common habit among web engineers is to reach for NoSQL as soon as they realize that not all features of an RDBMS are being used. This alone is not a good reason, since most often NoSQL databases have rather poor data engines (what MySQL calls a storage engine).

Now, if you're not of that kind, then please specify what is missing in MySQL and what you're looking for in a different database (for example, auto-sharding, automatic failover, multi-master replication, a weaker data consistency guarantee in a cluster paying off in higher write throughput, etc.).

Answer by dalton

I haven't used Cassandra, but I have used MongoDB and think it's awesome.

If you're after simple setup, this is it: You simply untar MongoDB and run the mongod daemon and that's it ... it's running.

Obviously that's only a starter, but to get you started it's easy.

Answer by GrayWizardx

I saw a presentation on mongodb yesterday. I can definitely say that setup was "simple", as simple as unpacking it and firing it up. Done.

I believe that both mongodb and cassandra will run on virtually any regular linux hardware, so you should not find too much of a barrier in that area.

I think in this case, at the end of the day, it will come down to which you personally feel more comfortable with and which has a toolset that you prefer. As for the presentation on mongodb, the presenter indicated that the toolset for mongodb was pretty light and that there weren't many (they said any, really) tools similar to what's available for MySQL. This was of course their experience, so YMMV. One thing that I did like about mongodb was that there seemed to be lots of language support for it (Python and .NET being the two that I primarily use).

The list of sites using mongodb is pretty impressive, and I know that twitter just switched to using cassandra.
