database - When NOT to use Cassandra?
Original URL: http://stackoverflow.com/questions/2634955/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must release it under the same license and attribute it to the original authors (not me):
Stack Overflow
Question by JimJim
Answered by Ajay Tiwari
There is no silver bullet: everything is built to solve specific problems and has its own pros and cons. It is up to you to identify your problem statement and the best-fitting solution for it.
I will try to answer your questions one by one, in the same order you asked them. Since Cassandra belongs to the NoSQL family of databases, it's important that you understand why you would use a NoSQL database before I answer your questions.
Why use NoSQL
In the case of RDBMSs, making a choice is quite easy, because all the databases in this category, like MySQL, Oracle, MS SQL, and PostgreSQL, offer almost the same kind of solution oriented toward ACID properties. When it comes to NoSQL, the decision becomes difficult, because every NoSQL database offers a different solution and you have to understand which one best suits your app/system requirements. For example, MongoDB is a good fit for use cases where your system demands a schema-less document store. HBase might be a fit for search engines, log-data analysis, or any place where scanning huge, two-dimensional, join-less tables is a requirement. Redis is built to provide in-memory storage for a variety of data structures, such as lists (usable as queues), sets, sorted sets, and hashes, and can be a good fit for real-time leaderboards and pub/sub systems. Similarly, there are other databases in this category (including Cassandra) that fit different problem statements. Now let's move to the original questions and answer them one by one.
When to use Cassandra
As part of the NoSQL family, Cassandra offers a solution for problems where one of your requirements is a very write-heavy system and you want a quite responsive reporting system on top of the stored data. Consider the use case of web analytics, where log data is stored for each request and you want to build an analytical platform around it to count hits per hour, by browser, by IP, etc., in real time. You can refer to this blog post to understand more about the use cases where Cassandra fits in.
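To make the web-analytics case a little more concrete, here is a minimal in-memory Python sketch of the query-first modeling it usually calls for. It does not talk to Cassandra at all: the `HitCounter` class and its `(hour, browser)` key layout are hypothetical, and only mimic a counter table whose partition key is the hour bucket and whose clustering column is the browser, so that each hourly report reads a single partition.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime


class HitCounter:
    """Hypothetical in-memory stand-in for a Cassandra counter table.

    Partition key: hour bucket, e.g. "2010-04-14T10".
    Clustering column: browser name.
    """

    def __init__(self) -> None:
        # partition key -> {clustering key -> hit count}
        self._partitions: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def hour_bucket(ts: datetime) -> str:
        # Truncate to the hour so every hit in that hour lands in one partition.
        return ts.strftime("%Y-%m-%dT%H")

    def record_hit(self, ts: datetime, browser: str) -> None:
        # Each hit is a single increment; the application never reads before writing.
        self._partitions[self.hour_bucket(ts)][browser] += 1

    def hits_by_browser(self, ts: datetime) -> dict[str, int]:
        # The hourly report reads exactly one partition.
        return dict(self._partitions.get(self.hour_bucket(ts), {}))


counter = HitCounter()
counter.record_hit(datetime(2010, 4, 14, 10, 5), "Firefox")
counter.record_hit(datetime(2010, 4, 14, 10, 40), "Firefox")
counter.record_hit(datetime(2010, 4, 14, 10, 59), "Chrome")
counter.record_hit(datetime(2010, 4, 14, 11, 1), "Chrome")
print(counter.hits_by_browser(datetime(2010, 4, 14, 10, 0)))  # {'Firefox': 2, 'Chrome': 1}
```

In this style of modeling, a per-IP report would typically get its own table keyed the same way, with each request written to both.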
When to use an RDBMS instead of Cassandra
Cassandra is a NoSQL database and does not provide ACID and relational data properties. If you have a strong requirement for ACID properties (for example, financial data), Cassandra would not be a fit. You can work around this, of course, but you will end up writing lots of application code to simulate ACID properties, and your time to market will suffer badly. Managing that kind of system with Cassandra would also be complex and tedious.
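The cost of simulating ACID in application code shows up even in a toy case. The following hypothetical Python sketch moves money between two accounts with two independent writes, as you would without a multi-row transaction. A failure between the writes leaves the data inconsistent. `FlakyStore` and `transfer` are illustrations only, not a Cassandra API.

```python
from __future__ import annotations

from typing import Optional


class WriteFailed(Exception):
    """Raised when the toy store drops a write."""


class FlakyStore:
    """Hypothetical key-value store whose Nth write fails, for illustration only."""

    def __init__(self, balances: dict[str, int], fail_on_write: Optional[int] = None) -> None:
        self.balances = dict(balances)
        self._writes = 0
        self._fail_on_write = fail_on_write

    def put(self, key: str, value: int) -> None:
        self._writes += 1
        if self._writes == self._fail_on_write:
            raise WriteFailed(f"write #{self._writes} to {key!r} failed")
        self.balances[key] = value


def transfer(store: FlakyStore, src: str, dst: str, amount: int) -> None:
    # Two independent writes with no transaction around them:
    # nothing rolls back the debit if the credit fails.
    store.put(src, store.balances[src] - amount)
    store.put(dst, store.balances[dst] + amount)


store = FlakyStore({"alice": 100, "bob": 0}, fail_on_write=2)
try:
    transfer(store, "alice", "bob", 30)
except WriteFailed as err:
    print("transfer aborted:", err)  # transfer aborted: write #2 to 'bob' failed

# The debit was applied but the credit was not, so 30 units vanished.
print(store.balances)  # {'alice': 70, 'bob': 0}
```

In an RDBMS, both updates would sit in one transaction and be rolled back together. Here, detecting and repairing the half-applied transfer (retries, idempotency keys, reconciliation jobs) is application code you have to write and maintain.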
When not to use Cassandra
If the explanation above makes sense, I don't think this needs a separate answer.
Answered by Nathan Hurst
When evaluating distributed data systems, you have to consider the CAP theorem, which says you can pick two of the following: consistency, availability, and partition tolerance.
Cassandra is an available, partition-tolerant system that supports eventual consistency. For more information, see this blog post I wrote: Visual Guide to NoSQL Systems.
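Eventual consistency means replicas can temporarily disagree and must later converge. Cassandra reconciles conflicting versions of a column by write timestamp (last write wins), and read repair is one of the ways stale replicas get fixed. The Python sketch below models that rule with three in-memory replicas. The `Replica` class, the explicit integer timestamps, and the `read_repair` helper are simplifications for illustration, not Cassandra's actual implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    # key -> (write timestamp, value)
    cells: dict[str, tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        # Accept the write only if it wins under last-write-wins; ties are
        # broken by value so every replica converges on the same choice.
        current = self.cells.get(key)
        if current is None or (ts, value) > current:
            self.cells[key] = (ts, value)


def read_repair(replicas: list[Replica], key: str) -> Optional[str]:
    # Collect every replica's version of the key and pick the newest one.
    versions = [r.cells[key] for r in replicas if key in r.cells]
    if not versions:
        return None
    ts, value = max(versions)
    # Push the winning version back to any replica that is stale.
    for r in replicas:
        r.write(key, value, ts)
    return value


replicas = [Replica(), Replica(), Replica()]
# The first write reaches every replica; the second reaches only one of them.
for r in replicas:
    r.write("email", "old@example.com", ts=1)
replicas[0].write("email", "new@example.com", ts=2)

print([r.cells["email"][1] for r in replicas])  # ['new@example.com', 'old@example.com', 'old@example.com']
print(read_repair(replicas, "email"))  # new@example.com
print([r.cells["email"][1] for r in replicas])  # ['new@example.com', 'new@example.com', 'new@example.com']
```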
Answered by Vagif Verdi
Cassandra is the answer to a particular problem: what do you do when you have so much data that it does not fit on one server? How do you store all your data on many servers without breaking the bank or driving your developers insane? Facebook gets 4 terabytes of new compressed data EVERY DAY, and this number will most likely more than double within a year.
If you do not have this much data, or if you have millions to pay for an enterprise Oracle/DB2 cluster installation and the specialists required to set it up and maintain it, then you are fine with an SQL database.
However, Facebook no longer uses Cassandra; it now uses MySQL almost exclusively, moving the partitioning up into the application stack for faster performance and better control.
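The question this answer opens with, how to store data that does not fit on one server across many servers, comes down to deciding which node owns each row. Cassandra does this by hashing the partition key onto a token ring. The Python sketch below is a simplified model of that idea: it uses MD5 from `hashlib` as a stand-in hash (Cassandra's default partitioner uses Murmur3), gives each hypothetical node a single token, and ignores virtual nodes and replication.

```python
from __future__ import annotations

import bisect
import hashlib


class TokenRing:
    """Simplified token ring: one token per node, no vnodes, no replication."""

    def __init__(self, nodes: list[str]) -> None:
        # Place each node on the ring at the hash of its name, sorted by token.
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(key: str) -> int:
        # Stand-in hash: first 8 bytes of MD5 as an unsigned integer.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def owner(self, partition_key: str) -> str:
        # The owner is the first node whose token is >= the key's token,
        # wrapping around to the start of the ring past the last node.
        idx = bisect.bisect_left(self._tokens, self._token(partition_key))
        return self._ring[idx % len(self._ring)][1]


ring = TokenRing(["node-a", "node-b", "node-c"])
for user in ["alice", "bob", "carol", "dave"]:
    print(user, "->", ring.owner(user))
```

Adding a node only takes over the slice of the ring just before its token, so only the keys in that slice move. That property is what lets a cluster scale out without reshuffling all of its data.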
Answered by Tom Clarkson
The general idea of NoSQL is that you should use whichever data store is the best fit for your application. If you have a table of financial data, use SQL. If you have objects that would require complex/slow queries to map to a relational schema, use an object or key/value store.
Of course, just about any real-world problem you run into lies somewhere between those two extremes, and neither solution will be perfect. You need to consider the capabilities of each store and the consequences of choosing one over the other, which will be very specific to the problem you are trying to solve.
Answered by Nadav Har'El
Besides the answers given above about when to use and when not to use Cassandra, if you do decide to use Cassandra, you may want to consider not using Cassandra itself but one of its many cousins.
Some answers above already pointed to various "NoSQL" systems which share many properties with Cassandra, with some small or large differences, and may be better than Cassandra itself for your specific needs.
Additionally, recently (several years after this question was originally asked), a Cassandra clone called Scylla (see https://en.wikipedia.org/wiki/Scylla_(database)) was released. Scylla is an open-source re-implementation of Cassandra in C++, which claims to have significantly higher throughput and lower latencies than the original Java Cassandra, while being mostly compatible with it (in features, APIs, and file formats). So if you're already considering Cassandra, you may want to consider Scylla as well.
Answered by Warren
I spoke with someone in the midst of deploying Cassandra: it doesn't handle many-to-many relationships well. They are doing a hack job for their initial testing. I also spoke with a Cassandra consultant about this, and he said he wouldn't recommend Cassandra if you had this problem set.
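One common workaround for many-to-many relationships in Cassandra is to denormalize: keep one table per query direction and write to both on every change. The Python sketch below models that with two dicts standing in for hypothetical `users_by_group` and `groups_by_user` tables. It also shows the burden hinted at above: every write and delete must update both sides, and nothing in the store keeps the two views in sync for you.

```python
from __future__ import annotations

from collections import defaultdict


class Membership:
    def __init__(self) -> None:
        # Hypothetical tables, one per query pattern.
        self.users_by_group: dict[str, set[str]] = defaultdict(set)
        self.groups_by_user: dict[str, set[str]] = defaultdict(set)

    def join(self, user: str, group: str) -> None:
        # The application writes the same fact twice, once per direction.
        self.users_by_group[group].add(user)
        self.groups_by_user[user].add(group)

    def leave(self, user: str, group: str) -> None:
        # Deletes must also touch both tables, or the views drift apart.
        self.users_by_group[group].discard(user)
        self.groups_by_user[user].discard(group)

    def members(self, group: str) -> set[str]:
        return set(self.users_by_group.get(group, set()))

    def groups(self, user: str) -> set[str]:
        return set(self.groups_by_user.get(user, set()))


m = Membership()
m.join("alice", "admins")
m.join("alice", "devs")
m.join("bob", "devs")
print(sorted(m.members("devs")))  # ['alice', 'bob']
m.leave("alice", "devs")
print(sorted(m.groups("alice")))  # ['admins']
```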
Answered by Rahul Singh
You should ask yourself the following questions:
- (Volume, Velocity) Will you be writing and reading TONS of information, so much that no single computer could handle the writes?
- (Global) Will you need this writing and reading capability around the world, so that writes in one part of the world are accessible in another?
- (Reliability) Do you need this database to be up and running all the time and never go down, regardless of the cloud, the country, or whether it runs on a VM, a container, or bare metal?
- (Scalability) Do you need this database to keep growing easily and scale linearly?
- (Consistency) Do you need TUNABLE consistency, where some writes can happen asynchronously whereas others need to be confirmed?
- (Skill) Are you willing to do what it takes to learn this technology and the data modeling that goes with creating a globally distributed database that can be fast for everyone, everywhere?
If for any of these questions you thought "maybe" or "no," you should use something else. If you had "hell yes" as an answer to all of them, then you should use Cassandra.
Use an RDBMS when you can do everything on one box. It's probably easier than most alternatives, and anyone can work with it.
Answered by rai.skumar
I will focus here on some of the important aspects that can help you decide whether you really need Cassandra. The list is not exhaustive; these are just some of the points at the top of my mind:
Don't consider Cassandra as your first choice when you have strict requirements on relationships (across your dataset).
Cassandra is by default an AP system (in CAP terms). But it supports tunable consistency, which means it can be configured to behave as CP as well. So don't ignore it just because you read somewhere that it's AP while you are looking for a CP system. Cassandra is more accurately termed “tuneably consistent,” which means it lets you easily decide the level of consistency you require, in balance with the level of availability.
Don't use Cassandra if your scale is small or if you can manage with a non-distributed DB.
Think harder if your team believes all your problems will be solved by using a distributed DB like Cassandra. Getting started with these DBs is very simple, since they come with many defaults, but optimizing and mastering them to solve a specific problem requires a good (if not large) amount of engineering effort.
Cassandra is column-oriented, but at the same time each row also has a unique key. So it might be helpful to think of it as an indexed, row-oriented store. You can even use it as a document store.
Cassandra doesn't force you to define the fields beforehand. So if you are in startup mode or your features are evolving (as in agile), Cassandra embraces that. Better still, think about your queries first, and then about the data needed to answer them.
Cassandra is optimized for really high write throughput. If your use case is read-heavy (like a cache), Cassandra might not be an ideal choice.
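The tunable-consistency point can be made concrete with the usual quorum arithmetic. With replication factor RF, a read that waits for R replicas and a write that waits for W replicas are guaranteed to overlap on at least one replica when R + W > RF. The Python sketch below computes this for the single-datacenter levels ONE, QUORUM (a strict majority, RF // 2 + 1) and ALL. It is only the arithmetic, not a driver call, and it ignores multi-datacenter levels such as LOCAL_QUORUM, failed writes, and clock skew.

```python
from __future__ import annotations


def replicas_required(level: str, rf: int) -> int:
    # Replicas that must acknowledge a request at this consistency level
    # (single-datacenter levels only; QUORUM is a strict majority of RF).
    levels = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return levels[level]


def reads_see_latest_write(read_level: str, write_level: str, rf: int) -> bool:
    # If R + W > RF, every read replica set overlaps every write replica set,
    # so at least one replica consulted by the read holds the latest write.
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf


for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(f"RF=3, read={read}, write={write}:", reads_see_latest_write(read, write, rf=3))
# RF=3, read=ONE, write=ONE: False
# RF=3, read=QUORUM, write=QUORUM: True
# RF=3, read=ONE, write=ALL: True
```

This is why QUORUM reads combined with QUORUM writes are a common way to get CP-like behavior from Cassandra, while ONE/ONE trades that guarantee for lower latency and higher availability.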
Answered by sinelaw
A single heavy query vs. a gazillion light queries is another point to consider, in addition to the other answers here. It's inherently harder to automatically optimize a single query in a NoSQL-style DB. I've used MongoDB and ran into performance issues when trying to compute a complex query. I haven't used Cassandra, but I expect it to have the same issue.
On the other hand, if your load is expected to consist of very many small queries and you want to be able to scale out easily, you could take advantage of the eventual consistency offered by most NoSQL DBs. Note that eventual consistency is not really a feature of a non-relational data model, but it is much easier to implement and set up in a NoSQL-based system.
For a single, very heavy query, any modern RDBMS engine can do a decent job of parallelizing parts of the query and taking advantage of as much CPU and memory as you throw at it (on a single machine). NoSQL databases don't have enough information about the structure of the data to make the assumptions that would allow truly intelligent parallelization of a big query. They do let you easily scale out to more servers (or cores), but once the query reaches a certain complexity, you are basically forced to split it manually into parts that the NoSQL engine knows how to handle intelligently.
In my experience with MongoDB, in the end, because of the complexity of the query, there wasn't much Mongo could do to optimize it and run parts of it in parallel over the data. Mongo parallelizes multiple queries but isn't so good at optimizing a single one.
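Splitting a heavy query manually usually ends up as scatter-gather: run a simple query that the store handles well against each partition, then combine the partial results in the application. The Python sketch below computes a global average that way; the shard data and the `partial_avg_state` and `global_average` functions are hypothetical. The per-shard step returns (sum, count) rather than an average, because averaging the per-shard averages gives the wrong answer when shards differ in size.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor


def partial_avg_state(rows: list[float]) -> tuple[float, int]:
    # The "simple" per-partition query: a sum and a count.
    return sum(rows), len(rows)


def global_average(shards: list[list[float]]) -> float:
    # Scatter: evaluate each shard independently (in parallel here).
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(partial_avg_state, shards))
    # Gather: merge the partial states in the application.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no rows in any shard")
    return total / count


shards = [[10.0, 20.0, 30.0], [100.0]]
print(global_average(shards))  # 40.0 (averaging the averages would wrongly give 60.0)
```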
Answered by CodeFarmer
Let's read some real-world cases:
http://planetcassandra.org/apache-cassandra-use-cases/
In this article: http://planetcassandra.org/blog/post/agentis-energy-stores-over-15-billion-records-of-time-series-usage-data-in-apache-cassandra
They explain that they didn't choose MySQL because DB synchronization was too slow.
(Also due to two-phase commit, FKs, and PKs.)
Cassandra is based on the Amazon Dynamo paper.
Features:
Stability
High availability
Backup performs well
Reads and writes perform better than in HBase (a BigTable clone in Java).
Wikipedia: http://en.wikipedia.org/wiki/Apache_Cassandra
Their conclusion is:
We looked at HBase, Dynamo, Mongo and Cassandra.
Cassandra was simply the best storage solution for the majority of our data.
As of 2018,
I would recommend using ScyllaDB to replace classic Cassandra if you need support.
A Postgres key-value plugin is also faster than Cassandra. However, it won't have multi-instance scalability.