Practical example for each type of database (real cases)
Original source: http://stackoverflow.com/questions/18198960/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow.
Asked by daniel__
There are several types of databases for different purposes, but MySQL is normally used for everything because it is the best-known database. Just to give an example: in my company, a big data application has a MySQL database at an initial stage, which is unbelievable and will bring serious consequences to the company. Why MySQL? Just because no one knows how (and when) to use another DBMS.
So, my question is not about vendors, but about types of databases. Can you give me a practical example of specific situations (or apps) for each type of database where it is highly recommended to use it?
Example:
- A social network should use type X because of Y.
- MongoDB or CouchDB can't support transactions, so a document database is not a good fit for a bank app or an auction site.
And so on...
Relational: MySQL, PostgreSQL, SQLite, Firebird, MariaDB, Oracle DB, SQL Server, IBM DB2, IBM Informix, Teradata
Object: ZODB, DB4O, Eloquera, Versant, Objectivity DB, VelocityDB
Graph databases: AllegroGraph, Neo4j, OrientDB, InfiniteGraph, graphbase, sparkledb, flockdb, BrightstarDB
Key-value stores: Amazon DynamoDB, Redis, Riak, Voldemort, FoundationDB, leveldb, BangDB, KAI, hamsterdb, Tarantool, Maxtable, HyperDex, Genomu, Memcachedb
Column family: Bigtable, HBase, Hypertable, Cassandra, Apache Accumulo
RDF stores: Apache Jena, Sesame
Multimodel databases: ArangoDB, Datomic, OrientDB, FatDB, AlchemyDB
Document: MongoDB, CouchDB, RethinkDB, RavenDB, terrastore, Jas DB, Raptor DB, djon DB, EJDB, denso DB, Couchbase
XML databases: BaseX, Sedna, eXist
Hierarchical: InterSystems Caché, GT.M (thanks to @Laurent Parenteau)
Answer by daniel__
I found two impressive articles about this subject. All credit goes to highscalability.com. The information in this answer is transcribed from these articles:
- 35+ Use Cases For Choosing Your Next NoSQL Database
- What The Heck Are You Actually Using NoSQL For?
If Your Application Needs...
- complex transactions because you can't afford to lose data, or if you would like a simple transaction programming model, then look at a Relational or Grid database.
  - Example: an inventory system that might want full ACID. I was very unhappy when I bought a product and they said later they were out of stock. I did not want a compensated transaction. I wanted my item!
- to scale then NoSQL or SQL can work. Look for systems that support scale-out, partitioning, live addition and removal of machines, load balancing, automatic sharding and rebalancing, and fault tolerance.
- to always be able to write to a database because you need high availability then look at Bigtable clones, which feature eventual consistency.
- to handle lots of small continuous reads and writes, that may be volatile, then look at Document or Key-value databases, or databases offering fast in-memory access. Also, consider SSD.
- to implement social network operations then you may first want a Graph database or, second, a database like Riak that supports relationships. An in-memory relational database with simple SQL joins might suffice for small data sets. Redis' set and list operations could work too.
- to operate over a wide variety of access patterns and data types then look at a Document database; they generally are flexible and perform well.
- powerful offline reporting with large datasets then look at Hadoop first and, second, products that support MapReduce. Supporting MapReduce isn't the same as being good at it.
- to span multiple data centers then look at Bigtable clones and other products that offer a distributed option that can handle long latencies and are partition tolerant.
- to build CRUD apps then look at a Document database; they make it easy to access complex data without joins.
- built-in search then look at Riak.
- to operate on data structures like lists, sets, queues, and publish-subscribe then look at Redis. Useful for distributed locking, capped logs, and a lot more.
- programmer friendliness in the form of programmer-friendly data types like JSON, HTTP, REST, and JavaScript then first look at Document databases and then Key-value databases.
- transactions combined with materialized views for real-time data feeds then look at VoltDB. Great for data rollups and time windowing.
- enterprise-level support and SLAs then look for a product that makes a point of catering to that market. Membase is an example.
- to log continuous streams of data that may have no consistency guarantees necessary at all then look at Bigtable clones because they generally work on distributed file systems that can handle a lot of writes.
- to be as simple as possible to operate then look for a hosted or PaaS solution because they will do all the work for you.
- to be sold to enterprise customers then consider a Relational database because they are used to relational technology.
- to dynamically build relationships between objects that have dynamic properties then consider a Graph database because often they will not require a schema and models can be built incrementally through programming.
- to support large media then look at storage services like S3. NoSQL systems tend not to handle large BLOBs, though MongoDB has a file service.
- to bulk upload lots of data quickly and efficiently then look for a product that supports that scenario. Most will not because they don't support bulk operations.
- an easier upgrade path then use a fluid schema system like a Document database or a Key-value database because it supports optional fields, adding fields, and field deletions without the need to build an entire schema migration framework.
- to implement integrity constraints then pick a database that supports SQL DDL, implement them in stored procedures, or implement them in application code.
- a very deep join depth then use a Graph database because they support blisteringly fast navigation between entities.
- to move behavior close to the data so the data doesn't have to be moved over the network then look at stored procedures of one kind or another. These can be found in Relational, Grid, Document, and even Key-value databases.
- to cache or store BLOB data then look at a Key-value store. Caching can be used for bits of web pages, or to save complex objects that were expensive to join in a relational database, to reduce latency, and so on.
- a proven track record like not corrupting data and just generally working then pick an established product and, when you hit scaling (or other) issues, use one of the common workarounds (scale-up, tuning, memcached, sharding, denormalization, etc.).
- fluid data types because your data isn't tabular in nature, or requires a flexible number of columns, or has a complex structure, or varies by user (or whatever), then look at Document, Key-value, and Bigtable clone databases. Each has a lot of flexibility in its data types.
- other business units to run quick relational queries so you don't have to reimplement everything then use a database that supports SQL.
- to operate in the cloud and automatically take full advantage of cloud features then we may not be there yet.
- support for secondary indexes so you can look up data by different keys then look at relational databases and Cassandra's new secondary index support.
- to create an ever-growing set of data (really BigData) that rarely gets accessed then look at Bigtable clones, which will spread the data over a distributed file system.
- to integrate with other services then check if the database provides some sort of write-behind syncing feature so you can capture database changes and feed them into other systems to ensure consistency.
- fault tolerance then check how durable writes are in the face of power failures, partitions, and other failure scenarios.
- to push the technological envelope in a direction nobody seems to be going then build it yourself because that's what it takes to be great sometimes.
- to work on a mobile platform then look at CouchDB/Mobile Couchbase.
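To make the inventory example from the list concrete, here is a minimal sketch of what "full ACID" buys you, using Python's built-in sqlite3 module as a stand-in for any relational database. The table names, column names, and the `place_order` helper are assumptions made up for illustration; they do not come from the original articles. The CHECK clause also shows an integrity constraint declared in SQL DDL.

```python
import sqlite3


def make_inventory(stock):
    """Create an in-memory database; the CHECK clause forbids negative stock."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        " item TEXT PRIMARY KEY,"
        " quantity INTEGER NOT NULL CHECK (quantity >= 0))"
    )
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " item TEXT NOT NULL,"
        " quantity INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO inventory VALUES (?, ?)", list(stock.items()))
    conn.commit()
    return conn


def place_order(conn, item, quantity):
    """Record the order and decrement stock as one transaction.

    Returns True if both writes were committed, False if they were rolled back.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (item, quantity) VALUES (?, ?)",
                (item, quantity),
            )
            cursor = conn.execute(
                "UPDATE inventory SET quantity = quantity - ? WHERE item = ?",
                (quantity, item),
            )
            if cursor.rowcount == 0:
                raise sqlite3.IntegrityError("unknown item")
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_inventory({"widget": 1})
    print(place_order(db, "widget", 1))  # first buyer gets the item
    print(place_order(db, "widget", 1))  # second order is rejected and leaves no trace
```

The point of the sketch is that the rejected order never becomes visible: the order row and the stock decrement either both happen or neither does, so nobody is told later that the item was out of stock.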
General Use Cases (NoSQL)
- Bigness. NoSQL is seen as a key part of a new data stack supporting: big data, big numbers of users, big numbers of computers, big supply chains, big science, and so on. When something becomes so massive that it must become massively distributed, NoSQL is there, though not all NoSQL systems are targeting big. Bigness can be across many different dimensions, not just using a lot of disk space.
- Massive write performance. This is probably the canonical usage based on Google's influence. High volume. Facebook needs to store 135 billion messages a month (in 2010). Twitter, for example, has the problem of storing 7 TB of data per day (in 2010), with the prospect of this requirement doubling multiple times per year. This is the "data is too big to fit on one node" problem. At 80 MB/s it takes a day to store 7 TB, so writes need to be distributed over a cluster, which implies key-value access, MapReduce, replication, fault tolerance, consistency issues, and all the rest. For faster writes, in-memory systems can be used.
- Fast key-value access. This is probably the second most cited virtue of NoSQL in the general mindset. When latency is important, it's hard to beat hashing on a key and reading the value directly from memory or in as little as one disk seek. Not every NoSQL product is about fast access; some are more about reliability, for example. But what people have wanted for a long time was a better memcached, and many NoSQL systems offer that.
- Flexible schema and flexible datatypes. NoSQL products support a whole range of new data types, and this is a major area of innovation in NoSQL. We have: column-oriented, graph, advanced data structures, document-oriented, and key-value. Complex objects can be easily stored without a lot of mapping. Developers love avoiding complex schemas and ORM frameworks. Lack of structure allows for much more flexibility. We also have program- and programmer-friendly compatible datatypes like JSON.
- Schema migration. Schemalessness makes it easier to deal with schema migrations without so much worrying. Schemas are in a sense dynamic because they are imposed by the application at run-time, so different parts of an application can have a different view of the schema.
- Write availability. Do your writes need to succeed no matter what? Then we can get into partitioning, CAP, eventual consistency, and all that jazz.
- Easier maintainability, administration, and operations. This is very product specific, but many NoSQL vendors are trying to gain adoption by making it easy for developers to adopt them. They are spending a lot of effort on ease of use, minimal administration, and automated operations. This can lead to lower operations costs, as special code doesn't have to be written to scale a system that was never intended to be used that way.
- No single point of failure. Not every product is delivering on this, but we are seeing a definite convergence on relatively easy to configure and manage high availability with automatic load balancing and cluster sizing. A perfect cloud partner.
- Generally available parallel computing. We are seeing MapReduce baked into products, which makes parallel computing something that will be a normal part of development in the future.
- Programmer ease of use. Accessing your data should be easy. While the relational model is intuitive for end users, like accountants, it's not very intuitive for developers. Programmers grok keys, values, JSON, JavaScript stored procedures, HTTP, and so on. NoSQL is for programmers. This is a developer-led coup. The response to a database problem can't always be to hire a really knowledgeable DBA, get your schema right, denormalize a little, etc.; programmers would prefer a system that they can make work for themselves. It shouldn't be so hard to make a product perform. Money is part of the issue. If it costs a lot to scale a product, then won't you go with the cheaper product that you control, that's easier to use, and that's easier to scale?
- Use the right data model for the right problem. Different data models are used to solve different problems. Much effort has been put into, for example, wedging graph operations into a relational model, but it doesn't work. Isn't it better to solve a graph problem in a graph database? We are now seeing a general strategy of trying to find the best fit between a problem and solution.
- Avoid hitting the wall. Many projects hit some type of wall. They've exhausted all options to make their system scale or perform properly and are wondering what to do next. It's comforting to select a product and an approach that can jump over the wall by linearly scaling using incrementally added resources. At one time this wasn't possible. It took custom-built everything, but that's changed. We are now seeing usable out-of-the-box products that a project can readily adopt.
- Distributed systems support. Not everyone is worried about scale or performance over and above that which can be achieved by non-NoSQL systems. What they need is a distributed system that can span datacenters while handling failure scenarios without a hiccup. NoSQL systems, because they have focused on scale, tend to exploit partitions, tend not to use heavy strict consistency protocols, and so are well positioned to operate in distributed scenarios.
- Tunable CAP tradeoffs. NoSQL systems are generally the only products with a "slider" for choosing where they want to land on the CAP spectrum. Relational databases pick strong consistency, which means they can't tolerate a partition failure. In the end, this is a business decision and should be decided on a case-by-case basis. Does your app even care about consistency? Are a few drops OK? Does your app need strong or weak consistency? Is availability more important, or is consistency? Will being down be more costly than being wrong? It's nice to have products that give you a choice.
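The "7 TB at 80 MB/s takes a day" figure in the write-performance item can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), perfectly even distribution across nodes, and no replication overhead, none of which the original article states. The function names are made up for illustration.

```python
TB = 10**12  # decimal terabyte, an assumption
MB = 10**6   # decimal megabyte, an assumption


def hours_to_write(total_bytes, bytes_per_second):
    """Hours a single node needs to absorb total_bytes at a sustained write rate."""
    return total_bytes / bytes_per_second / 3600


def nodes_needed(total_bytes, bytes_per_second, window_seconds):
    """Smallest node count that absorbs total_bytes within window_seconds,
    assuming writes are spread perfectly evenly (ceiling division)."""
    per_node_capacity = bytes_per_second * window_seconds
    return -(-total_bytes // per_node_capacity)


if __name__ == "__main__":
    daily = 7 * TB
    rate = 80 * MB
    print(round(hours_to_write(daily, rate), 1))  # a little over 24 hours
    print(nodes_needed(daily, rate, 3600))        # nodes to absorb a day's data in one hour
```

A single node at that rate is already saturated by one day's data, which is why the item concludes that writes have to be spread over a cluster.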
More Specific Use Cases
- Managing large streams of non-transactional data: Apache logs, application logs, MySQL logs, clickstreams, etc.
- Syncing online and offline data. This is a niche CouchDB has targeted.
- Fast response times under all loads.
- Avoiding heavy joins when the query load for complex joins becomes too large for an RDBMS.
- Soft real-time systems where low latency is critical. Games are one example.
- Applications where a wide variety of different write, read, query, and consistency patterns need to be supported. There are systems optimized for 50% reads 50% writes, 95% writes, or 95% reads. Read-only applications needing extreme speed and resiliency, simple queries, and that can tolerate slightly stale data. Applications requiring moderate performance, read/write access, simple queries, and completely authoritative data. A read-only application with complex query requirements.
- Load balancing to accommodate data and usage concentrations and to help keep microprocessors busy.
- Real-time inserts, updates, and queries.
- Hierarchical data like threaded discussions and parts explosion.
- Dynamic table creation.
- Two-tier applications where low-latency data is made available through a fast NoSQL interface, but the data itself can be calculated and updated by high-latency Hadoop apps or other low-priority apps.
- Sequential data reading. The right underlying data storage model needs to be selected. A B-tree may not be the best model for sequential reads.
- Slicing off part of a service that may need better performance/scalability onto its own system. For example, user logins may need to be high performance, and this feature could use a dedicated service to meet those goals.
- Caching. A high-performance caching tier for websites and other applications. An example is a cache for the Data Aggregation System used by the Large Hadron Collider.
- Voting.
- Real-time page view counters.
- User registration, profile, and session data.
- Document, catalog management, and content management systems. These are facilitated by the ability to store complex documents as a whole rather than organized as relational tables. Similar logic applies to inventory, shopping carts, and other structured data types.
- Archiving. Storing a large continual stream of data that is still accessible online. Document-oriented databases with a flexible schema that can handle schema changes over time.
- Analytics. Use MapReduce, Hive, or Pig to perform analytical queries, and scale-out systems that support high write loads.
- Working with heterogeneous types of data, for example, different media types at a generic level.
- Embedded systems. They don't want the overhead of SQL and servers, so they use something simpler for storage.
- A "market" game, where you own buildings in a town. You want someone's building list to pop up quickly, so you partition on the owner column of the building table, so that the select is single-partitioned. But when someone buys someone else's building, you update the owner column along with the price.
- JPL is using SimpleDB to store rover plan attributes. References are kept to a full plan blob in S3. (source)
- Federal law enforcement agencies tracking Americans in real time using credit cards, loyalty cards, and travel reservations.
- Fraud detection by comparing transactions to known patterns in real time.
- Helping diagnose the typology of tumors by integrating the history of every patient.
- In-memory database for high-update situations, like a website that displays everyone's "last active" time (for chat, maybe). If users are performing some activity once every 30 sec, then you will pretty much be at your limit with about 5000 simultaneous users.
- Handling lower-frequency multi-partition queries using materialized views while continuing to process high-frequency streaming data.
- Priority queues.
- Running calculations on cached data, using a program-friendly interface, without having to go through an ORM.
- Uniq a large dataset using simple key-value columns.
- To keep querying fast, values can be rolled up into different time slices.
- Computing the intersection of two massive sets, where a join would be too slow.
- A timeline à la Twitter.
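The "rolled up into different time slices" item can be illustrated with a small sketch: raw events are summed into fixed-width buckets, so that a query for "views per minute" or "views per hour" reads a handful of pre-aggregated counters instead of scanning every event. The event shape `(unix_timestamp, value)` and the function name `roll_up` are assumptions chosen for illustration, and a plain dictionary stands in for whatever store would hold the counters.

```python
from collections import Counter


def roll_up(events, slice_seconds):
    """Sum (unix_timestamp, value) pairs into fixed-width time slices.

    The result maps the start timestamp of each slice to the total for that slice.
    """
    totals = Counter()
    for timestamp, value in events:
        slice_start = timestamp - timestamp % slice_seconds
        totals[slice_start] += value
    return dict(totals)


if __name__ == "__main__":
    page_views = [(0, 1), (59, 2), (60, 3), (3599, 4), (3600, 5)]
    print(roll_up(page_views, 60))    # per-minute totals
    print(roll_up(page_views, 3600))  # per-hour totals
```

In a key-value store the same idea usually becomes one counter per slice, keyed by something like the metric name plus the slice start, incremented on every write.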
Redis use cases, VoltDB use cases, and more can be found here.
Answer by scalabl3
This question is almost impossible to answer because of its generality. I think you are looking for some sort of easy "problem = solution" answer. The problem is that each "problem" becomes more and more unique as it becomes a business.
What do you call a social network? Twitter? Facebook? LinkedIn? Stack Overflow? They all use different solutions for different parts, and many solutions can exist that use a polyglot approach. Twitter has a graph-like concept, but there are only first-degree connections: followers and following. LinkedIn, on the other hand, thrives on showing how people are connected beyond the first degree. These are two different processing and data needs, but both are "social networks".
If you have a "social network" but don't do any discovery mechanisms, then you can most likely use any basic key-value store. If you need high performance, horizontal scale, and will have secondary indexes or full-text search, you could use Couchbase.
If you are doing machine learning on top of the log data you are gathering, you can integrate Hadoop with Hive or Pig, or Spark/Shark. Or you can do a lambda architecture and use many different systems with Storm.
If you are doing discovery via graph-like queries that go beyond second-degree vertices and also filter on edge properties, you will likely consider graph databases on top of your primary store. However, graph databases aren't good choices for a session store or as general-purpose stores, so you will need a polyglot solution to be efficient.
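The difference between first-degree lookups and deeper "how are we connected" queries can be sketched with a plain breadth-first search. With `max_degree=1` the function does a single adjacency lookup, which is the kind of access a key-value store handles well; each extra degree adds another round of expansions, which is the workload graph databases are built for. The adjacency-dictionary representation and the name `within_degrees` are assumptions for illustration, not any particular product's API.

```python
from collections import deque


def within_degrees(graph, start, max_degree):
    """Return {person: degree} for everyone reachable from start in at most
    max_degree hops, using breadth-first search over an adjacency dict."""
    degrees = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if degrees[person] == max_degree:
            continue  # do not expand beyond the requested depth
        for neighbour in graph.get(person, ()):
            if neighbour not in degrees:
                degrees[neighbour] = degrees[person] + 1
                queue.append(neighbour)
    del degrees[start]
    return degrees


if __name__ == "__main__":
    network = {
        "ann": ["bob"],
        "bob": ["ann", "cat"],
        "cat": ["bob", "dan"],
        "dan": ["cat"],
    }
    print(within_degrees(network, "ann", 1))  # direct connections only
    print(within_degrees(network, "ann", 3))  # connections up to third degree
```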
What is the data velocity? The scale? How do you want to manage it? What expertise do you have available in the company or startup? There are a number of reasons this is not a simple question and answer.
Answer by naXa
A short, useful read specific to database selection: How to choose a NoSQL Database?. I will highlight the key points in this answer.
Key-Value vs Document-oriented
Key-value stores
If you have a clear data structure defined such that all the data would have exactly one key, go for a key-value store. It's like you have a big Hashtable, and people mostly use it for cache stores or clearly key-based data. However, things start getting a little nasty when you need to query the same data on the basis of multiple keys!
Some key-value stores are: memcached, Redis, Aerospike.
Two important things about designing your data model around a key-value store are:
- You need to know all use cases in advance, and you cannot change the queryable fields in your data without a redesign.
- Remember, if you are going to maintain multiple keys around the same data in a key-value store, updates to multiple tables/buckets/collections/whatever are NOT atomic. You need to deal with this yourself.
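The non-atomic update problem described above can be shown with a toy store. In this sketch a user record is reachable by two keys: its id and its email. The lookup by email needs a second bucket that the application maintains by hand, and a failure between the two writes leaves the data inconsistent. The `KeyValueStore` class, the bucket names, and the `crash_before_index` flag are all hypothetical and exist only to simulate the failure; a real client library will look different.

```python
class KeyValueStore:
    """Toy stand-in for a key-value store: one dict per bucket and
    no transactions spanning more than one put."""

    def __init__(self):
        self.buckets = {}

    def put(self, bucket, key, value):
        self.buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        return self.buckets.get(bucket, {}).get(key)


def save_user(store, user_id, email, crash_before_index=False):
    """Write the record, then its email index entry, as two separate puts."""
    store.put("users", user_id, {"email": email})
    if crash_before_index:
        raise RuntimeError("simulated crash between the two writes")
    store.put("users_by_email", email, user_id)


def find_by_email(store, email):
    """Look up a user through the hand-maintained secondary key."""
    user_id = store.get("users_by_email", email)
    return None if user_id is None else store.get("users", user_id)


if __name__ == "__main__":
    store = KeyValueStore()
    save_user(store, "u1", "ann@example.com")
    print(find_by_email(store, "ann@example.com"))
    try:
        save_user(store, "u2", "bob@example.com", crash_before_index=True)
    except RuntimeError:
        pass
    # The record exists under its primary key but cannot be found by email.
    print(store.get("users", "u2"), find_by_email(store, "bob@example.com"))
```

Dealing with this yourself typically means a repair job, idempotent retries, or choosing a store that offers multi-key transactions.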
Document-oriented
If you are just moving away from an RDBMS and want to keep your data in an object-like form and as close to a table-like structure as possible, a document structure is the way to go! It is particularly useful when you are creating an app and don't want to deal with RDBMS table design early on (in the prototyping stage) and your schema could change drastically over time. However, note:
- Secondary indexes may not perform as well.
- Transactions are not available.
Popular document-oriented databases are: MongoDB, Couchbase.
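As a rough illustration of why a changing schema is less painful with documents, the sketch below reads JSON documents written at different stages of a prototype. Older documents simply lack the newer fields, and the application supplies defaults when it reads them, so no migration step is needed. The field names and the `load_profile` helper are assumptions for illustration; JSON strings stand in for whatever a document database would return.

```python
import json


def load_profile(raw_document):
    """Apply the application's current view of the schema to a stored document.

    Only "name" is required; fields added later in the project are optional.
    """
    doc = json.loads(raw_document)
    return {
        "name": doc["name"],
        "email": doc.get("email"),    # optional, added after the first release
        "tags": doc.get("tags", []),  # optional, defaults to an empty list
    }


if __name__ == "__main__":
    early = '{"name": "Ann"}'
    later = '{"name": "Bob", "email": "bob@example.com", "tags": ["admin"]}'
    print(load_profile(early))
    print(load_profile(later))
```

The trade-off is that the schema now lives in application code, so every reader has to agree on the defaults.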
Comparing Key-value NoSQL databases
memcached
- In-memory cache
- No persistence
- TTL supported
- Client-side clustering only (client stores value at multiple nodes). Horizontally scalable through client.
- Not good for large-size values/documents
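"Client-side clustering" means the servers do not know about each other; the client decides which node holds each key. A minimal sketch of that decision, assuming simple hash-modulo placement, is shown below. The node addresses are hypothetical, and this is not the algorithm of any specific memcached client: many real clients use consistent hashing instead, because plain modulo placement remaps most keys whenever the node list changes.

```python
import hashlib


def pick_node(key, nodes):
    """Map a key to one node by hashing it; any client configured with the
    same node list in the same order picks the same node for the same key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


if __name__ == "__main__":
    cache_nodes = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]  # hypothetical hosts
    for key in ("user:42", "session:abc", "page:/home"):
        print(key, "->", pick_node(key, cache_nodes))
```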
Redis
- In-memory cache
- Disk supported: backup and rebuild from disk
- TTL supported
- Super-fast (see benchmarks)
- Data structure support in addition to key-value
- Clustering support not mature enough yet. Vertically scalable (see Redis Cluster specification)
- Horizontal scaling could be tricky.
- Supports secondary indexes
Aerospike
- Both in-memory & on-disk
- Extremely fast (could support >1 million TPS on a single node)
- Horizontally scalable. Server-side clustering. Sharded & replicated data
- Automatic failovers
- Supports secondary indexes
- CAS (safe read-modify-write) operations, TTL support
- Enterprise class
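The "CAS (safe read-modify-write)" item refers to check-and-set: a write carries the version the client last read, and the store rejects it if the record has changed in the meantime. The sketch below shows the general pattern with an in-memory toy store. It is not Aerospike's client API; `VersionedStore`, `write_if_version`, and `increment` are hypothetical names used only to illustrate the idea.

```python
class VersionedStore:
    """Toy store that keeps a version number next to every value."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); a missing key reads as version 0, value None."""
        return self._data.get(key, (0, None))

    def write_if_version(self, key, expected_version, value):
        """Store value only if the record is still at expected_version."""
        current_version, _ = self.read(key)
        if current_version != expected_version:
            return False  # someone else wrote first; the caller must re-read
        self._data[key] = (current_version + 1, value)
        return True


def increment(store, key, max_retries=10):
    """Read-modify-write loop: retry until the conditional write succeeds."""
    for _ in range(max_retries):
        version, value = store.read(key)
        if store.write_if_version(key, version, (value or 0) + 1):
            return True
    return False


if __name__ == "__main__":
    store = VersionedStore()
    increment(store, "hits")
    increment(store, "hits")
    print(store.read("hits"))  # version and value after two increments
```

Without the version check, two clients that read the same value and both write back "value + 1" would silently lose one update.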
Comparing document-oriented NoSQL databases
MongoDB
- Fast
- Mature & stable, feature rich
- Supports failovers
- Horizontally scalable reads: read from replica/secondary
- Writes not scalable horizontally unless you use mongo shards
- Supports advanced querying
- Supports multiple secondary indexes
- Shard architecture becomes tricky, not scalable beyond a point where you need secondary indexes. An elementary shard deployment needs 9 nodes at minimum.
- Document-level locks are a problem if you have a very high write rate
Couchbase Server
- Fast
- Sharded cluster instead of MongoDB's master-slave setup
- Hot failover support
- Horizontally scalable
- Supports secondary indexes through views
- Steeper learning curve than MongoDB
- Claims to be faster

