mongodb: How does column-oriented NoSQL differ from document-oriented NoSQL?
Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/7565012/
Asked by Luke
The three types of NoSQL databases I've read about are key-value, column-oriented, and document-oriented.
Key-value is pretty straightforward - a key with a plain value.
I've seen document-oriented databases described as being like key-value, but the value can be a structure, like a JSON object. Each "document" can have all, some, or none of the same keys as another.
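To make the distinction concrete, here is a minimal sketch using plain Python dictionaries rather than any real database. The names `kv_store`, `documents`, and `find_by_field` are made up for illustration; the point is only that a document's value is a structure the store can look inside, and that documents need not share keys.

```python
# Key-value store: the value is an opaque blob as far as the store is concerned.
kv_store = {"user:1": "Luke"}

# Document store: each value is a structure, and documents need not share keys.
documents = {
    "user:1": {"name": "Luke", "email": "luke@example.com"},
    "user:2": {"name": "Ann", "tags": ["admin"]},
    "user:3": {},  # a document with none of the keys the others have
}

def find_by_field(docs, field, value):
    """Return the ids of documents whose `field` equals `value`."""
    return sorted(doc_id for doc_id, doc in docs.items() if doc.get(field) == value)

# Keys that two documents happen to have in common.
shared_keys = set(documents["user:1"]) & set(documents["user:2"])
```

Filtering on a field works here because the structure is visible; with the plain key-value store there is nothing to filter on except the key.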
Column-oriented seems to be very much like document-oriented in that you don't specify a structure.
So what is the difference between these two, and why would you use one over the other?
I've specifically looked at MongoDB and Cassandra. I basically need a dynamic structure that can change without affecting other values. At the same time I need to be able to search/filter on specific keys and run reports. With CAP, AP is the most important to me. The data can "eventually" be synced across nodes, as long as there is no conflict or loss of data. Each user would get their own "table".
Accepted answer by DNA
In Cassandra, each row (addressed by a key) contains one or more "columns". Columns are themselves key-value pairs. The column names need not be predefined, i.e. the structure isn't fixed. Columns in a row are stored in sorted order according to their keys (names).
In some cases, you may have very large numbers of columns in a row (e.g. to act as an index to enable particular kinds of query). Cassandra can handle such large structures efficiently, and you can retrieve specific ranges of columns.
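The idea of columns kept in sorted order, with range retrieval, can be sketched in a few lines. This is a toy model in plain Python, not Cassandra's API: the `Row` class and its `set_column` and `slice` methods are hypothetical names chosen for this example.

```python
from bisect import bisect_left, bisect_right, insort

class Row:
    """Toy model of one row whose columns are kept sorted by name."""

    def __init__(self):
        self._names = []   # column names, always in sorted order
        self._values = {}  # column name -> value

    def set_column(self, name, value):
        # New column names are inserted at their sorted position.
        if name not in self._values:
            insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in name order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]

# Columns are written out of order but come back sorted by name.
row = Row()
for day, event in [("2011-09-03", "c"), ("2011-09-01", "a"),
                   ("2011-09-02", "b"), ("2011-09-10", "d")]:
    row.set_column(day, event)

september_2_to_5 = row.slice("2011-09-02", "2011-09-05")
```

Because the names are sorted, a range query only needs two binary searches and a contiguous read, which is why a wide row can serve as an index.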
There is a further level of structure (not so commonly used) called super-columns, where a column contains nested (sub)columns.
You can think of the overall structure as a nested hashtable/dictionary, with 2 or 3 levels of keys.
Normal column family:
row
    col    col    col ...
    val    val    val ...
Super column family:
row
    supercol                     supercol                     ...
        (sub)col  (sub)col  ...      (sub)col  (sub)col  ...
        val       val       ...      val       val       ...
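The same two layouts can be written as nested Python dictionaries, which is the "nested hashtable" view described above. All row, column, and value names here are placeholders, and `key_levels` is a hypothetical helper that assumes every dictionary is non-empty.

```python
# Normal column family: row key -> column name -> value (2 levels of keys).
normal_cf = {
    "row1": {"colA": "val1", "colB": "val2"},
}

# Super column family: row key -> supercolumn -> subcolumn -> value (3 levels).
super_cf = {
    "row1": {
        "super1": {"subA": "val1", "subB": "val2"},
        "super2": {"subA": "val3"},
    },
}

def key_levels(structure):
    """Count the dictionary levels above the leaf values (assumes non-empty dicts)."""
    depth = 0
    while isinstance(structure, dict):
        depth += 1
        structure = next(iter(structure.values()))
    return depth
```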
There are also higher-level structures - column families and keyspaces - which can be used to divide up or group together your data.
See also this question: Cassandra: What is a subcolumn
Or the data modelling links from http://wiki.apache.org/cassandra/ArticlesAndPresentations
Re: comparison with document-oriented databases - the latter usually insert whole documents (typically JSON), whereas in Cassandra you can address individual columns or supercolumns and update them individually, i.e. the two work at different levels of granularity. Each column has its own separate timestamp/version (used to reconcile updates across the distributed cluster).
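A rough sketch of what per-column timestamps buy you: two replicas that received different updates to the same row can be merged column by column, keeping the newest version of each. This is a simplified "newest timestamp wins" rule written for illustration; `merge_replicas` is a hypothetical function, and real reconciliation involves details (deletions, tie-breaking) that are not modelled here.

```python
def merge_replicas(a, b):
    """Merge two copies of a row, keeping the newest version of each column.

    Each replica maps column name -> (timestamp, value).
    """
    merged = dict(a)
    for name, (ts, value) in b.items():
        # Take b's column if a lacks it or b's version is strictly newer.
        if name not in merged or ts > merged[name][0]:
            merged[name] = (ts, value)
    return merged

# Each replica saw a different subset of single-column updates.
replica_1 = {"email": (10, "old@example.com"), "city": (12, "Oslo")}
replica_2 = {"email": (15, "new@example.com"), "phone": (11, "555-0100")}

merged = merge_replicas(replica_1, replica_2)
```

With whole-document versioning, the two updates would conflict as a unit; with per-column versions, only the `email` column has anything to reconcile.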
Cassandra column values are just bytes, but can be typed as ASCII, UTF-8 text, numbers, dates, etc.
Of course, you could use Cassandra as a primitive document store by inserting columns containing JSON - but you wouldn't get all the features of a real document-oriented store.
Answer by Theo
The main difference is that document stores (e.g. MongoDB and CouchDB) allow arbitrarily complex documents, i.e. subdocuments within subdocuments, lists of documents, etc., whereas column stores (e.g. Cassandra and HBase) only allow a fixed format, e.g. strict one-level or two-level dictionaries.
Answer by user327961
For "inserts", to use RDBMS terms, document-based stores are more consistent and straightforward. Note that Cassandra lets you achieve consistency through the notion of a quorum, but that won't apply to all column-based systems, and it reduces availability. On a write-once / read-often heavy system, go for MongoDB. Also consider it if you always plan to read the whole structure of the object. A document-based system is designed to return the whole document when you get it, and is not very strong at returning only parts of it.
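The quorum trade-off mentioned above can be illustrated with simple arithmetic. This sketch uses the general quorum rule for replicated stores (a read is guaranteed to overlap the latest write when the read and write replica counts together exceed the replica count); the names `n`, `w`, `r` and the helper functions are assumptions for this example, not part of any database API.

```python
def quorum(n):
    """Smallest majority of n replicas."""
    return n // 2 + 1

def reads_see_latest_write(n, w, r):
    """True when every read set must overlap every write set (r + w > n)."""
    return r + w > n

def tolerated_failures(n, required):
    """How many replicas can be down while `required` replicas can still answer."""
    return n - required

n = 3
# Quorum reads and writes: consistent, but only one replica may be down.
quorum_consistent = reads_see_latest_write(n, quorum(n), quorum(n))
quorum_failures = tolerated_failures(n, quorum(n))

# Single-replica reads and writes: more available, but reads may be stale.
one_consistent = reads_see_latest_write(n, 1, 1)
one_failures = tolerated_failures(n, 1)
```

With three replicas, requiring a quorum means an operation fails once two replicas are unreachable, whereas requiring a single replica keeps it going; that is the availability cost of the stronger consistency.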
Column-based systems like Cassandra are much better than document-based ones at "updates". You can change the value of a column without even reading the row that contains it. The write doesn't actually need to be done on the same server; a row may be spread across multiple files on multiple servers. On a huge, fast-evolving data system, go for Cassandra. Also consider it if you plan to have very big chunks of data per key and won't need to load all of it on each query. For "selects", Cassandra lets you load only the columns you need.
Also consider that MongoDB is written in C++ and is at its second major release, while Cassandra needs to run on a JVM, and its first major release has been in release candidate only since yesterday (but the 0.X releases have already gone into production at major companies).
On the other hand, Cassandra's design was partly based on Amazon Dynamo, and it is built at its core to be a high-availability solution, but that has nothing to do with the column-based format. MongoDB scales out too, but not as gracefully as Cassandra.