
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/782913/

Time: 2020-09-08 07:15:43  Source: igfitidea

Google's Bigtable vs. A Relational Database

Tags: database, relational, bigtable

Asked by Daniel Kivatinos

Duplicates


I don't know much about Google's Bigtable but am wondering what the difference between Google's Bigtable and relational databases like MySQL is. What are the limitations of both?


Answered by tylerl

Bigtable is Google's invention to deal with the massive amounts of information that the company regularly deals in. A Bigtable dataset can grow to immense size (many petabytes) with storage distributed across a large number of servers. The systems using Bigtable include projects like Google's web index and Google Earth.


According to Google's whitepaper on the subject:


A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
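To make the quoted definition concrete, here is a minimal in-memory sketch of a map indexed by row key, column key, and timestamp. It is a toy under stated assumptions, not how Bigtable is implemented: the class name `SparseSortedMap` and its methods are made up for illustration, and the "distributed" and "persistent" parts of the definition are not modeled at all. The sample row and column names are illustrative only.

```python
from bisect import insort


class SparseSortedMap:
    """Toy model of a map indexed by (row key, column key, timestamp)."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), oldest first.
        # Cells that were never written take no space: the map is sparse.
        self._cells = {}

    def put(self, row, column, timestamp, value: bytes):
        versions = self._cells.setdefault((row, column), [])
        insort(versions, (timestamp, value))  # keep versions ordered by timestamp

    def get(self, row, column, at=None):
        """Return the newest value with timestamp <= at (newest overall if at is None)."""
        for timestamp, value in reversed(self._cells.get((row, column), [])):
            if at is None or timestamp <= at:
                return value
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, newest value) for start_row <= row < end_row, in key order."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                yield row, column, self._cells[(row, column)][-1][1]


table = SparseSortedMap()
table.put("com.cnn.www", "contents:", 3, b"<html>old</html>")
table.put("com.cnn.www", "contents:", 5, b"<html>new</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("org.example", "contents:", 4, b"<html>hi</html>")
```

Values are opaque bytes, so the store never interprets them. Because keys are kept in sorted order, a range scan such as `table.scan("com", "com~")` visits only rows sharing that prefix.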


The internal mechanics of Bigtable versus, say, MySQL are so dissimilar as to make comparison difficult, and the intended goals don't overlap much either. But you can think of Bigtable a bit like a single-table database. Imagine, for example, the difficulties you would run into if you tried to implement Google's entire web search system with a MySQL database -- Bigtable was built around solving those problems.


Bigtable datasets can be queried from services like AppEngine (through its Bigtable-backed Datastore) using a language called GQL ("gee-kwal"), which is based on a subset of SQL. Conspicuously missing from GQL is any sort of JOIN command. Because of the distributed nature of a Bigtable database, performing a join between two tables would be terribly inefficient. Instead, the programmer has to implement such logic in the application, or design the application so as not to need it.
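As a rough illustration of implementing join logic in the application, the sketch below emulates a join with two rounds of key lookups. Everything in it is hypothetical: `batch_get` stands in for whatever batch key lookup a real store offers, the "tables" are plain dicts, and the sample data is invented.

```python
def batch_get(table, keys):
    """Stand-in for a batch lookup by key; 'table' is a plain dict here."""
    return {key: table[key] for key in keys if key in table}


def posts_with_author_names(posts, authors):
    """Emulate 'posts JOIN authors' with two rounds of key lookups."""
    # Round 1: collect the foreign keys found in the rows we already have.
    author_ids = {post["author_id"] for post in posts.values()}
    # Round 2: fetch all referenced rows in one batch instead of one per post.
    found = batch_get(authors, author_ids)
    return [
        {"title": post["title"],
         "author": found.get(post["author_id"], {}).get("name")}
        for _, post in sorted(posts.items())
    ]


authors = {"a1": {"name": "Ada"}, "a2": {"name": "Grace"}}
posts = {
    "p1": {"title": "Tablets", "author_id": "a1"},
    "p2": {"title": "Compaction", "author_id": "a2"},
    "p3": {"title": "Orphan", "author_id": "a9"},  # author row does not exist
}
joined = posts_with_author_names(posts, authors)
```

A post whose author is missing comes back with `author` set to `None`, so this behaves like a left outer join. The cost is an extra round trip per joined table, which is why designing the data so the join is unnecessary is often the better option.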


Answered by Miguel Ping

Google's BigTable and other similar projects (e.g., CouchDB, HBase) are database systems designed so that data is mostly denormalized (i.e., duplicated and grouped).


The main advantages are:
- Join operations are less costly because of the denormalization: related data is already stored together.
- Replication/distribution of data is less costly because of data independence (i.e., if you want to distribute data across two nodes, you probably won't have the problem of having an entity on one node and another related entity on a different node, because similar data is grouped).
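The trade-off behind denormalization can be sketched in a few lines. This is only an illustration with invented data and hypothetical helper names (`denormalize`, `rename_author`): copying the author's name into each post makes a read a single-key lookup, at the price of rewriting every copy when the name changes.

```python
def denormalize(posts, authors):
    """Copy each author's name into the post rows that reference it."""
    return {
        post_id: {**post, "author_name": authors[post["author_id"]]["name"]}
        for post_id, post in posts.items()
    }


def rename_author(denormalized_posts, author_id, new_name):
    """Duplicated data means a rename must rewrite every copy; returns the count."""
    changed = 0
    for post in denormalized_posts.values():
        if post["author_id"] == author_id:
            post["author_name"] = new_name
            changed += 1
    return changed


authors = {"a1": {"name": "Ada"}, "a2": {"name": "Grace"}}
posts = {
    "p1": {"title": "Tablets", "author_id": "a1"},
    "p2": {"title": "Compaction", "author_id": "a1"},
    "p3": {"title": "Sharding", "author_id": "a2"},
}
wide_posts = denormalize(posts, authors)
# Reading a post with its author name is now one key lookup, no join needed.
first = wide_posts["p1"]
```

Calling `rename_author(wide_posts, "a1", "Ada L.")` touches two rows here, where a normalized schema would update one. That write amplification is the usual price for cheap reads.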


Systems of this kind suit applications that need to scale close to linearly (i.e., you add more nodes to the system and performance increases proportionally). In an RDBMS like MySQL or Oracle, once you start adding nodes, joining two tables that are not on the same node becomes more expensive. This becomes important when you are dealing with high volumes.
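One way to picture why grouped data scales well across nodes is range partitioning on a sorted key. The sketch below is a simplified assumption, not a description of any particular product's placement logic: `node_for` and the split points are hypothetical, and real systems also rebalance and replicate.

```python
from bisect import bisect_right


def node_for(row_key, split_points):
    """Return the index of the node whose key range contains row_key.

    split_points must be sorted; node i holds keys in
    [split_points[i-1], split_points[i]), with open ends at both extremes.
    """
    return bisect_right(split_points, row_key)


splits = ["g", "n", "t"]  # hypothetical boundaries, giving 4 nodes

# Rows that share a key prefix sort next to each other and land on one node.
same_site = {node_for(key, splits)
             for key in ("com.cnn.www", "com.cnn.www/sports", "com.cnn.www/world")}

# Unrelated keys can land anywhere, so combining them means crossing nodes.
other_node = node_for("org.wikipedia", splits)
```

Here the three `com.cnn.www` rows all map to node 0, so reading them together stays local, while a join against a row on another node would need network traffic.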


RDBMSs are nice because of the richness of the storage model (tables, joins, foreign keys). Distributed databases are nice because of the ease of scaling.
