SQL 什么是列式数据库?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2133017/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 05:11:58  来源:igfitidea点击:

What is a columnar database?

sqldatabase

提问by Raj More

I have been working with warehousing for a while now.

我从事仓储工作已经有一段时间了。

I am intrigued by Columnar Databases and the speed that they have to offer for data retrievals.

我对列式数据库以及它们为数据检索提供的速度很感兴趣。

I have multi-part question:

我有多个问题:

  • How do Columnar Databases work?
  • How do they differ from relational databases?
  • 列式数据库如何工作?
  • 它们与关系数据库有何不同?

采纳答案by mjv

How do Columnar Databases work?
Columnar database is a conceptrather a particular architecture/implementation. In other words, there isn't one particular description on how these databases work; indeed, several are build upon traditional, row-oriented, DBMS, simply storing the info in tables with one (or rather often two) columns (and adding the necessary layer to access the columnar data in an easy fashion).

列式数据库如何工作?
列式数据库是一个概念,而不是一个特定的架构/实现。换句话说,对于这些数据库的工作方式没有一个特定的描述。事实上,有几个是建立在传统的、面向行的 DBMS 之上的,只是将信息存储在具有一列(或者更常见的是两列)的表中(并添加必要的层以简单的方式访问列式数据)。

How do they differ from relational databases?They generally differ from traditional (row-oriented) databases with regards to ...

它们与关系数据库有何不同?它们通常与传统(面向行)数据库的不同之处在于......

  • performance...
  • storage requirements ...
  • ease of modification of the schema ...
  • 表现...
  • 存储要求...
  • 易于修改架构...

...in specific use cases of DBMSes.
In particular they offer advantages in the areas mentioned when the typical use is to compute aggregate values on a limited number of columns, as opposed to try and retrieve all/most columns for a given entity.

...在 DBMS 的特定用例中
特别是当典型用途是在有限数量的列上计算聚合值时,它们在提到的领域提供了优势,而不是尝试检索给定实体的所有/大多数列。

Is there a trial version of a columnar database I can install to play around? (I am on Windows 7)Yes, there are commercial, free and also open-source implementation of columnar databases. See the list at the end of the Wikipedia articlefor starter.
Beware that several of these implementations were introduced to address a particular need(say very small footprint, highly compressible distribution of data, or spare matrix emulation etc.) rather than provide a general purpose column-oriented DBMS per-se.

我可以安装试用版的柱状数据库吗?(我使用的是 Windows 7)是的,列式数据库有商业的、免费的和开源的实现。请参阅维基百科文章末尾的列表以了解初学者。
请注意,这些实现中的一些是为了满足特定需求(比如非常小的占用空间、高度可压缩的数据分布或备用矩阵仿真等)而不是提供通用的面向列的 DBMS 本身。

Note: The remark about the "single purpose orientation" of several columnar DBMSes is not a critique of these implementations, but rather an additional indication that such an approach for DBMSes strays from the more "natural" (and certainly more broadly used) approach to storing record entities. As a result, this approach is used when the row-oriented approach isn't satisfactory, and therefore and tends to
a) be targeted for a particular purpose b) receive less resources/interest than work on "General Purpose", "Tried and Tested", tabular approach.

注意:关于几个柱状 DBMS 的“单一用途方向”的评论并不是对这些实现的批评,而是额外的迹象表明,这种 DBMS 的方法偏离了更“自然”(当然也更广泛使用)的方法存储记录实体。因此,当面向行的方法不令人满意时,会使用这种方法,因此并且倾向于
a) 针对特定目的 b) 获得的资源/兴趣少于“通用”、“尝试过和测试”,表格方法。

Tentatively, the Entity-Attribute-Value(EAV) data model, may be an alternative storage strategy which you may want to consider. Although distinct from the "pure" Columnar DB model, EAV shares several of the characteristics of Columnar DBs.

暂时,实体-属性-值(EAV) 数据模型可能是您可能想要考虑的替代存储策略。尽管与“纯”列式 DB 模型不同,但 EAV 具有列式 DB 的几个特征。

回答by Paul Mansour

How do columnar databases work?The defining concept of a column-store is that the values of a table are stored contiguously by column. Thus the classic supplier table from CJ Date's supplier and parts database:

列式数据库如何工作?列存储的定义概念是表的值按列连续存储。因此,来自 CJ Date 的供应商和零件数据库的经典供应商表:

SNO  STATUS CITY    SNAME
---  ------ ----    -----
S1       20 London  Smith
S2       10 Paris   Jones
S3       30 Paris   Blake
S4       20 London  Clark
S5       30 Athens  Adams

would be stored on disk or in memory something like:

将存储在磁盘或内存中,例如:

S1S2S3S4S5;2010302030;LondonParisParisLondonAthens;SmithJonesBlakeClarkAdams 

This is in contrast to a traditional rowstore which would store the data more like this:

这与传统的行存储形成对比,后者更像这样存储数据:

S120LondonSmith;S210ParisJones;S330ParisBlake;S420LondonClark;S530AthensAdams

From this simple concept flows all of the fundamental differences in performance, for better or worse, between a column-store and a row-store. For example, a column store will excel at doing aggregations like totals and averages, but inserting a single row can be expensive, while the inverse holds true for row-stores. This should be apparent from the above diagram.

从这个简单的概念中可以看出列存储和行存储之间性能的所有基本差异,无论好坏。例如,列存储在进行总计和平均值等聚合方面表现出色,但插入单行可能会很昂贵,而行存储则相反。从上图应该可以看出这一点。

How do they differ from relational databases?A relation database is a logical concept. A columnar database, or column-store, is a physical concept. Thus the two terms are not comparable in any meaningful way. Column- oriented DMBSs may be relational or not, just as row-oriented DBMS's may adhere more or less to relational principles.

它们与关系数据库有何不同?关系数据库是一个逻辑概念。列式数据库或列存储是一个物理概念。因此,这两个术语在任何意义上都没有可比性。面向列的 DMBS 可能是相关的,也可能不是,正如面向行的 DBMS 可能或多或少地遵循关系原则。

回答by hari_sree

I would say the best candidate to understand about column oriented databases is to check HBase (Apache Hbase) . You an checkout the code and explore further to find out about the implementation .

我想说了解面向列的数据库的最佳候选人是检查 HBase ( Apache Hbase)。您可以查看代码并进一步探索以了解实现。

回答by kim stanick

Also, Columnar DBs have a built in affinity for data compression, and the loading process is unique. Here's an articleI wrote in 2008 that explains a bit more.

此外,列式 DB 具有内置的数据压缩亲和性,并且加载过程是独一无二的。这是我在 2008 年写的一篇文章,其中解释了更多。

You may also be interested in a new report from IDC's Carl Olofson on 3rd generation DBMS technology. It discusses columnar, et al. If you're not an IDC client you can get it free on our site. He's doing a webinar on June 16th, too (also on our site).

您可能还对 IDC 的 Carl Olofson 关于第三代 DBMS 技术的新报告感兴趣。它讨论了柱状等。如果您不是 IDC 客户,您可以在我们的网站上免费获得。他也在 6 月 16 日举办网络研讨会(也在我们的网站上)。

(BTW, one comment above lists asterdata but I don't think they are columnar.)

(顺便说一句,上面的一条评论列出了asterdata,但我认为它们不是柱状的。)

回答by S.Lott

Product information. This may help. These were to featured products on a Google search.

产品信息。这可能会有所帮助。这些是在 Google 搜索中展示的特色产品。

http://www.vertica.com/

http://www.vertica.com/

http://www.paraccel.com/

http://www.paraccel.com/

http://www.asterdata.com/index.php

http://www.asterdata.com/index.php

回答by user2987828

kxis another columnar database, for example used in the financial sector. The licence is somewhat $50K last time I checked, though. No optimisation needed, no index needed, because kx has powerful operators (matlab equivalents: .*, kron, bsxfun, ...).

kx是另一个列式数据库,例如用于金融部门。不过,我上次检查时许可证是 5 万美元。无需优化,无需索引,因为 kx 具有强大的运算符(matlab 等效项:.*, kron, bsxfun, ...)。

回答by Razan Paul

To understand what is column oriented database, it is better to contrast it with row oriented database.

要了解什么是面向列的数据库,最好将其与面向行的数据库进行对比。

Row oriented databases(e.g. MS SQL Server and SQLite) are designed to efficiently return data for an entire row. It does it by storing all the columns values of a row together. Row-oriented databases are well-suited for?OLTP systems (e.g., retail sales, and financial transaction systems).

面向行的数据库(例如 MS SQL Server 和 SQLite)旨在高效地返回整行的数据。它通过将一行的所有列值存储在一起来实现。面向行的数据库非常适合 OLTP 系统(例如,零售和金融交易系统)。

Column oriented databasesare designed to efficiently return data for a limited number of columns. It does it by storing all of the values of a column together. Two widely used Column oriented databases are Apache Hbase and Google BigTable (used by Google for its Search, Analytics, Maps and Gmail). They are suitable for the big data projects. A column oriented database will excel at read operations on a limited number of columns, however write operation will be expensive compared to row oriented databases.

面向列的数据库旨在有效地返回有限数量的列的数据。它通过将一列的所有值存储在一起来实现。两个广泛使用的面向列的数据库是 Apache Hbase 和 Google BigTable(被 Google 用于其搜索、分析、地图和 Gmail)。它们适用于大数据项目。面向列的数据库将擅长对有限数量的列进行读取操作,但是与面向行的数据库相比,写入操作的成本会很高。

For more: https://en.wikipedia.org/wiki/Column-oriented_DBMS

更多信息:https: //en.wikipedia.org/wiki/Column-oriented_DBMS