database - What is meant by sparse data / datastore / database?

Question

Asked by Jai

I have been reading up on Hadoop and HBase lately and came across this term:

HBase is an open-source, distributed, sparse, column-oriented store...

What do they mean by sparse? Does it have something to do with a sparse matrix? I am guessing it is a property of the kind of data HBase can store efficiently, so I would like to know more about it.

Answer 1

Answered by Peter Wone

In a regular database, rows are sparse but columns are not. When a row is created, storage is allocated for every column, irrespective of whether a value exists for that field (a field being the storage allocated for the intersection of a row and a column).

This allows fixed-length rows, which greatly improves read and write times. Variable-length data types are handled with something analogous to pointers.
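
The fixed-length layout described above can be sketched with Python's standard `struct` module. This is a simplified, hypothetical row format (a one-byte null bitmap followed by three 8-byte integer columns), not the on-disk format of any particular database. It shows the trade-off the answer describes: a NULL column still occupies its full slot, and in exchange row n can be located at offset `n * ROW_SIZE` without scanning.

```python
import struct
from typing import Optional

# Hypothetical row format for illustration: a 1-byte null bitmap followed by
# three 8-byte signed integer columns, little-endian with no padding.
ROW_FORMAT = "<B3q"
ROW_SIZE = struct.calcsize(ROW_FORMAT)  # 1 + 3 * 8 = 25 bytes

def pack_row(values: list[Optional[int]]) -> bytes:
    """Pack one row; a NULL column still occupies its full 8-byte slot."""
    bitmap = 0
    slots = []
    for i, value in enumerate(values):
        if value is None:
            bitmap |= 1 << i   # flag column i as NULL
            slots.append(0)    # placeholder bytes are written anyway
        else:
            slots.append(value)
    return struct.pack(ROW_FORMAT, bitmap, *slots)

def unpack_row(data: bytes) -> list[Optional[int]]:
    bitmap, *slots = struct.unpack(ROW_FORMAT, data)
    return [None if bitmap & (1 << i) else v for i, v in enumerate(slots)]

def read_row(table: bytes, n: int) -> list[Optional[int]]:
    """Fixed-length rows: row n starts at n * ROW_SIZE, so no scanning is needed."""
    offset = n * ROW_SIZE
    return unpack_row(table[offset:offset + ROW_SIZE])

table = pack_row([1, 2, 3]) + pack_row([4, None, None]) + pack_row([None, None, 9])
print(len(table))          # 75: every row is 25 bytes, NULLs included
print(read_row(table, 1))  # [4, None, None]
```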

Sparse columns will incur a performance penalty and are unlikely to save you much disk space, because the space required to indicate NULL is smaller than the 64-bit pointer required by the linked-list (chained-pointer) architecture typically used to implement very large non-contiguous storage.

Storage is cheap. Performance isn't.

Answer 2

Answered by David

At the storage level, all data is stored as key-value pairs. Each storage file contains an index so that it knows where each key-value pair starts and how long it is.
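
A toy version of this layout, under loudly simplified assumptions: the `Cell` class, the `row/column/timestamp` key encoding, and the 4-byte key-length prefix are hypothetical and are not HBase's actual HFile format. Each cell is written as its own key-value record, the index records where each record starts and how long it is, and a column with no value simply produces no record.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cell:
    row: str
    column: str      # "family:qualifier"
    timestamp: int
    value: bytes

def write_store_file(cells: list[Cell]) -> tuple[bytes, list[tuple[bytes, int, int]]]:
    """Serialize cells sorted by key; return (data, index).

    Each record is a 4-byte key length, the key, then the value. The index
    holds (key, offset, length) so a reader knows where each record starts
    and how long it is. Hypothetical format, for illustration only.
    """
    data = bytearray()
    index = []
    # Sort by row, then column, then newest timestamp first.
    for c in sorted(cells, key=lambda c: (c.row, c.column, -c.timestamp)):
        key = f"{c.row}/{c.column}/{c.timestamp}".encode()
        record = len(key).to_bytes(4, "big") + key + c.value
        index.append((key, len(data), len(record)))
        data += record
    return bytes(data), index

def read_value(data: bytes, entry: tuple[bytes, int, int]) -> bytes:
    _key, offset, length = entry
    record = data[offset:offset + length]
    key_len = int.from_bytes(record[:4], "big")
    return record[4 + key_len:]

# row2 has no "info:product" cell at all: an absent column writes nothing.
cells = [
    Cell("row1", "info:name", 1, b"Mike"),
    Cell("row1", "info:product", 1, b"Hbase"),
    Cell("row2", "info:name", 1, b"Jole"),
]
data, index = write_store_file(cells)
for entry in index:
    print(entry[0].decode(), read_value(data, entry))
```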

As a consequence of this, if you have very long keys (e.g. a full URL), and a lot of columns associated with that key, you could be wasting some space. This is ameliorated somewhat by turning compression on.
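
A rough illustration of that overhead, assuming a hypothetical long row key and 50 columns, with the standard-library `zlib` standing in for whatever compression codec is actually configured. Because every cell repeats the full row key, the key accounts for most of the bytes, and that repetition is exactly what a compressor removes.

```python
import zlib

# Hypothetical long row key (a full URL) shared by many columns.
ROW_KEY = "http://www.example.com/some/very/long/path/to/a/page.html"
COLUMNS = [f"anchor:link{i}" for i in range(50)]

def encode_cells(row_key: str, columns: list[str]) -> bytes:
    """Write one simplified record per cell; each repeats the full row key."""
    return b"".join(f"{row_key}/{col}=x\n".encode() for col in columns)

raw = encode_cells(ROW_KEY, COLUMNS)
row_key_bytes = len(ROW_KEY.encode()) * len(COLUMNS)
compressed = zlib.compress(raw)

print(f"raw size: {len(raw)} bytes")
print(f"spent on repeated row keys: {row_key_bytes / len(raw):.0%}")
print(f"compressed size: {len(compressed)} bytes")
```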

See http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html for more information on HBase storage.

Answer 3

Answered by Donald Miner

Sparse, with respect to HBase, is indeed used in the same sense as in a sparse matrix. It basically means that null fields cost nothing to store (in terms of space).
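
A minimal sketch of the sparse-matrix idea, using a Python dict keyed by `(row, column)` in the style of a dictionary-of-keys sparse matrix. `SparseTable` is a hypothetical illustration of the concept, not HBase's API: only non-null cells are stored, so a null field takes no space at all.

```python
from typing import Optional

class SparseTable:
    """Dictionary-of-keys storage: only non-null cells are kept, keyed by
    (row, column). A cell that was never written reads back as None."""

    def __init__(self) -> None:
        self._cells: dict[tuple[str, str], str] = {}

    def put(self, row: str, column: str, value: Optional[str]) -> None:
        if value is None:
            # Writing a null just drops the cell; nothing is stored for it.
            self._cells.pop((row, column), None)
        else:
            self._cells[(row, column)] = value

    def get(self, row: str, column: str) -> Optional[str]:
        return self._cells.get((row, column))

    def stored_cells(self) -> int:
        return len(self._cells)


# 1,000 rows x 100 columns, but each row only has values in 2 columns.
table = SparseTable()
for r in range(1000):
    table.put(f"row{r}", "col0", "a")
    table.put(f"row{r}", "col99", "b")

print(table.stored_cells())        # 2000 cells stored, not 100000
print(table.get("row5", "col50"))  # None: never written, costs nothing
```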

I found a couple of blog posts that touch on this subject in a bit more detail:

http://blog.rapleaf.com/dev/2008/03/11/matching-impedance-when-to-use-hbase/

http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable

Answer 4

Answered by Kh.Taheri

This is the best article I have seen on the subject; it also explains many database terms.

> http://jimbojw.com/#understanding%20hbase

Answer 5

Answered by Ashish Singh

There are two ways data can be stored in a table: as sparse data or as dense data. Here is an example of sparse data.

Suppose we have to run a query on a table containing sales transactions by employee between January 2015 and November 2015. The query returns the data that satisfies this date condition; if an employee made no transactions in that period, that employee's whole row comes back blank.

For example:

 EmpNo  Name  Product  Date        Quantity
 1234   Mike  Hbase    2014/12/01  1
 5678
 3454   Jole  Flume    2015/09/12  3

The row for EmpNo 5678 has no data, while the remaining rows do. If we consider the whole table, blank rows and populated rows together, we can call it sparse data.

If we take only the populated rows, it is called dense data.
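
The distinction in this answer can be sketched directly on the example table, treating a blank field as `None`. This follows the answer's own definition (sparse = all rows including blank ones, dense = populated rows only); the `is_populated` helper is hypothetical and assumes a row counts as populated when any field other than `EmpNo` has a value.

```python
from typing import Optional

# The example table above; None marks a blank field.
sales: list[dict[str, Optional[object]]] = [
    {"EmpNo": 1234, "Name": "Mike", "Product": "Hbase", "Date": "2014/12/01", "Quantity": 1},
    {"EmpNo": 5678, "Name": None, "Product": None, "Date": None, "Quantity": None},
    {"EmpNo": 3454, "Name": "Jole", "Product": "Flume", "Date": "2015/09/12", "Quantity": 3},
]

def is_populated(row: dict[str, Optional[object]]) -> bool:
    """A row is populated if any field other than the EmpNo key has a value."""
    return any(value is not None for key, value in row.items() if key != "EmpNo")

# Sparse view: every row, blank or not. Dense view: populated rows only.
sparse = sales
dense = [row for row in sales if is_populated(row)]

print(len(sparse), len(dense))          # 3 2
print([row["EmpNo"] for row in dense])  # [1234, 3454]
```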
