Original question: http://stackoverflow.com/questions/6587007/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What is meant by sparse data/datastore/database?
Asked by Jai
I have been reading up on Hadoop and HBase lately, and came across this description:
HBase is an open-source, distributed, sparse, column-oriented store...
What do they mean by sparse? Does it have something to do with a sparse matrix? I am guessing it is a property of the type of data it can store efficiently, and hence, would like to know more about it.
Answer by Peter Wone
In a regular database, rows are sparse but columns are not. When a row is created, storage is allocated for every column, irrespective of whether a value exists for that field (a field being the storage allocated for the intersection of a row and a column).
This allows fixed-length rows, which greatly improves read and write times. Variable-length data types are handled with an analogue of pointers.
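The point about fixed-length rows can be sketched in a few lines of Python. This is only an illustration, not how any particular database lays out its pages: the three-column layout, the `ROW` format and the helper names are all made up, and an empty string or 0 stands in for NULL.

```python
import struct

# Hypothetical row layout: 4-byte int, 8-byte name, 4-byte int (no padding).
ROW = struct.Struct("<i8si")

def pack_row(emp_no, name, quantity):
    # Every column gets its slot, even when the value is empty.
    return ROW.pack(emp_no, name.encode("ascii"), quantity)

def read_row(table, n):
    # Fixed-length rows: row n starts at byte n * ROW.size, so no scan is needed.
    emp_no, name, quantity = ROW.unpack_from(table, n * ROW.size)
    return emp_no, name.rstrip(b"\x00").decode("ascii"), quantity

table = b"".join([
    pack_row(1234, "Mike", 1),
    pack_row(5678, "", 0),      # an empty row still occupies ROW.size bytes
    pack_row(3454, "Jole", 3),
])

print(ROW.size, len(table), read_row(table, 2))
```

The empty row costs as much space as a full one, but any row can be located with a single multiplication, which is the trade-off this answer describes.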
Sparse columns incur a performance penalty and are unlikely to save you much disk space, because the space required to indicate NULL is smaller than the 64-bit pointer required by the linked-list style of chained-pointer architecture typically used to implement very large non-contiguous storage.
Storage is cheap. Performance isn't.
Answer by David
At the storage level, all data is stored as key-value pairs. Each storage file contains an index so that it knows where each key-value pair starts and how long it is.
As a consequence of this, if you have very long keys (e.g. a full URL), and a lot of columns associated with that key, you could be wasting some space. This is ameliorated somewhat by turning compression on.
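To get a feel for the long-key overhead, here is a rough Python sketch. The `key/column=value` text layout, the example URL and the column names are invented for illustration and are not HBase's actual file format; `zlib` simply stands in for whatever compression codec is turned on.

```python
import zlib

def encode_cells(row_key, columns):
    # One entry per cell, so the row key is repeated for every column.
    # This text layout is made up for illustration only.
    return b"".join(
        f"{row_key}/{col}={val}\n".encode("utf-8") for col, val in columns.items()
    )

row_key = "http://www.example.com/some/very/long/path/to/a/page.html"
columns = {f"col{i}": str(i) for i in range(50)}

raw = encode_cells(row_key, columns)
key_bytes = len(row_key.encode("utf-8")) * len(columns)  # bytes spent repeating the key
compressed = zlib.compress(raw)

print(len(raw), key_bytes, len(compressed))
```

With a long key and many columns, most of the raw bytes are copies of the key, which is exactly the kind of repetition a compressor removes well.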
See http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html for more information on HBase storage.
Answer by Donald Miner
Sparse with respect to HBase is indeed used in the same sense as in a sparse matrix. It basically means that fields that are null cost nothing to store (in terms of space).
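As a small illustration of the sparse-matrix analogy, the Python sketch below keeps only the non-null cells, keyed by (row, column). The column names and rows are hypothetical, and a plain dict stands in for the store; it says nothing about HBase's real on-disk format.

```python
# Hypothetical column set and rows, for illustration only.
COLUMNS = ["name", "email", "phone", "fax", "twitter"]

dense = [
    ["alice", "alice@example.com", None, None, None],
    ["bob", None, "555-0100", None, None],
    ["carol", None, None, None, "@carol"],
]

def to_sparse(dense_rows, columns):
    # Like a sparse matrix: store only the non-null cells, keyed by (row, column).
    return {
        (r, col): val
        for r, cells in enumerate(dense_rows)
        for col, val in zip(columns, cells)
        if val is not None
    }

def get(sparse, r, col):
    # A cell that was never stored simply reads back as None.
    return sparse.get((r, col))

sparse = to_sparse(dense, COLUMNS)
dense_slots = len(dense) * len(COLUMNS)   # slots allocated by the dense layout
stored_cells = len(sparse)                # cells actually stored by the sparse layout

print(dense_slots, stored_cells, get(sparse, 1, "fax"))
```

The dense layout reserves a slot for every row and column combination, while the sparse one grows only with the number of values actually present.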
I found a couple of blog posts that touch on this subject in a bit more detail:
http://blog.rapleaf.com/dev/2008/03/11/matching-impedance-when-to-use-hbase/
http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
Answer by Kh.Taheri
The best article I have seen, which explains many database terms as well.
Answer by Ashish Singh
There are two ways data can be stored in a table: as sparse data or as dense data. Here is an example of sparse data.
Suppose we have to perform an operation on a table containing sales transactions by employee between January 2015 and November 2015. After running the query, we get the data that satisfies the above timestamp condition; if an employee did not make any transaction, the whole row is returned blank.
e.g.
EMPNo  Name  Product  Date        Quantity
1234   Mike  Hbase    2014/12/01  1
5678
3454   Jole  Flume    2015/09/12  3
The row with EMPNo 5678 has no data, while the rest of the rows contain data. If we consider the whole table, with both blank and populated rows, we can term it sparse data.
If we take only the populated rows, it is termed dense data.
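The distinction this answer draws can be written down as a short Python sketch. The rows come from the example table; using None for a blank field and the `is_populated` helper are assumptions made for illustration.

```python
# Rows from the example table; None marks a blank field.
rows = [
    {"EMPNo": 1234, "Name": "Mike", "Product": "Hbase", "Date": "2014/12/01", "Quantity": 1},
    {"EMPNo": 5678, "Name": None, "Product": None, "Date": None, "Quantity": None},
    {"EMPNo": 3454, "Name": "Jole", "Product": "Flume", "Date": "2015/09/12", "Quantity": 3},
]

def is_populated(row):
    # A row counts as populated if any field other than the key has a value.
    return any(v is not None for k, v in row.items() if k != "EMPNo")

sparse_view = rows                                  # blank and populated rows together
dense_view = [r for r in rows if is_populated(r)]   # populated rows only

print(len(sparse_view), [r["EMPNo"] for r in dense_view])
```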

