为什么以及何时应该使用 SPARSE COLUMN?(SQL SERVER 2008)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1398453/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 03:31:44  来源:igfitidea点击:

Why & When should I use SPARSE COLUMN? (SQL SERVER 2008)

sqlsql-serverdatabase-designsql-server-2008

提问by priyanka.sarkar

After going thru some tutorials on SQL Server 2008's new feature "SPARSE COLUMN", I have found that it doesn't take any space if the column value is 0 or NULL but when there is a value, it takes 4 times the space a regular(non sparse) column holds.

在阅读了有关 SQL Server 2008 的新功能“SPARSE COLUMN”的一些教程后,我发现如果列值为 0 或 NULL,它不会占用任何空间,但是当有一个值时,它占用的空间是常规的 4 倍(非稀疏)列保留。

If my understanding is correct, then why I will go for that at the time of database design? And if I use that, then at what situation will I be?

如果我的理解是正确的,那为什么我在数据库设计的时候会这样做呢?如果我使用它,那么我会在什么情况下?

Also out of curiosity, how does no space get reserved when a column is defined as sparse column (I mean to say, what is the internal implementation for that?)

同样出于好奇,当一个列被定义为稀疏列时,如何没有保留空间(我的意思是说,它的内部实现是什么?)

回答by BradC

A sparse column doesn't use 4x the amount of spaceto store a value, it uses a (fixed) 4 extra bytesper non-null value. (As you've already stated, a NULL takes 0 space.)

稀疏列不使用4 倍的空间来存储值,它为每个非空值使用(固定)4 个额外字节。(正如您已经说过的,NULL 占用 0 空间。)

  • So a non-null value stored in a bitcolumn would be 1 bit + 4 bytes = 4.125 bytes. But if 99% of these are NULL, it is still a net savings.

  • A non-null value stored in a GUID (UniqueIdentifier)column is 16 bytes + 4 bytes = 20 bytes. So if only 50% of these are NULL, that's still a net savings.

  • 因此,存储在位列中的非空值将是 1 位 + 4 字节 = 4.125 字节。但如果其中 99% 为 NULL,则仍然是净节省。

  • 存储在GUID (UniqueIdentifier)列中的非空值是 16 字节 + 4 字节 = 20 字节。因此,如果其中只有 50% 为 NULL,那仍然是净节省。

So the "expected savings" depends strongly on what kindof column we're talking about, and your estimate of what ratio will be null vs non-null. Variable width columns (varchars) are probably a little more difficult to predict accurately.

因此,“预期节省”在很大程度上取决于我们正在谈论的列类型,以及您对空值与非空值比率的估计。可变宽度列 (varchars) 可能更难以准确预测。

This Books Online Pagehas a table showing what percentageof different data types would need to be null for you to end up with a benefit.

联机丛书页面有一个表格,显示不同数据类型需要为空的百分比才能最终获得收益。

So whenshould you use a Sparse Column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:

那么什么时候应该使用稀疏列呢?当您期望很大比例的行具有 NULL 值时。想到的一些例子:

  • A "Order Return Date" column in an order table. You would hope that a very small percent of sales would result in returned products.
  • A "4th Address" line in an Address table. Most mailing addresses, even if you need a Department name and a "Care Of" probably don't need 4 separate lines.
  • A "Suffix" column in a customer table. A fairly low percent of people have a "Jr." or "III" or "Esquire" after their name.
  • 订单表中的“订单退货日期”列。您可能希望只有很小一部分的销售额会导致退货。
  • 地址表中的“第 4 个地址”行。大多数邮寄地址,即使您需要部门名称和“关心”也可能不需要 4 行。
  • 客户表中的“后缀”列。相当低百分比的人拥有“Jr”。或在他们的名字后面加上“III”或“Esquire”。

回答by rahul

  • Storing a null in a sparse column takes up no space at all.

  • To any external application the column will behave the same

  • Sparse columns work really well with filtered indexes as you will only want to create an index to deal with the non-empty attributes in the column.

  • You can create a column set over the sparse columns that returns an xml clip of all of the non-null data from columns covered by the set. The column set behaves like a column itself. Note: you can only have one column set per table.

  • Change Data Capture and Transactional replication both work, but not the column sets feature.

  • 在稀疏列中存储空值根本不占用空间。

  • 对于任何外部应用程序,该列的行为都相同

  • 稀疏列与过滤索引一起工作得非常好,因为您只想创建一个索引来处理列中的非空属性。

  • 您可以在稀疏列上创建一个列集,该列集返回该集覆盖的列中所有非空数据的 xml 剪辑。列集的行为类似于列本身。注意:每个表只能设置一列。

  • 变更数据捕获和事务复制都有效,但列集功能无效。

Downsides

缺点

  • If a sparse column has data in it it will take 4 more bytes than a normal column e.g. even a bit (0.125 bytes normally) is 4.125 bytes and unique identifier rises form 16 bytes to 20 bytes.

  • Not all data type can be sparse: text, ntext, image, timestamp, user-defined data type, geometry, or geography or varbinray (max) with the FILESTREAM attribute cannot be sparse. (Changed17/5/2009 thanks Alex for spotting the typo)

  • computed columns can't be sparse (although sparse columns can take part in a calculation in another computed column)

  • You can't apply rules or have default values.

  • Sparse columns cannot form part of a clustered index. If you need to do that use a computed column based on the sparse column and create the clustered index on that (which sort of defeats the object).

  • Merge replication doesn't work.

  • Data compression doesn't work.

  • Access (read and write) to sparse columns is more expensive, but I haven't been able to find any exact figures on this.

  • 如果稀疏列中有数据,它将比普通列多占用 4 个字节,例如,即使一个位(通常为 0.125 个字节)也是 4.125 个字节,唯一标识符从 16 个字节上升到 20 个字节。

  • 并非所有数据类型都可以是稀疏的:文本、ntext、图像、时间戳、用户定义的数据类型、几何或地理或具有 FILESTREAM 属性的 varbinray (max) 不能是稀疏的。(Changed17/5/2009 感谢 Alex 发现错别字)

  • 计算列不能是稀疏的(尽管稀疏列可以参与另一个计算列中的计算)

  • 您不能应用规则或具有默认值。

  • 稀疏列不能构成聚集索引的一部分。如果您需要这样做,请使用基于稀疏列的计算列并在其上创建聚集索引(这会使对象失败)。

  • 合并复制不起作用。

  • 数据压缩不起作用。

  • 访问(读取和写入)稀疏列的成本更高,但我无法找到任何确切的数字。

Reference

参考

回答by mbourgon

You're reading it wrong - it never takes 4x the space.

你读错了——它永远不会占用 4 倍的空间。

Specifically, it says 4* (4 bytes, see footnote), not 4x (multiply by 4). The only case where it's exactly 4x the space is a char(4), which would see savings if the NULLs exist more than 64% of the time.

具体来说,它说 4*(4 个字节,见脚注),而不是 4x(乘以 4)。唯一正好是空间 4 倍的情况是 char(4),如果 NULL 存在的时间超过 64%,这将节省空间。

"*The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes."

“*长度等于类型中包含的数据的平均值,再加上 2 或 4 个字节。”

回答by Ian Boyd

| datetime NULL      | datetime SPARSE NULL | datetime SPARSE NULL |
|--------------------|----------------------|----------------------|
| 20171213 (8 bytes) | 20171213 (12 bytes)  | 20171213 (12 bytes)  |
| NULL     (8 bytes) | 20171213 (12 bytes)  | 20171213 (12 bytes)  |
| 20171213 (8 bytes) | NULL      (0 bytes)  | NULL      (0 bytes)  |
| NULL     (8 bytes) | NULL      (0 bytes)  | NULL      (0 bytes)  |

You lose 4 bytes not just once per row; but for every cell in the row that is not null.

您每行丢失 4 个字节,而不仅仅是一次;但是对于行中不为空的每个单元格。

回答by Place Holiday

From SQL SERVER – 2008 – Introduction to SPARSE Columns – Part 2by Pinal Dave:

SQL SERVER - 2008 -介绍稀疏列-第2部分皮纳尔戴夫

All SPARSE columns are stored as one XML column in database. Let us see some of the advantage and disadvantage of SPARSE column.

Advantages of SPARSE column are:

  • INSERT, UPDATE, and DELETE statements can reference the sparse columns by name. SPARSE column can work as one XML column as well.

  • SPARSE column can take advantage of filtered Indexes, where data are filled in the row.

  • SPARSE column saves lots of database space when there are zero or null values in database.

Disadvantages of SPARSE column are:

  • SPARSE column does not have IDENTITY or ROWGUIDCOL property.

  • SPARSE column can not be applied on text, ntext, image, timestamp, geometry, geography or user defined datatypes.

  • SPARSE column can not have default value or rule or computed column.

  • Clustered index or a unique primary key index can not be applied SPARSE column. SPARSE column can not be part of clustered index key.

  • Table containing SPARSE column can have maximum size of 8018 bytes instead of regular 8060 bytes. A table operation which involves SPARSE column takes performance hit over regular column.

所有 SPARSE 列都作为一个 XML 列存储在数据库中。让我们看看 SPARSE 列的一些优点和缺点。

SPARSE 列的优点是:

  • INSERT、UPDATE 和 DELETE 语句可以按名称引用稀疏列。SPARSE 列也可以作为一个 XML 列使用。

  • SPARSE 列可以利用过滤索引,其中数据填充在行中。

  • 当数据库中有零值或空值时,SPARSE 列可以节省大量的数据库空间。

SPARSE 列的缺点是:

  • SPARSE 列没有 IDENTITY 或 ROWGUIDCOL 属性。

  • SPARSE 列不能应用于文本、ntext、图像、时间戳、几何、地理或用户定义的数据类型。

  • SPARSE 列不能有默认值或规则或计算列。

  • 聚集索引或唯一主键索引不能应用于 SPARSE 列。SPARSE 列不能是聚集索引键的一部分。

  • 包含 SPARSE 列的表的最大大小为 8018 字节,而不是常规的 8060 字节。涉及 SPARSE 列的表操作对常规列的性能造成影响。