为什么以及何时应该使用 SPARSE COLUMN?(SQL SERVER 2008)
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1398453/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Why & When should I use SPARSE COLUMN? (SQL SERVER 2008)
提问by priyanka.sarkar
After going thru some tutorials on SQL Server 2008's new feature "SPARSE COLUMN", I have found that it doesn't take any space if the column value is 0 or NULL but when there is a value, it takes 4 times the space a regular(non sparse) column holds.
在阅读了有关 SQL Server 2008 的新功能“SPARSE COLUMN”的一些教程后,我发现如果列值为 0 或 NULL,它不会占用任何空间,但是当有一个值时,它占用的空间是常规的 4 倍(非稀疏)列保留。
If my understanding is correct, then why I will go for that at the time of database design? And if I use that, then at what situation will I be?
如果我的理解是正确的,那为什么我在数据库设计的时候会这样做呢?如果我使用它,那么我会在什么情况下?
Also out of curiosity, how does no space get reserved when a column is defined as sparse column (I mean to say, what is the internal implementation for that?)
同样出于好奇,当一个列被定义为稀疏列时,如何没有保留空间(我的意思是说,它的内部实现是什么?)
回答by BradC
A sparse column doesn't use 4x the amount of spaceto store a value, it uses a (fixed) 4 extra bytesper non-null value. (As you've already stated, a NULL takes 0 space.)
稀疏列不使用4 倍的空间来存储值,它为每个非空值使用(固定)4 个额外字节。(正如您已经说过的,NULL 占用 0 空间。)
So a non-null value stored in a bitcolumn would be 1 bit + 4 bytes = 4.125 bytes. But if 99% of these are NULL, it is still a net savings.
A non-null value stored in a GUID (UniqueIdentifier)column is 16 bytes + 4 bytes = 20 bytes. So if only 50% of these are NULL, that's still a net savings.
因此,存储在位列中的非空值将是 1 位 + 4 字节 = 4.125 字节。但如果其中 99% 为 NULL,则仍然是净节省。
存储在GUID (UniqueIdentifier)列中的非空值是 16 字节 + 4 字节 = 20 字节。因此,如果其中只有 50% 为 NULL,那仍然是净节省。
So the "expected savings" depends strongly on what kindof column we're talking about, and your estimate of what ratio will be null vs non-null. Variable width columns (varchars) are probably a little more difficult to predict accurately.
因此,“预期节省”在很大程度上取决于我们正在谈论的列类型,以及您对空值与非空值比率的估计。可变宽度列 (varchars) 可能更难以准确预测。
This Books Online Pagehas a table showing what percentageof different data types would need to be null for you to end up with a benefit.
此联机丛书页面有一个表格,显示不同数据类型需要为空的百分比才能最终获得收益。
So whenshould you use a Sparse Column? When you expect a significant percentage of the rows to have a NULL value. Some examples that come to mind:
那么什么时候应该使用稀疏列呢?当您期望很大比例的行具有 NULL 值时。想到的一些例子:
- A "Order Return Date" column in an order table. You would hope that a very small percent of sales would result in returned products.
- A "4th Address" line in an Address table. Most mailing addresses, even if you need a Department name and a "Care Of" probably don't need 4 separate lines.
- A "Suffix" column in a customer table. A fairly low percent of people have a "Jr." or "III" or "Esquire" after their name.
- 订单表中的“订单退货日期”列。您可能希望只有很小一部分的销售额会导致退货。
- 地址表中的“第 4 个地址”行。大多数邮寄地址,即使您需要部门名称和“关心”也可能不需要 4 行。
- 客户表中的“后缀”列。相当低百分比的人拥有“Jr”。或在他们的名字后面加上“III”或“Esquire”。
回答by rahul
Storing a null in a sparse column takes up no space at all.
To any external application the column will behave the same
Sparse columns work really well with filtered indexes as you will only want to create an index to deal with the non-empty attributes in the column.
You can create a column set over the sparse columns that returns an xml clip of all of the non-null data from columns covered by the set. The column set behaves like a column itself. Note: you can only have one column set per table.
Change Data Capture and Transactional replication both work, but not the column sets feature.
在稀疏列中存储空值根本不占用空间。
对于任何外部应用程序,该列的行为都相同
稀疏列与过滤索引一起工作得非常好,因为您只想创建一个索引来处理列中的非空属性。
您可以在稀疏列上创建一个列集,该列集返回该集覆盖的列中所有非空数据的 xml 剪辑。列集的行为类似于列本身。注意:每个表只能设置一列。
变更数据捕获和事务复制都有效,但列集功能无效。
Downsides
缺点
If a sparse column has data in it it will take 4 more bytes than a normal column e.g. even a bit (0.125 bytes normally) is 4.125 bytes and unique identifier rises form 16 bytes to 20 bytes.
Not all data type can be sparse: text, ntext, image, timestamp, user-defined data type, geometry, or geography or varbinray (max) with the FILESTREAM attribute cannot be sparse. (Changed17/5/2009 thanks Alex for spotting the typo)
computed columns can't be sparse (although sparse columns can take part in a calculation in another computed column)
You can't apply rules or have default values.
Sparse columns cannot form part of a clustered index. If you need to do that use a computed column based on the sparse column and create the clustered index on that (which sort of defeats the object).
Merge replication doesn't work.
Data compression doesn't work.
Access (read and write) to sparse columns is more expensive, but I haven't been able to find any exact figures on this.
如果稀疏列中有数据,它将比普通列多占用 4 个字节,例如,即使一个位(通常为 0.125 个字节)也是 4.125 个字节,唯一标识符从 16 个字节上升到 20 个字节。
并非所有数据类型都可以是稀疏的:文本、ntext、图像、时间戳、用户定义的数据类型、几何或地理或具有 FILESTREAM 属性的 varbinray (max) 不能是稀疏的。(Changed17/5/2009 感谢 Alex 发现错别字)
计算列不能是稀疏的(尽管稀疏列可以参与另一个计算列中的计算)
您不能应用规则或具有默认值。
稀疏列不能构成聚集索引的一部分。如果您需要这样做,请使用基于稀疏列的计算列并在其上创建聚集索引(这会使对象失败)。
合并复制不起作用。
数据压缩不起作用。
访问(读取和写入)稀疏列的成本更高,但我无法找到任何确切的数字。
回答by mbourgon
You're reading it wrong - it never takes 4x the space.
你读错了——它永远不会占用 4 倍的空间。
Specifically, it says 4* (4 bytes, see footnote), not 4x (multiply by 4). The only case where it's exactly 4x the space is a char(4), which would see savings if the NULLs exist more than 64% of the time.
具体来说,它说 4*(4 个字节,见脚注),而不是 4x(乘以 4)。唯一正好是空间 4 倍的情况是 char(4),如果 NULL 存在的时间超过 64%,这将节省空间。
"*The length is equal to the average of the data that is contained in the type, plus 2 or 4 bytes."
“*长度等于类型中包含的数据的平均值,再加上 2 或 4 个字节。”
回答by Ian Boyd
| datetime NULL | datetime SPARSE NULL | datetime SPARSE NULL |
|--------------------|----------------------|----------------------|
| 20171213 (8 bytes) | 20171213 (12 bytes) | 20171213 (12 bytes) |
| NULL (8 bytes) | 20171213 (12 bytes) | 20171213 (12 bytes) |
| 20171213 (8 bytes) | NULL (0 bytes) | NULL (0 bytes) |
| NULL (8 bytes) | NULL (0 bytes) | NULL (0 bytes) |
You lose 4 bytes not just once per row; but for every cell in the row that is not null.
您每行丢失 4 个字节,而不仅仅是一次;但是对于行中不为空的每个单元格。
回答by Place Holiday
From SQL SERVER – 2008 – Introduction to SPARSE Columns – Part 2by Pinal Dave:
从SQL SERVER - 2008 -介绍稀疏列-第2部分由皮纳尔戴夫:
All SPARSE columns are stored as one XML column in database. Let us see some of the advantage and disadvantage of SPARSE column.
Advantages of SPARSE column are:
INSERT, UPDATE, and DELETE statements can reference the sparse columns by name. SPARSE column can work as one XML column as well.
SPARSE column can take advantage of filtered Indexes, where data are filled in the row.
SPARSE column saves lots of database space when there are zero or null values in database.
Disadvantages of SPARSE column are:
SPARSE column does not have IDENTITY or ROWGUIDCOL property.
SPARSE column can not be applied on text, ntext, image, timestamp, geometry, geography or user defined datatypes.
SPARSE column can not have default value or rule or computed column.
Clustered index or a unique primary key index can not be applied SPARSE column. SPARSE column can not be part of clustered index key.
Table containing SPARSE column can have maximum size of 8018 bytes instead of regular 8060 bytes. A table operation which involves SPARSE column takes performance hit over regular column.
所有 SPARSE 列都作为一个 XML 列存储在数据库中。让我们看看 SPARSE 列的一些优点和缺点。
SPARSE 列的优点是:
INSERT、UPDATE 和 DELETE 语句可以按名称引用稀疏列。SPARSE 列也可以作为一个 XML 列使用。
SPARSE 列可以利用过滤索引,其中数据填充在行中。
当数据库中有零值或空值时,SPARSE 列可以节省大量的数据库空间。
SPARSE 列的缺点是:
SPARSE 列没有 IDENTITY 或 ROWGUIDCOL 属性。
SPARSE 列不能应用于文本、ntext、图像、时间戳、几何、地理或用户定义的数据类型。
SPARSE 列不能有默认值或规则或计算列。
聚集索引或唯一主键索引不能应用于 SPARSE 列。SPARSE 列不能是聚集索引键的一部分。
包含 SPARSE 列的表的最大大小为 8018 字节,而不是常规的 8060 字节。涉及 SPARSE 列的表操作对常规列的性能造成影响。