Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/6857007/

Date: 2020-09-01 11:30:38  Source: igfitidea

What is the purpose of PAD_INDEX in this SQL Server constraint?

Tags: sql, sql-server, indexing

Question by radio star

I have the following constraint being applied to one of my tables, but I don't know what PAD_INDEX means.

Can someone enlighten me?

CONSTRAINT [PK_Employees] PRIMARY KEY CLUSTERED 
(
    [EmployeeId] ASC
) WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
        ^--------------^
         this part here

Answer by gbn

An index in SQL Server is a B-Tree.

  • FILLFACTOR applies to the bottom layer.
    This is the leaf node/data layer in the picture below.

  • PAD_INDEX ON means "apply FILLFACTOR to all layers".
    These are the intermediate levels in the picture below (between the root and the data).

This means that PAD_INDEX is only useful if FILLFACTOR is set. FILLFACTOR determines (roughly) how much free space is left in a data page.

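As a sketch of that point, the question's constraint could specify FILLFACTOR alongside PAD_INDEX so the padding has a percentage to apply. This assumes a hypothetical dbo.Employees table: only EmployeeId and PK_Employees come from the question, and the FullName column and the value 80 are illustrative.

```sql
-- Hypothetical table: only EmployeeId and PK_Employees come from the question.
CREATE TABLE dbo.Employees
(
    EmployeeId int           NOT NULL,
    FullName   nvarchar(100) NOT NULL,  -- illustrative column
    CONSTRAINT [PK_Employees] PRIMARY KEY CLUSTERED
    (
        [EmployeeId] ASC
    )
    -- FILLFACTOR = 80: leaf pages are built about 80% full.
    -- PAD_INDEX = ON: intermediate-level pages get the same 80% fill.
    WITH (PAD_INDEX = ON, FILLFACTOR = 80, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
);
```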

A picture from MSDN:

[Figure: B-Tree structure]

Answer by SQLador

Basically, you set PAD_INDEX = ON if you expect a lot of random changes to the index on a regular basis.

That helps avoid index page splits.

I turn it on when I expect 30% or more of the records in the index, at random positions, to be deleted on a regular basis.

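Padding is laid down when an index is built or rebuilt; SQL Server does not maintain the free space as rows are later inserted and deleted, so on an existing index the setting is typically applied with a rebuild. A minimal sketch, reusing the question's PK_Employees index on an assumed dbo.Employees table, with an illustrative FILLFACTOR of 70 (not a value the answer recommends):

```sql
-- Rebuild the existing clustered primary key with padding turned on.
-- 70 is illustrative: ~30% free space in leaf pages and, because
-- PAD_INDEX = ON, in intermediate-level pages as well.
ALTER INDEX [PK_Employees] ON dbo.Employees
REBUILD WITH (PAD_INDEX = ON, FILLFACTOR = 70);
```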

Answer by Edwin de Koning

From MSDN:

PAD_INDEX = { ON | OFF }

Specifies index padding. The default is OFF.

ON: The percentage of free space that is specified by fillfactor is applied to the intermediate-level pages of the index.

OFF or fillfactor is not specified: The intermediate-level pages are filled to near capacity, leaving sufficient space for at least one row of the maximum size the index can have, considering the set of keys on the intermediate pages.

The PAD_INDEX option is useful only when FILLFACTOR is specified, because PAD_INDEX uses the percentage specified by FILLFACTOR. If the percentage specified for FILLFACTOR is not large enough to allow for one row, the Database Engine internally overrides the percentage to allow for the minimum. The number of rows on an intermediate index page is never less than two, regardless of how low the value of fillfactor is.

In backward compatible syntax, WITH PAD_INDEX is equivalent to WITH PAD_INDEX = ON.

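To illustrate the two syntaxes and how to check what was stored, here is a sketch that assumes a hypothetical dbo.Employees table with a FullName column; the index name is hypothetical too. The fill_factor and is_padded columns of sys.indexes record the settings an index was built with.

```sql
-- Current syntax (hypothetical index name and column).
CREATE NONCLUSTERED INDEX IX_Employees_FullName
    ON dbo.Employees (FullName)
    WITH (PAD_INDEX = ON, FILLFACTOR = 80);

-- Older backward-compatible form of the same options, commented out because
-- it would create the same index again: a bare PAD_INDEX means PAD_INDEX = ON.
-- CREATE NONCLUSTERED INDEX IX_Employees_FullName
--     ON dbo.Employees (FullName)
--     WITH PAD_INDEX, FILLFACTOR = 80;

-- fill_factor: 0 means the default was used; is_padded: 1 means PAD_INDEX = ON.
SELECT name, fill_factor, is_padded
FROM sys.indexes
WHERE object_id = OBJECT_ID(N'dbo.Employees');
```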

Answer by bielawski

This is actually a highly complex subject. Turning on PAD_INDEX can have dramatic effects on read performance and memory pressure in large tables: the larger the table, the bigger the effect. As a rule, I'd say you want to leave it off unless you fall into some not-uncommon categories, and then you should follow this advice carefully. As I show in the example case below, adjusting FILLFACTOR when PAD_INDEX is ON can have an exponential effect that needs to be carefully balanced.

  1. PAD_INDEX ALWAYS has a detrimental effect on reads! The lower your FILLFACTOR, the bigger the effect, so you need to pay close attention to the value of FILLFACTOR when you turn it on. On large tables you essentially stop thinking about FILLFACTOR in terms of reducing leaf splits and start thinking about its effect on intermediate bloat vs. intermediate splits.
  2. PAD_INDEX rarely has a useful effect on indexes with fewer than 100,000 rows, and NEVER has a positive effect on indexes on identity or insert-time type columns, where inserts always go to the end of the table.
  3. From the above, you should see that if you turn PAD_INDEX on, you must carefully balance the negative effects against the positive ones.

Rules of thumb: PAD_INDEX is rarely useful on non-clustered indexes (unless they are quite wide), on clustered indexes of very narrow tables, or on tables with fewer than 100K rows (unless inserts are highly clustered, and even then it can be questionable).

You MUST understand how it works: when you insert into an index, the row must fit into the leaf block that contains the appropriate range of keys. Clustered indexes typically have much wider rows than non-clustered indexes, so their leaf blocks hold fewer rows. FILLFACTOR creates space for new rows in the leaf, but with very wide rows, or with a large volume of inserts that are clustered together rather than evenly distributed, it's often impractical or impossible to create enough slack (100% minus the fill percentage) to prevent splits.

When a split occurs, a new intermediate row is created to point to the new block, and that row must fit into its appropriate intermediate block. If that intermediate block is full, it must first be split. If you are particularly unlucky, splits can propagate all the way up to the root. When the root splits, you end up creating a new index level.

The point of PAD_INDEX is to force a minimum amount of free space in your intermediate level blocks.

After a rebuild there may be little or no free space in the non-leaf (intermediate) levels. So if you have lots of leaf splits and PAD_INDEX isn't turned on, you can get massive splitting of intermediate blocks all over the place!

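One way to see how much free space each level actually has is sys.dm_db_index_physical_stats in DETAILED mode, which processes the non-leaf levels and returns a row per index level (index_level 0 is the leaf). A sketch, assuming the question's table is dbo.Employees:

```sql
SELECT  ips.index_id,
        ips.index_level,                      -- 0 = leaf; higher = intermediate, then root
        ips.page_count,
        ips.avg_page_space_used_in_percent,   -- how full the pages at this level are
        ips.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Employees'), NULL, NULL, 'DETAILED') AS ips
ORDER BY ips.index_id, ips.index_level;
```

Comparing avg_page_space_used_in_percent for the levels above 0 right after a rebuild with PAD_INDEX off and then on should show the difference in intermediate free space described above.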

Mostly, though, splits can be managed with FILLFACTOR. The bigger split problems happen with insert patterns that virtually guarantee you won't have enough free space. Turning PAD_INDEX on then helps alleviate this by providing space at the deeper (intermediate) levels, so when a split does occur you are less likely to incur lots of multilevel splits.

Example Case

I have a customer table with 100K rows. On any given day, about 5% of my customers will be active. I have a table that records activity by customer by time. On average a customer performs 20 actions, and each action's description takes, on average, 1KB. So I collect 100MB of data per day; let's assume I already have a year of it in the table, so about 36GB.

The table has inserts of 1KB rows, with customer_number and insert_time (in that order) as the key columns. Clearly the average customer will split an 8K leaf block several times while inserting their expected 20 rows, because each row is inserted immediately after the preceding row in the same block until it splits, and splits, and splits (which makes one consider a heap with only non-clustered indexes...). If the intermediate block pointing to the appropriate leaf doesn't have enough room for at least 4 rows (in reality probably 8, but...), the intermediate block will need to split. Given that this example's key takes up 22 bytes, an intermediate block can hold 367 entries. This means I need 6% free space in my intermediate block, or a fill of 94%, to hold the 4 entries.

Notice that even a 1% FILLFACTOR won't stop leaf block splits, since a block can only hold 8 rows. Setting FILLFACTOR to 80% will only allow 1 row to be added before the leaf splits, but it will inject over 800 bytes of free space per intermediate block if PAD_INDEX is on. That's ~800 empty bytes for EVERY intermediate block when I only need 88.

This is really important! If I have 36M rows already in the table, using 80% means 294 rows per intermediate block, i.e. 122K blocks, meaning I've injected 98MB into my intermediate block structure; 94%, by contrast, lets 345 rows fit per block, so there are only 104K intermediate blocks (yes, I'm leaving out the lower levels for simplicity). Adding 88 bytes to each of 104K blocks adds only 9.2MB, as opposed to 98MB.

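The arithmetic in the paragraph above, restated with the answer's own figures (367 intermediate entries per block at full fill, about 800 free bytes per block at 80%, and 88 bytes = 4 × 22 needed at 94%), works out roughly as:

```latex
\begin{aligned}
\text{80\% fill:}\quad & 0.80 \times 367 \approx 294 \text{ entries/block}, &
  \frac{36\,000\,000}{294} &\approx 122\,000 \text{ blocks}, &
  122\,000 \times 800\ \text{B} &\approx 98\ \text{MB} \\
\text{94\% fill:}\quad & 0.94 \times 367 \approx 345 \text{ entries/block}, &
  \frac{36\,000\,000}{345} &\approx 104\,000 \text{ blocks}, &
  104\,000 \times 88\ \text{B} &\approx 9.2\ \text{MB}
\end{aligned}
```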

Now consider that only 5% of my customers did anything. Some did more than 20 things and some fewer, so some blocks split anyway. Since only 275KB was actually needed to hold the day's index rows (100K/8 × 22 bytes), in the best case only 8.9MB of my 9.2MB was dead air. If split prevention is important, that's well worth 9MB; however, I'd think harder about 98MB.

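The daily figure in the paragraph above follows from the example's 100K new rows per day (5,000 active customers × 20 actions), about 8 rows per leaf block, and 22 bytes per intermediate entry:

```latex
\begin{aligned}
\frac{100\,000 \text{ rows}}{8 \text{ rows/leaf block}} &= 12\,500 \text{ new leaf blocks} \\
12\,500 \times 22\ \text{B} &= 275\,000\ \text{B} \approx 275\ \text{KB} \\
9.2\ \text{MB} - 0.275\ \text{MB} &\approx 8.9\ \text{MB of unused padding (best case)}
\end{aligned}
```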

So, by turning PAD_INDEX on, I should give up on controlling leaf splits entirely and turn to controlling intermediate splits.

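A sketch of the example's setup under that reasoning. The table and index names and the data types are assumptions (the answer only names the key columns customer_number and insert_time); FILLFACTOR = 94 is the fill the answer arrives at for the intermediate level.

```sql
-- Hypothetical activity table; names and types are illustrative.
CREATE TABLE dbo.CustomerActivity
(
    customer_number int           NOT NULL,
    insert_time     datetime2(7)  NOT NULL,
    description     nvarchar(500) NOT NULL   -- up to 1,000 bytes, i.e. wide rows
);

-- Key order as in the answer: customer_number, then insert_time.
-- With ~1 KB rows the leaf level will split regardless, so the 94% fill is
-- aimed at the intermediate level, which PAD_INDEX = ON makes it apply to.
CREATE CLUSTERED INDEX CIX_CustomerActivity
    ON dbo.CustomerActivity (customer_number, insert_time)
    WITH (PAD_INDEX = ON, FILLFACTOR = 94);
```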

DON'T bother worrying about anything but the first intermediate level! Any clustering (in this case, clustering of customer_number) induces a butterfly effect that will throw any calculation you make out the window. Unless your inserts are perfectly uniform, your margin of error in finding the right number to balance bloat against splits is typically far bigger than the effect of the lower-level block space.

Answer by Peter_K

@bielawski You describe only cases where PAD_INDEX=ON and FILLFACTOR is between 1 and 99. What do you think about setting PAD_INDEX=ON and FILLFACTOR=0 or 100 when I insert ordered rows, each of which is always newer than the previous one?

CREATE CLUSTERED INDEX [IX_z_arch_export_dzienny_pre] ON [dbo].[z_arch_export_daily_pre]
(
    [Date] ASC,
    [Object Code] ASC,
    [From date] ASC,
    [Person_role] ASC,
    [Departure] ASC,
    [Room code] ASC,
    [period_7_14] ASC
)WITH (PAD_INDEX = ON, FILLFACTOR=100)


insert into z_arch_export_daily_pre
select * from export_daily_pre
order by [Date] ASC,[Object Code] ASC,[From date] ASC,[Person_role] ASC,[Departure] ASC,[Room code] ASC,[period_7_14] ASC

I am 100% certain that all new rows will be inserted "at the end" of the index, and only with these options (PAD_INDEX = ON, FILLFACTOR=100) could I achieve 0.01% index fragmentation after the insert. Is there anything dangerous about these settings under those assumptions?