Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2196484/

Time: 2020-09-18 19:57:04  Source: igfitidea

Oracle: does the column order matter in an index?

Tags: oracle, optimization, indexing

Asked by Mark Harrison

An index on two columns can be created with either of the statements

create index foo_ix on foo(a,b);
create index foo_ix on foo(b,a);

  1. How does this affect the operational (runtime) characteristics of using the index?

  2. How does this affect the layout (physical) characteristics of the index?

  3. Is either (1) or (2) affected by the types/sizes of the columns?

  4. What are the best practices for creating multi-column indexes?

In short, does it matter which column I put first?

Answered by APC

  1. If a and b both have 1000 distinct values and they are always queried together, then the order of columns in the index doesn't really matter. But if a has only 10 distinct values, or you have queries which use just one of the columns, then it does matter; in these scenarios the index may not be used if the column ordering does not suit the query.
  2. The column with the fewest distinct values ought to be first and the column with the most distinct values last. This not only maximises the utility of the index, it also increases the potential gains from index compression.
  3. The datatype and length of the column have an impact on the return we can get from index compression, but not on the best order of columns in an index.
  4. Arrange the columns with the least selective column first and the most selective column last. In the case of a tie, lead with the column which is more likely to be used on its own.
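
Why the leading column decides whether the index suits a query can be illustrated with a toy model. The sketch below is Python, not Oracle internals: it assumes a composite index behaves like a sorted list of (a, b) keys, and the table contents and function names are made up for the illustration.

```python
from bisect import bisect_left, bisect_right

# Toy model: a composite index on (a, b) is a list of keys kept in sorted order.
# Hypothetical data: a has 10 distinct values, b has 1000.
rows = [(a, b) for a in range(10) for b in range(1000)]
index_ab = sorted(rows)

def lookup_by_leading(index, a_value):
    """Binary-search the contiguous range of entries whose leading column equals a_value."""
    lo = bisect_left(index, (a_value,))
    hi = bisect_right(index, (a_value, float("inf")))
    return index[lo:hi]

def lookup_by_trailing(index, b_value):
    """With no predicate on the leading column, every entry has to be inspected."""
    return [key for key in index if key[1] == b_value]

by_a = lookup_by_leading(index_ab, 3)    # one contiguous slice of the sorted keys
by_b = lookup_by_trailing(index_ab, 42)  # matches are scattered across the whole list
print(len(by_a), len(by_b))
```

In this model a predicate on the leading column maps to one contiguous range, while a predicate on the trailing column alone finds its matches spread through the whole structure, which is the situation where a real optimizer may prefer a different access path.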

The one potential exception to 2. and 3. is with DATE columns. Because Oracle DATE columns include a time element, they might have 86400 distinct values per day. However, most queries on a date column are only interested in the day element, so you might want to consider only the number of distinct days in your calculations, although I suspect this will affect the relative selectivity in only a handful of cases.
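
The gap between distinct timestamps and distinct days is easy to see with a small count. This is a Python sketch over made-up data (one row every 90 seconds for 30 days), not a query against a real table.

```python
from datetime import datetime, timedelta

# Hypothetical data: one row every 90 seconds, covering exactly 30 days.
start = datetime(2020, 1, 1)
timestamps = [start + timedelta(seconds=90 * i) for i in range(30 * 24 * 40)]

# Cardinality as stored (date plus time) versus cardinality at day granularity.
distinct_timestamps = len(set(timestamps))
distinct_days = len({ts.date() for ts in timestamps})

print(distinct_timestamps, distinct_days)
```

Which of the two figures is the relevant one depends on whether the queries filter on the full timestamp or only on the day.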

edit (in response to Nick Pierpoint's comment)

The two main reasons for leading with the least selective column are

  1. Index compression
  2. Index Skip reads

Both these work their magic from knowing that the value in the current slot is the same as the value in the previous slot. Consequently we can maximise the return from these techniques by minimising the number of times the value changes. In the following example, A has four distinct values and B has six. The dittos represent a compressible value or a skippable index block.

Least selective column leads ...

A          B
---------  -
AARDVARK   1
"          2
"          3
"          4
"          5
"          6
DIFFVAL    1
"          2
"          3
"          4
"          5
"          6
OTHERVAL   1
"          2
"          3
"          4
"          5
"          6
WHATEVER   1
"          2
"          3
"          4
"          5
"          6

Most selective column leads ...

B  A
-  --------
1  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
2  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
3  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
4  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
5  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER
6  AARDVARK
"  DIFFVAL
"  OTHERVAL
"  WHATEVER

Even in this trivial example, (A, B) has 20 skippable slots compared to the 18 of (B, A). A wider disparity would generate a greater ROI on index compression or better utility from Index Skip reads.
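
The ditto counts can be reproduced mechanically. The following Python sketch sorts the same 24 keys in each column order and counts the entries whose leading value repeats the previous entry's; the helper name is made up for the illustration.

```python
from itertools import product

a_values = ["AARDVARK", "DIFFVAL", "OTHERVAL", "WHATEVER"]
b_values = [1, 2, 3, 4, 5, 6]

def repeated_leading_entries(keys):
    """Count entries whose leading column equals that of the previous entry in index order."""
    ordered = sorted(keys)
    return sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev[0] == cur[0])

ab_keys = list(product(a_values, b_values))  # index on (A, B)
ba_keys = [(b, a) for a, b in ab_keys]       # index on (B, A)

print(repeated_leading_entries(ab_keys))  # 20
print(repeated_leading_entries(ba_keys))  # 18
```

With n entries and d distinct leading values the count is n - d, so the gap between the two orderings widens as the cardinalities of the two columns diverge.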

As is the case with most tuning heuristics, we need to benchmark using actual values and realistic volumes. This is definitely a scenario where data skew could have a dramatic impact on the effectiveness of different approaches.

"I think if you have a highly selective first index then - from a performance perspective - you'll do well to put it first."

If we have a highly selective column then we should build it an index of its own. The additional benefit of avoiding a FILTER operation on a handful of rows is unlikely to outweigh the overhead of maintaining a composite index.

Multi-column indexes are most useful when we have:

  • two or more columns of middling selectivity,
  • which are frequently used in the same query.

Answered by DVr

But according to Oracle itself, it's better to put the column with the highest cardinality first:

http://docs.oracle.com/cd/B10500_01/server.920/a96533/data_acc.htm#2174

Ordering Keys for Composite Indexes

If all keys are used in WHERE clauses equally often, then ordering these keys from most selective to least selective in the CREATE INDEX statement best improves query performance.

Answered by Eddie Awad

You may find answers to some of your questions here: Index Skip Scan – Does Index Column Order Matter Any More? (Warning Sign)

Answered by andy.larsen

  1. If a column is used alone in a query and it is not the leading column in the index, Oracle will use a less efficient index skip scan access path.
  2. It depends on the respective selectivity of the columns.
  3. No.
  4. I would look at the queries involving the columns in the index and order the columns by how heavily each is queried, most heavily queried first.
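
The cost of a skip scan can be sketched with a toy model, again in Python rather than Oracle internals. It assumes the index is a sorted list of unique (leading, trailing) keys with integer leading values, and it probes once per distinct leading value; the function and variable names are hypothetical.

```python
from bisect import bisect_left

def skip_scan(index, trailing_value):
    """Probe once per distinct leading value; return (matches, probe_count).

    Assumes unique keys and integer leading values (an assumption of this sketch).
    """
    matches, probes, pos = [], 0, 0
    while pos < len(index):
        lead = index[pos][0]
        probes += 1
        # Binary-search for (lead, trailing_value) inside this leading value's range.
        hit = bisect_left(index, (lead, trailing_value), pos)
        if hit < len(index) and index[hit] == (lead, trailing_value):
            matches.append(index[hit])
        # Jump to the first entry of the next leading value.
        pos = bisect_left(index, (lead + 1,), pos)
    return matches, probes

# Same hypothetical table indexed both ways: a has 10 distinct values, b has 1000.
index_ab = sorted((a, b) for a in range(10) for b in range(1000))
index_ba = sorted((b, a) for a in range(10) for b in range(1000))

matches_b, probes_b = skip_scan(index_ab, 42)  # query on b alone, index leads with a
matches_a, probes_a = skip_scan(index_ba, 3)   # query on a alone, index leads with b
print(probes_b, probes_a)  # 10 1000
```

In this model the number of probes equals the number of distinct leading values, which is one way to see why a skip scan is only attractive when the leading column has few distinct values.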