oracle 超过 1 列的 B 树索引是什么样的?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1648217/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 19:19:01  来源:igfitidea点击:

what does a B-tree index on more than 1 column look like?

sql-serverdatabaseoracleindexing

提问by Lawrence

So I was reading up on indexes and their implementation, and I stumbled upon this website that has a brief explanation of b-tree indexes:

所以我正在阅读索引及其实现,我偶然发现了这个网站,它对 b-tree 索引进行了简要说明:

http://20bits.com/articles/interview-questions-database-indexes/

http://20bits.com/articles/interview-questions-database-indexes/

The b-tree index makes perfect sense for indexes that are only on a single column, but let's say I create an index with multiple columns, how then does the b-tree work? What is the value of each node in the b-tree?

b-tree 索引对于仅在单列上的索引非常有意义,但是假设我创建了一个包含多个列的索引,那么 b-tree 是如何工作的?b树中每个节点的值是多少?

For example, if I have this table:

例如,如果我有这张表:

table customer:
id    number
name   varchar
phone_number   varchar
city   varchar

and I create an index on: (id, name, city)

我创建了一个索引:(id, name, city)

and then run the following query:

然后运行以下查询:

SELECT id, name 
  FROM customer
 WHERE city = 'My City';

how does this query utilize the multiple column index, or does it not utilize it unless the index is created as (city, id, name) or (city, name, id) instead?

此查询如何使用多列索引,或者除非将索引创建为 (city, id, name) 或 (city, name, id) 否则不使用它?

采纳答案by John Machin

Imagine that the key is represented by a Python tuple (col1, col2, col3) ... the indexing operation involves comparing tuple_awith tuple_b... if you have don't know which value of col1 and col2 that you are interested in, but only col3, then it would have to read the whole index ("full index scan"), which is not as efficient.

想象一下,键由一个 Python 元组 (col1, col2, col3) 表示……索引操作涉及tuple_atuple_b……进行比较,如果您不知道您对 col1 和 col2 的哪个值感兴趣,但只有col3,那么它必须读取整个索引(“完整索引扫描”),效率不高。

If you have an index on (col1, col2, col3), then you can expect that any RDBMS will use the index (in a direct manner) when the WHERE clause contains reference to (1) all 3 columns (2) both col1 and col2 (3) only col1.

如果在 (col1, col2, col3) 上有索引,那么当 WHERE 子句包含对 (1) 所有 3 列 (2) col1 和col2 (3) 只有 col1。

Otherwise (e.g. only col3 in the WHERE clause), either the RDBMS will not use that index at all (e.g. SQLite), or will do a full index scan (e.g. Oracle) [if no other index is better].

否则(例如在 WHERE 子句中只有 col3),RDBMS 将根本不使用该索引(例如 SQLite),或者将执行完整索引扫描(例如 Oracle)[如果没有其他索引更好]。

In your specific example, presuming that id is a unique identifier of a customer, it is pointless to have it appear in an index (other than the index that your DBMS should set up for a primary key or column noted as UNIQUE).

在您的具体示例中,假设 id 是客户的唯一标识符,让它出现在索引中是没有意义的(除了您的 DBMS 应该为主键或标记为 UNIQUE 的列设置的索引)。

回答by mjv

With most implementations, the key is simply a longer key that includes all of the key values, with a separator. No magic there ;-)

对于大多数实现,键只是一个较长的键,包括所有键值,带有一个分隔符。那里没有魔法;-)

In your example the key values could look something like

在您的示例中,键值可能类似于

"123499|John Doe|Conway, NH"
"32144|Bill Gates| Seattle, WA"

One of the characteristics of these indexes with composite keys is that the intermediate tree nodes can be used in some cases to "cover" the query.

这些具有复合键的索引的特征之一是在某些情况下可以使用中间树节点来“覆盖”查询。

For example, if the query is to find the Name and City given the ID, since the ID is first in the index, the index can search by this efficiently. Once in the intermediate node, it can "parse" the Name and City, from the key, and doesn't need to go to the leaf node to read the same.

例如,如果查询要查找给定 ID 的 Name 和 City,由于 ID 是索引中的第一个,因此索引可以有效地进行搜索。一旦进入中间节点,它就可以从键中“解析”出 Name 和 City,并且不需要去叶子节点读取相同的内容。

If however the query wanted also to display the phone number, then the logic would follow down the leaf when the full record is found.

然而,如果查询还想显示电话号码,那么当找到完整记录时,逻辑将沿着叶子向下。

回答by richardtallent

Some implementations simply concatenate the values in the order of the columns, with delimiters.

一些实现只是按照列的顺序使用分隔符连接值。

Another solution is to simply have a b-tree within a b-tree. When you hit a leaf on the first column, you get both a list of matching records and a mini b-tree of the next column, and so on. Thus, the order of the columns specified in the index makes a huge difference on whether that index will be useful for particular queries.

另一种解决方案是在 b 树中简单地有一个 b 树。当您点击第一列上的叶子时,您将获得匹配记录的列表和下一列的迷你 b 树,依此类推。因此,索引中指定的列的顺序对该索引是否对特定查询有用有很大的不同。

Here's a related question I wrote last week:

这是我上周写的一个相关问题:

Does SQL Server jump leaves when using a composite clustered index?

使用复合聚集索引时SQL Server会跳叶吗?

回答by David Aldridge

In Oracle a composite key index can be used even though the leading columns are not filtered. This is done through three mechanisms:

在 Oracle 中,即使未过滤前导列,也可以使用组合键索引。这是通过三种机制完成的:

  1. A fast full index scan, in which multiblock reads are used to traverse the entire index segment.
  2. An index full scan, in which the index is read in the logical order of the blocks (I believe I read that in recent versions Oracle can use multiblock reads for this, but really you should count on single block reads)
  3. An inddex skip scan, where a very low cardinality for the non-predicated leading columns allows Oracle to perform multiple index range scans, one for each unique value of the leading column(s). These are pretty rare in my experience.
  1. 快速全索引扫描,其中使用多块读取来遍历整个索引段。
  2. 索引全扫描,其中索引按块的逻辑顺序读取(我相信我读到在最近的版本中 Oracle 可以为此使用多块读取,但实际上您应该指望单块读取)
  3. 索引跳过扫描,其中非谓词前导列的基数非常低,允许 Oracle 执行多个索引范围扫描,每个前导列的唯一值扫描一次。这些在我的经验中非常罕见。

Look for articles by Richard Foote or Jonathan Lewis for more information on Oracle index internals.

查找 Richard Foote 或 Jonathan Lewis 的文章,了解有关 Oracle 索引内部结构的更多信息。

回答by Tim Sylvester

Other than the "composite key" mechanism already described, one possibility is a kdtreewhich works like a binary tree, but as you traverse each level you cycle through kdimensions. That is, the first level of the tree separates the first dimension into two parts, the second level splits the second dimension, the k+1th level splits the first dimension again, etc.. This allows for efficient partitioning of data in any number of dimensions. This approach is common in "spatial" databases (e.g., Oracle Spatial, PostGIS, etc.), but probably not as useful in "regular" multi-indexed tables.

除了已经描述的“复合键”机制之外,一种可能性是kdtree像二叉树一样工作,但是当您遍历每个级别时,您会在k维度中循环。也就是说,树的第一层将第一维分为两部分,第二层将第二维拆分,第k+11 层再次拆分第一维,依此类推。这允许在任意数量的维度中对数据进行有效分区。这种方法在“空间”数据库(例如,Oracle Spatial、PostGIS 等)中很常见,但在“常规”多索引表中可能没有那么有用。

http://en.wikipedia.org/wiki/Kd-tree

http://en.wikipedia.org/wiki/Kd-tree

回答by James Anderson

It can use the (id,name,city) index to satisfy a "City = ? " predicate, but very very inefficently.

它可以使用 (id,name,city) 索引来满足“City = ?”谓词,但效率非常低。

In order to use the index to satisfy this query it would need to walk most of tree structure looking for entries with the desired city. This is still probably an order of magnatude faster than scanning the table!

为了使用索引来满足此查询,它需要遍历大部分树结构以查找具有所需城市的条目。这仍然可能比扫描表格快一个数量级!

An index of (city,name,id) would be the best index for your query. It would find all the desired city entries easily and would not need to access the underlying table to get the id and name values.

(city,name,id) 的索引将是您查询的最佳索引。它会很容易地找到所有想要的城市条目,并且不需要访问基础表来获取 id 和 name 值。