Original question: http://stackoverflow.com/questions/1293499/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Are unique indexes better for column search performance? (PGSQL & MySQL)
Asked by Alex Balashov
I am curious as to whether
CREATE INDEX idx ON tbl (columns);
vs.
CREATE UNIQUE INDEX idx ON tbl (columns);
has a significant algorithmic performance benefit in PostgreSQL or MySQL implementations when scanning the indexed column(s), or whether the UNIQUE keyword simply introduces a unique constraint alongside the index.
I imagine it is probably fair to say that there is a marginal benefit, insofar as indexes are likely to be internally implemented as some sort of hash-like structure [1], and collision handling by definition results in something other than O(1) performance. Given this premise, it is likely that if a large percentage of values are identical, then the structure degenerates into something linear.
So, for the purposes of my question, assume that the distribution of values is relatively discrete and uniform.
Thanks in advance!
[1] Which is a matter of pure speculation for me, as I am not familiar with RDBMS internals.
Accepted answer by Quassnoi
If your data are unique, you should create a UNIQUE index on them.
This implies no additional overhead and affects the optimizer's decisions in certain cases, so that it can choose a better algorithm.
In SQL Server and in PostgreSQL, for instance, if you sort on a UNIQUE key, the optimizer ignores the ORDER BY clauses used after that (since they are irrelevant), i.e. this query:
SELECT *
FROM mytable
ORDER BY
col_unique, other_col
LIMIT 10
will use an index on col_unique and won't sort on other_col because it's useless.
This query:
SELECT *
FROM mytable
WHERE mycol IN
(
SELECT othercol
FROM othertable
)
will also be converted into an INNER JOIN (as opposed to a SEMI JOIN) if there is a UNIQUE index on othertable.othercol.
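As a rough illustration of why such a rewrite is safe: when othertable.othercol is unique, each row of mytable can match at most one row of othertable, so a plain join cannot return a mytable row more than once. The sketch below reuses the table and column names from the query above; the aliases m and o are only for illustration, and whether a given planner actually performs this transformation depends on the engine and version.

```sql
-- Sketch: returns the same rows as the IN query only when
-- othertable.othercol is unique (otherwise the join could
-- return a mytable row once per matching othertable row).
SELECT m.*
FROM mytable m
INNER JOIN othertable o
    ON o.othercol = m.mycol
```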
An index always contains some kind of pointer to the row (ctid in PostgreSQL, row pointer in MyISAM, primary key/uniquifier in InnoDB) and the leaves are ordered on these pointers, so in fact every index leaf is unique in some way (though it may not be obvious).
See this article in my blog for performance details:
Answer by Eric
There is a small penalty during update/insert operations for having the unique constraint. It has to search before the insert/update operation to make sure the uniqueness constraint isn't violated.
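A minimal sketch of where that check shows up, using hypothetical table and index names (demo_tbl and demo_tbl_val_uq are not from the question): the second insert has to be looked up in the unique index before it can be accepted, and is rejected because the value already exists.

```sql
-- Hypothetical names, for illustration only.
CREATE TABLE demo_tbl (val integer);
CREATE UNIQUE INDEX demo_tbl_val_uq ON demo_tbl (val);

INSERT INTO demo_tbl (val) VALUES (1);  -- accepted
INSERT INTO demo_tbl (val) VALUES (1);  -- rejected: duplicate key in the unique index
```

With a plain (non-unique) index on the same column, both inserts would succeed, since no such lookup is required.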
Answer by Eric
Well, usually indexes are B-trees, not hashes (hash-based indexes exist, but the most common index type, at least in PostgreSQL, is based on a B-tree).
As for speed, a unique index should be faster: when an index scan finds a row with the given value, it does not have to check whether any other rows have this value, and can finish scanning immediately.
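One way to check this on your own data is to compare the plans the database picks for the same equality lookup against a unique and a non-unique index. The sketch below uses hypothetical table, column, and index names; it assumes both tables are loaded with comparable data first, since a planner may ignore indexes on empty or very small tables. The exact plan output differs between PostgreSQL and MySQL and between versions, so none is shown here.

```sql
-- Hypothetical names: tbl_plain has a non-unique index on col,
-- tbl_uniq has a unique index on col.
CREATE TABLE tbl_plain (col integer);
CREATE INDEX idx_plain ON tbl_plain (col);

CREATE TABLE tbl_uniq (col integer);
CREATE UNIQUE INDEX idx_uniq ON tbl_uniq (col);

-- After loading data, compare the plans for the same lookup.
EXPLAIN SELECT * FROM tbl_plain WHERE col = 42;
EXPLAIN SELECT * FROM tbl_uniq WHERE col = 42;
```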