What's the difference between B-Tree and GiST index methods (in PostgreSQL)?
Original URL: http://stackoverflow.com/questions/766488/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by Ash
I have been working on optimizing my Postgres databases recently, and traditionally, I've only ever used B-Tree indexes. However, I saw in the Postgres 8.3 documentation that GiST indexes support non-unique, multicolumn indexes.
I couldn't, however, see what the actual difference between them is. I was hoping that my fellow coders might be able to explain what the pros and cons of each are and, more importantly, the reasons why I would use one over the other.
Answer by kquinn
In a nutshell: B-Tree indexes perform better, but GiST indexes are more flexible. Usually, you want B-Tree indexes if they'll work for your data type. There was a recent post on the PostgreSQL mailing lists about a huge performance hit from using GiST indexes. They're expected to be slower than B-Trees (such is the price of flexibility), but not that much slower... Work on this is, as you might expect, ongoing.
From a post by Tom Lane, a core PostgreSQL developer:
The main point of GIST is to be able to index queries that simply are not indexable in btree. ... One would fully expect btree to beat out GIST for btree-indexable cases. I think the significant point here is that it's winning by a factor of a couple hundred; that's pretty awful, and might point to some implementation problem.
Answer by Ash
Basically everybody's right - btree is the default index type because it performs very well. GiST is a somewhat different beast - it's more of a "framework for writing index types" than an index type on its own. You have to add custom code (in the server) to use it, but on the other hand, it is very flexible.
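The following is a minimal in-memory sketch in Python of the "framework for writing index types" idea. It is not PostgreSQL's C API. It assumes a simplified operator class that has only two of the support methods a real GiST operator class supplies: `consistent` and `union`. Real operator classes also provide `penalty`, `picksplit` and others for insertion and page splits. This sketch skips those by building the tree by hand. The generic `search` never interprets keys itself; it only asks the operator class. The hypothetical `IntRangeOpClass` stores plain integers as one-point intervals, which makes the GiST tree behave like a B-Tree range scan.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Protocol, Tuple


class OpClass(Protocol):
    """Hypothetical, simplified stand-in for a GiST operator class."""

    def consistent(self, key: Any, query: Any) -> bool:
        """Could anything stored under `key` satisfy `query`?"""
        ...

    def union(self, keys: List[Any]) -> Any:
        """Return one key that covers every key in `keys`."""
        ...


class IntRangeOpClass:
    """Keys and queries are closed intervals (lo, hi); a leaf stores (v, v)."""

    def consistent(self, key: Tuple[int, int], query: Tuple[int, int]) -> bool:
        # Two closed intervals overlap unless one lies entirely to one side.
        return key[0] <= query[1] and query[0] <= key[1]

    def union(self, keys: List[Tuple[int, int]]) -> Tuple[int, int]:
        return (min(k[0] for k in keys), max(k[1] for k in keys))


@dataclass
class Node:
    key: Any
    children: List["Node"] = field(default_factory=list)
    row_id: Optional[int] = None  # set only on leaf entries


def make_internal(opclass: OpClass, children: List[Node]) -> Node:
    # An internal entry's key is the union of its children's keys.
    return Node(key=opclass.union([c.key for c in children]), children=children)


def search(opclass: OpClass, node: Node, query: Any) -> List[int]:
    # Generic descent: only the operator class knows what a key means.
    if not opclass.consistent(node.key, query):
        return []  # prune the whole subtree
    if node.row_id is not None:
        return [node.row_id]  # matching leaf entry
    hits: List[int] = []
    for child in node.children:
        hits.extend(search(opclass, child, query))
    return hits


ops = IntRangeOpClass()
leaves = [Node(key=(v, v), row_id=i) for i, v in enumerate([3, 8, 15, 21, 42, 57])]
left = make_internal(ops, leaves[:3])   # key (3, 15)
right = make_internal(ops, leaves[3:])  # key (21, 57)
root = make_internal(ops, [left, right])

print(search(ops, root, (10, 30)))  # [2, 3]  (the values 15 and 21)
print(search(ops, root, (1, 9)))    # [0, 1]; the right subtree is pruned
```

You could swap in a different operator class, for example one whose keys are bounding boxes. That would change what the same tree can index without touching `search`, and this pluggability is the flexibility described above.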
Generally, you don't use GiST unless the data type you're using tells you to. Examples of data types that use GiST: ltree (from contrib), tsvector (contrib/tsearch until 8.2, in core since 8.3), and others.
There is a well-known and pretty fast geographic extension to PostgreSQL, PostGIS (http://postgis.refractions.net/), which uses GiST for its purposes.
Answer by Dana the Sane
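GiST indexes are lossy to an extent. This means the DBMS has to deal with false positives: rows the index reports as matching that turn out not to match when the row itself is checked. For example, from the PostgreSQL 8.3 text search documentation: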
GiST indexes are lossy because each document is represented in the index by a fixed-length signature. The signature is generated by hashing each word into a random bit in an n-bit string, with all these bits OR-ed together to produce an n-bit document signature. When two words hash to the same bit position there will be a false match. If all words in the query have matches (real or false) then the table row must be retrieved to see if the match is correct.
B-trees do not have this behavior, so depending on the data being indexed, there may be some performance difference between the two.
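Here is a short Python sketch of the signature scheme quoted above. It illustrates the idea only; it is not PostgreSQL's tsvector signature code. The hash (`zlib.crc32` modulo n), the deliberately tiny width `N_BITS = 8`, and the helper names are all assumptions, chosen so that collisions are easy to provoke. A document becomes a candidate when every bit of the query signature is also set in its signature. Candidates are then rechecked against the actual row, and that recheck is where false matches are discarded.

```python
import zlib
from typing import Dict, List, Set

N_BITS = 8  # hypothetical signature width; tiny so collisions are easy to show


def word_bit(word: str) -> int:
    """Hash one word to a single bit position in an N_BITS-wide signature."""
    return 1 << (zlib.crc32(word.encode("utf-8")) % N_BITS)


def signature(words: Set[str]) -> int:
    """OR the bits of every word together into one fixed-length signature."""
    sig = 0
    for w in words:
        sig |= word_bit(w)
    return sig


def lossy_search(docs: Dict[int, Set[str]], query: Set[str]) -> Dict[str, List[int]]:
    index = {doc_id: signature(words) for doc_id, words in docs.items()}
    q = signature(query)
    # Index step: every query bit must be set. May include false matches,
    # but never misses a document that really contains all query words.
    candidates = [d for d, sig in index.items() if (sig & q) == q]
    # Recheck step: "fetch the row" and compare the real words.
    confirmed = [d for d in candidates if query <= docs[d]]
    return {"candidates": candidates, "confirmed": confirmed}


# Nine distinct words but only eight bit positions: by the pigeonhole
# principle at least two of them must hash to the same bit.
vocab = ["btree", "gist", "ltree", "tsvector", "postgis",
         "index", "query", "planner", "vacuum"]
a, b = next((x, y) for i, x in enumerate(vocab) for y in vocab[i + 1:]
            if word_bit(x) == word_bit(y))

docs = {1: {a}, 2: {b}}  # document 1 does NOT contain word b
result = lossy_search(docs, {b})
print(result)  # {'candidates': [1, 2], 'confirmed': [2]}
```

Raising `N_BITS` makes collisions rarer but never impossible, because a fixed-length signature cannot distinguish an unbounded vocabulary. That is why the recheck step cannot be skipped.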
See http://www.postgresql.org/docs/8.3/static/textsearch-indexes.html for text search behavior and http://www.postgresql.org/docs/8.3/static/indexes-types.html for a general-purpose comparison.
Answer by Pablo Santa Cruz
GiST indexes are more general. You can use them for broader purposes than the ones you would use a B-Tree for, including the ability to build a B-Tree using GiST.
For example, you can use GiST to index geographical points or geographical areas. You won't be able to do that with B-Tree indexes, since the only thing that matters in a B-Tree is the sort order of the key (or keys) you are indexing on.
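To see why sort order alone is a poor fit for spatial data, here is a hedged Python sketch; it is not PostGIS code. Suppose a B-Tree-style index stores points ordered by `x`. It can narrow a box query only along `x`, and every point in that `x` band must then be filtered by `y`. A GiST/R-tree-style index instead keeps a bounding box per group of points, so it can skip whole groups that miss the query on either axis. The grouping here is made by hand rather than by `penalty`/`picksplit`. `Box`, `btree_like_scan`, and `box_scan` are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def overlaps(self, other: "Box") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax
                and self.ymin <= other.ymax and other.ymin <= self.ymax)

    def contains(self, p: Point) -> bool:
        return self.xmin <= p[0] <= self.xmax and self.ymin <= p[1] <= self.ymax


def bounding_box(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def btree_like_scan(sorted_pts: List[Point], q: Box) -> Tuple[List[Point], int]:
    """Index ordered on x: binary-search the x range, then filter every y."""
    xs = [p[0] for p in sorted_pts]
    band = sorted_pts[bisect_left(xs, q.xmin):bisect_right(xs, q.xmax)]
    return [p for p in band if q.contains(p)], len(band)


def box_scan(groups: List[List[Point]], q: Box) -> Tuple[List[Point], int]:
    """GiST-style: skip any group whose bounding box misses the query."""
    hits: List[Point] = []
    examined = 0
    for pts in groups:
        if not bounding_box(pts).overlaps(q):
            continue  # whole group pruned without looking at its points
        examined += len(pts)
        hits.extend(p for p in pts if q.contains(p))
    return hits, examined


# A 10 x 10 grid of points; rows y = 0, 10, ..., 90 are the hand-made groups.
grid = [(float(x), float(y)) for y in range(0, 100, 10) for x in range(10)]
rows = [[p for p in grid if p[1] == y] for y in range(0, 100, 10)]
q = Box(2, 45, 6, 55)

bt_hits, bt_examined = btree_like_scan(sorted(grid), q)
gist_hits, gist_examined = box_scan(rows, q)
print(len(bt_hits), bt_examined, gist_examined)  # 5 50 10
```

Both scans find the same five points. The x-ordered scan has to look at every point in the `x` band, while the bounding-box scan rejects nine of the ten groups on `y` alone.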