database 字符串列上的 postgresql 索引

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/16924323/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-08 08:58:24  来源:igfitidea点击:

postgresql index on string column

databasepostgresqlindexing

提问by Hao

Say, I have a table ResidentInfo, and in this table I have unique constraints HomeAddress, which is VARCHARtype. For future query, I gonna add an index on this column. The query will only have operation =, and I'll use B-TREE pattern since the Hash pattern is not recommended currently.

说,我有一个 table ResidentInfo,在这个 table 中我有唯一的约束HomeAddress,它是VARCHAR类型。对于将来的查询,我将在此列上添加一个索引。查询将只有 operation =,我将使用 B-TREE 模式,因为目前不推荐使用 Hash 模式。

Question: From efficiency view, using B-TREE, do you think I should add a new column with numbers 1,2,3....,N corresponding to different homeaddress, and instead of adding index on HomeAddress, I should add index on the number column?

问题:从效率来看,使用B-TREE,你认为我应该添加一个新的列,数字1,2,3....,N对应不同的homeaddress,而不是在上添加索引HomeAddress,我应该在上添加索引数字栏?

I ask this question because I don't know how index works.

我问这个问题是因为我不知道索引是如何工作的。

回答by Erwin Brandstetter

For simple equality checks (=), a B-Tree index on a varcharor textcolumn is simple and the best choice. It certainly helps performance a lot.

对于简单的相等性检查 ( =), avarchartext列上的 B 树索引很简单,也是最好的选择。它当然对性能有很大帮助。

Of course, a B-Tree index on a simple integerperforms better. For starters, comparing simple integervalues is a bit faster. But more importantly, performance is also a function of the size of the index. A bigger column means fewer rows per data page, means more pages have to be read ...

当然,简单的 B 树索引integer性能更好。首先,比较简单的integer值要快一些。但更重要的是,性能也是索引大小的函数。更大的列意味着每个数据页的行数更少,意味着必须读取更多的页面......

Since the HomeAddressis hardly unique anyway, it's not a good natural primary key. I would strongly suggest to use a surrogate primary keyinstead. A serialcolumnis the obvious choice for that. Its only purpose is to have a simple, fast primary key to work with.

因为HomeAddress无论如何它都不是唯一的,所以它不是一个好的自然主键。我强烈建议改用代理主键。一serial是该显而易见的选择。它的唯一目的是使用一个简单、快速的主键。

If you have other tables referencing said table, this becomes even more efficient. Instead of duplicating a lengthy string for the foreign key column, you only need the 4 bytes for an integer column. And you don't need to cascade updates so much, since an address is bound to change, while a surrogate pk can stay the same (but doesn't have to, of course).

如果您有其他表引用所述表,这将变得更加有效。不需要为外键列复制一个冗长的字符串,您只需要 4 个字节的整数列。而且你不需要级联更新这么多,因为地址肯定会改变,而代理 pk 可以保持不变(但当然不必)。

Your table could look like this:

您的表格可能如下所示:

CREATE TABLE resident (
   resident_id serial PRIMARY KEY
  ,address text NOT NULL
   -- more columns
);

CREATE INDEX resident_adr_idx ON resident(address);

This results in two B-Tree indexes. A unique index on resident_idand a plain index on address.

这导致两个 B 树索引。上的唯一索引resident_id和 上的普通索引address

More about indexes in the manual.
Postgres offers a lot of options - but you don't need any more for this simple case.

更多关于手册中的索引
Postgres 提供了很多选项 - 但是对于这个简单的案例,您不需要更多选项。

回答by Denis de Bernardy

In Postgres, a unique constraint is enforced by maintaining a unique index on the field, so you're covered already.

在 Postgres 中,通过在字段上维护唯一索引来强制执行唯一约束,因此您已经了解了。

In the event you decide the unique constraint on the address is bad (which, honestly, it is: what a spouse creating a separate account? about flatshares? etc.), you can create one like so:

如果您认为地址的唯一约束不好(老实说,这是:配偶创建一个单独的帐户?关于公寓?等),您可以创建一个这样的:

create index on ResidentInfo (HomeAddress);