SQL indexing on varchar in PostgreSQL

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2632347/

Date: 2020-09-20 00:06:07  Source: igfitidea

SQL indexing on varchar

Tags: sql, performance, postgresql, indexing

Asked by alex

I have a table whose columns are a varchar(50) and a float. I need to (very quickly) look up the float associated with a given string. Even with indexing, this is rather slow.

I know, however, that each string is associated with an integer, which I know at the time of lookup, so that each string maps to a unique integer, but each integer does not map to a unique string. One might think of it as a tree structure.

Is there anything to be gained by adding this integer to the table, indexing on it, and using a query like:

SELECT floatval FROM mytable WHERE phrase=givenstring AND assoc=givenint

This is Postgres, and if you could not tell, I have very little experience with databases.

Answered by Quassnoi

Keys on VARCHAR columns can be very long, which results in fewer records per page and more depth (more levels in the B-Tree). Longer index keys also increase the cache miss ratio.

How many strings, on average, map to each integer?

If there are relatively few, you can create an index on the integer column only, and PostgreSQL will do the fine filtering on the matching records:

CREATE INDEX ix_mytable_assoc ON mytable (assoc);

SELECT  floatval
FROM    mytable
WHERE   assoc = givenint
        AND phrase = givenstring

You can also consider creating the index on the string hashes:

CREATE INDEX ix_mytable_md5 ON mytable (DECODE(MD5(phrase), 'HEX'));

SELECT  floatval
FROM    mytable
WHERE   DECODE(MD5(phrase), 'HEX') = DECODE(MD5('givenstring'), 'HEX')
        AND phrase = givenstring -- who knows when do we get a collision?

Each hash is only 16 bytes long, so the index keys will be much shorter while still preserving the selectivity almost perfectly.
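
As a rough check of the 16-byte claim, the following sketch measures the decoded hash and compares the on-disk size of the two indexes. It assumes both ix_mytable_assoc and ix_mytable_md5 from this answer have been created; the actual sizes depend on your data.

```sql
-- MD5 yields 32 hex characters; decoding them gives a 16-byte bytea value.
SELECT octet_length(decode(md5('givenstring'), 'hex')) AS key_bytes;  -- 16

-- Compare the on-disk size of the indexes created above
-- (assumes both indexes exist in the current database).
SELECT relname AS index_name,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM   pg_class
WHERE  relname IN ('ix_mytable_assoc', 'ix_mytable_md5');
```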

Answered by Tometzky

I'd recommend simply a hash index:

create index mytable_phrase_idx on mytable using hash(phrase);

This way queries like

select floatval from mytable where phrase='foo bar';

will be very quick. Test this:

create temporary table test ( k varchar(50), v float);
insert into test (k, v) select 'foo bar number '||generate_series(1,1000000), 1;
create index test_k_idx on test using hash (k);
analyze test;
explain analyze select v from test where k='foo bar number 634652';
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Index Scan using test_k_idx on test  (cost=0.00..8.45 rows=1 width=8) (actual time=0.201..0.206 rows=1 loops=1)
   Index Cond: ((k)::text = 'foo bar number 634652'::text)
 Total runtime: 0.265 ms
(3 rows)

Answered by Magnus Hagander

Short answer: yes, there will be much to gain, at least as long as you don't have many updates. Even if you do, it's quite likely that the overhead will not be noticeable.

Answered by Jørn Schou-Rode

By declaring an index on (phrase, assoc, floatval) you will get a "covering index", which allows the query posted in the question to be performed without even accessing the table. Assuming that either phrase or assoc alone is highly selective (not many rows share the same value for the field), creating an index on that field alone should yield almost the same performance.
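
A minimal sketch of such a covering index, assuming the assoc column from the question has already been added to mytable. The index name and the literal values are placeholders. PostgreSQL can only answer the query from the index alone (an "Index Only Scan", available since version 9.2) when the table's visibility map is reasonably current, hence the VACUUM.

```sql
-- Covering index: every column the query touches is in the index.
CREATE INDEX ix_mytable_covering ON mytable (phrase, assoc, floatval);

-- Refresh the visibility map and planner statistics.
VACUUM ANALYZE mytable;

-- Inspect the plan; whether an index-only scan is chosen is up to the planner.
EXPLAIN
SELECT floatval
FROM   mytable
WHERE  phrase = 'givenstring'
       AND assoc = 42;
```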

Generally, you will want to limit the number of indexes to the smallest set that gets your frequent queries up to the desired performance. For each index you add to a table, you pay some disk space, but more importantly you pay the price of having the DBMS do more work on each INSERT into the table.
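
One way to see which indexes are earning their keep is PostgreSQL's statistics view pg_stat_user_indexes. This is only a sketch: the counters are cumulative since the last statistics reset, so a low idx_scan value is a hint, not proof, that an index is unused.

```sql
-- List the indexes on mytable with how often each was scanned and its size.
SELECT indexrelname AS index_name,
       idx_scan     AS times_used,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  relname = 'mytable'
ORDER  BY idx_scan;
```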

Answered by Cade Roux

It couldn't hurt to try adding the int and creating an index on (int, varchar) that includes the float. This would be covering and pretty efficient. I'm not sure whether Postgres has included columns; if it doesn't, simply add the float to the index key itself.

There are several other techniques you could look into (I'm not familiar with all Postgres features, so I'll give them by SQL Server name):

Indexed views - you can effectively materialize a view which joins several tables - so you could join your varchar to your int and have your index on int and varchar and float

Included columns - you can include columns in an index to ensure that the index is covering - i.e. have an index on varchar include (float) - if your index isn't covering, the query optimizer is still going to have to use the index and then do a bookmark lookup to get the remaining data.
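
For reference, PostgreSQL has supported included columns on B-tree indexes since version 11 through the INCLUDE clause. A minimal sketch, with a placeholder index name:

```sql
-- PostgreSQL 11+: phrase is the search key; floatval is stored in the
-- index leaf pages only so that the lookup can be answered from the index.
CREATE INDEX ix_mytable_phrase_incl ON mytable (phrase) INCLUDE (floatval);

SELECT floatval
FROM   mytable
WHERE  phrase = 'givenstring';
```

The closest PostgreSQL counterpart to an indexed view is CREATE MATERIALIZED VIEW (9.3 and later). Unlike a SQL Server indexed view, it is not kept up to date automatically and has to be refreshed with REFRESH MATERIALIZED VIEW.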
