Multiple indexes vs single index on multiple columns in postgresql
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/39297221/
Asked by let_there_be_light
I could not reach any conclusive answers reading some of the existing posts on this topic.
I have certain data at 100 locations for the past 10 years. The table has about 800 million rows. I primarily need to generate yearly statistics for each location. Sometimes I need to generate monthly variation statistics and hourly variation statistics as well. I'm wondering if I should create two indexes, one for location and another for year, or create one index on both location and year. My primary key is currently a serial number (I could probably use location and timestamp as the primary key).
Thanks.
Answered by Ildar Musin
Regardless of how many indexes you have created on a relation, usually only one of them will be used in a given query (which one depends on the query, statistics, etc.; see the update below for exceptions). So in your case you wouldn't get a cumulative advantage from creating two single-column indexes. To get the most performance from an index, I would suggest using a composite index on (location, timestamp).
Note that queries like
... WHERE timestamp BETWEEN smth AND smth
generally cannot use the index above efficiently, while queries like
... WHERE location = 'smth'
or
... WHERE location = 'smth' AND timestamp BETWEEN smth AND smth
can. This is because the first attribute in the index is crucial for searching and sorting.
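The three query shapes above can be compared with EXPLAIN. This is a minimal sketch, assuming a hypothetical table named readings with columns location, ts and value standing in for the asker's data; the table, column and index names are illustrative, not taken from the question. The plan PostgreSQL actually chooses depends on table size and statistics, so no particular output is claimed here.

```sql
-- Hypothetical table standing in for the asker's data (all names are assumptions).
CREATE TABLE IF NOT EXISTS readings (
    id       bigserial        PRIMARY KEY,
    location integer          NOT NULL,
    ts       timestamptz      NOT NULL,
    value    double precision
);

-- Composite index with the equality column first and the range column second.
CREATE INDEX IF NOT EXISTS readings_location_ts_idx
    ON readings (location, ts);

-- Leading column not constrained: the B-tree cannot be searched by its prefix.
EXPLAIN
SELECT * FROM readings
WHERE ts BETWEEN '2015-01-01' AND '2015-12-31';

-- Leading column constrained: the composite index is a candidate.
EXPLAIN
SELECT * FROM readings
WHERE location = 42;

-- Both columns constrained: the composite index can narrow on both conditions.
EXPLAIN
SELECT * FROM readings
WHERE location = 42
  AND ts BETWEEN '2015-01-01' AND '2015-12-31';
```

On an empty or very small table the planner may prefer a sequential scan for all three queries, so it is worth running this against a realistically sized copy of the data.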
Don't forget to run ANALYZE; after index creation in order to collect statistics.
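A bare ANALYZE; processes every table in the current database that the user is allowed to analyze, which can take a while on a large database. As a sketch, again assuming the hypothetical readings table and index names used for illustration only, the statistics can be collected for one table and then inspected:

```sql
-- Hypothetical table and index (names are assumptions, not from the question).
CREATE TABLE IF NOT EXISTS readings (
    id       bigserial        PRIMARY KEY,
    location integer          NOT NULL,
    ts       timestamptz      NOT NULL,
    value    double precision
);
CREATE INDEX IF NOT EXISTS readings_location_ts_idx
    ON readings (location, ts);

-- Collect planner statistics for this table only.
ANALYZE readings;

-- Check when statistics were last gathered, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'readings';

-- Inspect the per-column statistics the planner will rely on.
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'readings';
```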
Update: As @MondKin mentioned in the comments, certain queries can actually use several indexes on the same relation. One example is a query with OR clauses like a = 123 OR b = 456 (assuming that there are indexes on both columns). In this case postgres would perform bitmap index scans on both indexes, build a union of the resulting bitmaps, and use it for a bitmap heap scan. Under certain conditions the same scheme may be used for AND queries, but with an intersection instead of a union.
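The index combination described in the update can be observed with EXPLAIN. The sketch below assumes a hypothetical table t with integer columns a and b (matching the a = 123 OR b = 456 example) and made-up sample data. Whether the planner really combines the two indexes depends on the data distribution and cost estimates, so the plans may differ from one system to another.

```sql
-- Hypothetical table with one single-column index per column (names are assumptions).
CREATE TABLE IF NOT EXISTS t (a integer, b integer);
CREATE INDEX IF NOT EXISTS t_a_idx ON t (a);
CREATE INDEX IF NOT EXISTS t_b_idx ON t (b);

-- Made-up sample rows so that index scans have a chance of looking attractive.
INSERT INTO t (a, b)
SELECT i % 1000, i % 777
FROM generate_series(1, 100000) AS g(i);

ANALYZE t;

-- OR: a plan may contain a BitmapOr node over two Bitmap Index Scans.
EXPLAIN
SELECT * FROM t WHERE a = 123 OR b = 456;

-- AND: a plan may contain a BitmapAnd node, or use one index and filter on the other column.
EXPLAIN
SELECT * FROM t WHERE a = 123 AND b = 456;
```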
Answered by MondKin
There is no rule of thumb for situations like these. I suggest you experiment on a copy of your production DB to see what works best for you: a single multi-column index or 2 single-column indexes.
One nice feature of Postgres is you can have multiple indexes and use them in the same query. Check this chapter of the docs:
... PostgreSQL has the ability to combine multiple indexes ... to handle cases that cannot be implemented by single index scans ....
... Sometimes multicolumn indexes are best, but sometimes it's better to create separate indexes and rely on the index-combination feature ...
You can even experiment with creating both the individual and combined indexes, check how big each one is, and determine whether it's worth having them at the same time.
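One way to run that size comparison is sketched below. It assumes the hypothetical readings table and creates all three candidate indexes under illustrative names, then reads their on-disk size and scan counts from the statistics views.

```sql
-- Hypothetical table (names are assumptions, not from the question).
CREATE TABLE IF NOT EXISTS readings (
    id       bigserial        PRIMARY KEY,
    location integer          NOT NULL,
    ts       timestamptz      NOT NULL,
    value    double precision
);

-- The combined index and the two individual indexes, side by side.
CREATE INDEX IF NOT EXISTS readings_location_ts_idx ON readings (location, ts);
CREATE INDEX IF NOT EXISTS readings_location_idx    ON readings (location);
CREATE INDEX IF NOT EXISTS readings_ts_idx          ON readings (ts);

-- Size of each index and how often it has been scanned so far.
SELECT indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan                                     AS scans_so_far
FROM pg_stat_user_indexes
WHERE relname = 'readings'
ORDER BY pg_relation_size(indexrelid) DESC;
```

An index that stays at zero scans after a representative workload is a candidate for dropping, since every index adds write and storage overhead.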
Some things that you can also experiment with:
- If your table is too large, consider partitioning it. It looks like you could partition either by location or by date. Partitioning splits your table's data into smaller tables, reducing the number of places where a query needs to look.
- If your data is laid out according to a date (like transaction date), check BRIN indexes.
- If multiple queries will be processing your data in a similar fashion (like aggregating all transactions over the same period), check materialized views so you only need to do those costly aggregations once.
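The three suggestions in the list above could look roughly like this. It is a sketch under stated assumptions: the table, index and view names are hypothetical, the partitioning part uses declarative partitioning and therefore assumes PostgreSQL 10 or later (newer than the versions available when the answer was written), and the yearly aggregate is only one guess at what the asker's statistics might be.

```sql
-- 1. Partitioning by date (declarative syntax, assumes PostgreSQL 10+).
CREATE TABLE IF NOT EXISTS readings_part (
    location integer          NOT NULL,
    ts       timestamptz      NOT NULL,
    value    double precision
) PARTITION BY RANGE (ts);

-- One partition per year; queries restricted to 2015 only touch this table.
CREATE TABLE IF NOT EXISTS readings_part_2015
    PARTITION OF readings_part
    FOR VALUES FROM ('2015-01-01') TO ('2016-01-01');

-- 2. BRIN index, useful when rows are physically stored roughly in ts order.
CREATE TABLE IF NOT EXISTS readings (
    id       bigserial        PRIMARY KEY,
    location integer          NOT NULL,
    ts       timestamptz      NOT NULL,
    value    double precision
);
CREATE INDEX IF NOT EXISTS readings_ts_brin
    ON readings USING brin (ts);

-- 3. Materialized view that stores the yearly aggregation once.
CREATE MATERIALIZED VIEW IF NOT EXISTS yearly_stats AS
SELECT location,
       date_trunc('year', ts) AS stat_year,
       count(*)               AS n_rows,
       avg(value)             AS avg_value
FROM readings
GROUP BY location, date_trunc('year', ts);

-- Recompute the stored result after new data has been loaded.
REFRESH MATERIALIZED VIEW yearly_stats;
```

A BRIN index is typically far smaller than a B-tree on the same column, but it only helps when the column's values correlate with the physical row order, as they do for append-only time series.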
About the order of columns in your multi-column index: put first the column on which you will have an equality operation, and after it the column on which you have a range, >= or <= operation.
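Applied to the question, that ordering means location first and the timestamp second. With that order, all index entries for one location sit next to each other in the B-tree, sorted by timestamp, so an equality on location plus a range on the timestamp reads one contiguous slice of the index. A sketch of a monthly-variation query of that shape follows, with hypothetical table, column and index names and made-up literal values:

```sql
-- Hypothetical table and index (names are assumptions, not from the question).
CREATE TABLE IF NOT EXISTS readings (
    id       bigserial        PRIMARY KEY,
    location integer          NOT NULL,
    ts       timestamptz      NOT NULL,
    value    double precision
);
CREATE INDEX IF NOT EXISTS readings_location_ts_idx
    ON readings (location, ts);

-- Equality on the first index column, range on the second.
SELECT date_trunc('month', ts) AS month_start,
       count(*)                AS n_rows,
       avg(value)              AS avg_value
FROM readings
WHERE location = 42
  AND ts >= '2015-01-01'
  AND ts <  '2016-01-01'
GROUP BY date_trunc('month', ts)
ORDER BY month_start;
```

The half-open range (>= start, < next start) avoids missing rows from the last day of the year, which a BETWEEN on date-only literals can do for timestamp columns.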
Answered by redneb
An index on (location,timestamp) should work better than 2 separate indexes for your case. Note that the order of the columns is important.