Original question: http://stackoverflow.com/questions/2236776/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page): Stack Overflow
Efficient querying of multi-partition Postgres table
Asked by Adrian Pronk
I've just restructured my database to use partitioning in Postgres 8.2. Now I have a problem with query performance:
SELECT *
FROM my_table
WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
ORDER BY id DESC
LIMIT 100;
There are 45 million rows in the table. Prior to partitioning, this would use a reverse index scan and stop as soon as it hit the limit.
After partitioning (on time_stamp ranges), Postgres does a full index scan of the master table and the relevant partition and merges the results, sorts them, then applies the limit. This takes way too long.
I can fix it with:
SELECT * FROM (
SELECT *
FROM my_table_part_a
WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
ORDER BY id DESC
LIMIT 100) t
UNION ALL
SELECT * FROM (
SELECT *
FROM my_table_part_b
WHERE time_stamp >= '2010-02-10' and time_stamp < '2010-02-11'
ORDER BY id DESC
LIMIT 100) t
UNION ALL
... and so on ...
ORDER BY id DESC
LIMIT 100
This runs quickly. The partitions whose timestamps are out of range aren't even included in the query plan.
My question is: is there some hint or syntax I can use in Postgres 8.2 to prevent the query planner from scanning the full table, while still using simple syntax that refers only to the master table?
Basically, can I avoid the pain of dynamically building the big UNION query over each partition that happens to be currently defined?
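If the per-partition UNION has to stay, the list of partitions can at least be read from the system catalogs instead of being hard-coded. The following is a minimal PL/pgSQL sketch of that idea, not something from the original post and not verified against 8.2. The function name my_table_latest and its parameters are hypothetical, and it assumes that every child table has exactly the columns of my_table and that the master table itself holds no rows.

```sql
-- Sketch only: my_table_latest, from_ts, to_ts and max_rows are hypothetical names.
-- Assumes all child tables share my_table's column layout and the parent is empty.
CREATE OR REPLACE FUNCTION my_table_latest(from_ts text, to_ts text, max_rows integer)
RETURNS SETOF my_table AS $$
DECLARE
    part   record;
    query  text := '';
    result my_table%ROWTYPE;
BEGIN
    -- Look up the child tables currently inheriting from my_table.
    FOR part IN
        SELECT quote_ident(n.nspname) || '.' || quote_ident(c.relname) AS name
        FROM pg_inherits i
        JOIN pg_class c ON c.oid = i.inhrelid
        JOIN pg_namespace n ON n.oid = c.relnamespace
        WHERE i.inhparent = 'my_table'::regclass
    LOOP
        IF query <> '' THEN
            query := query || ' UNION ALL ';
        END IF;
        -- One ordered, limited branch per partition, as in the hand-written query.
        query := query
            || 'SELECT * FROM (SELECT * FROM ' || part.name
            || ' WHERE time_stamp >= ' || quote_literal(from_ts)
            || ' AND time_stamp < ' || quote_literal(to_ts)
            || ' ORDER BY id DESC LIMIT ' || CAST(max_rows AS text) || ') t';
    END LOOP;

    IF query = '' THEN
        RETURN;  -- no partitions are defined, so there is nothing to return
    END IF;

    -- Merge the per-partition results and apply the overall limit.
    query := query || ' ORDER BY id DESC LIMIT ' || CAST(max_rows AS text);

    FOR result IN EXECUTE query LOOP
        RETURN NEXT result;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;

-- Example call, using the date range from the question:
SELECT * FROM my_table_latest('2010-02-10', '2010-02-11', 100);
```

The date bounds are passed as text and embedded as quoted literals, so they are compared against time_stamp the same way as the literals in the hand-written query. This keeps the caller's syntax simple, but it is still the dynamic UNION underneath.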
EDIT: I have constraint_exclusion enabled (thanks @Vinko Vrsalovic).
Answered by Vinko Vrsalovic
Have you tried constraint exclusion (section 5.9.4 in the document you've linked to)?
Constraint exclusion is a query optimization technique that improves performance for partitioned tables defined in the fashion described above. As an example:
SET constraint_exclusion = on;
SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
Without constraint exclusion, the above query would scan each of the partitions of the measurement table. With constraint exclusion enabled, the planner will examine the constraints of each partition and try to prove that the partition need not be scanned because it could not contain any rows meeting the query's WHERE clause. When the planner can prove this, it excludes the partition from the query plan.
You can use the EXPLAIN command to show the difference between a plan with constraint_exclusion on and a plan with it off.
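To make the quoted passage concrete, here is a small sketch of an inheritance-based setup with CHECK constraints, followed by the EXPLAIN comparison it mentions. The measurement table and its logdate column come from the quoted documentation; the child table names and the other columns are illustrative assumptions.

```sql
-- Parent table; the column list other than logdate is assumed for illustration.
CREATE TABLE measurement (
    city_id   int  NOT NULL,
    logdate   date NOT NULL,
    peaktemp  int,
    unitsales int
);

-- Child tables (hypothetical names), each with a CHECK constraint
-- describing the range of logdate values it may contain.
CREATE TABLE measurement_y2005m12 (
    CHECK (logdate >= DATE '2005-12-01' AND logdate < DATE '2006-01-01')
) INHERITS (measurement);

CREATE TABLE measurement_y2006m01 (
    CHECK (logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01')
) INHERITS (measurement);

-- With the setting off, the plan is expected to scan every child table.
SET constraint_exclusion = off;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';

-- With the setting on, children whose CHECK constraint rules out the
-- WHERE clause (here measurement_y2005m12) should drop out of the plan.
SET constraint_exclusion = on;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2006-01-01';
```

Comparing the two EXPLAIN outputs shows which child tables the planner was able to exclude.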
Answered by Greg
I had a similar problem that I was able to fix by casting the conditions in the WHERE clause, e.g. (assuming the time_stamp column is of type timestamptz):
WHERE time_stamp >= '2010-02-10'::timestamptz and time_stamp < '2010-02-11'::timestamptz
Also, make sure the CHECK condition on the table is defined the same way, e.g. CHECK (time_stamp < '2010-02-10'::timestamptz)
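Put together, the idea is that the CHECK constraint and the query compare time_stamp against values of the same type. A minimal sketch, assuming time_stamp is timestamptz as in this answer; the child table name and its date range are hypothetical:

```sql
-- Hypothetical partition for 2010-02-10; the constraint bounds are cast
-- explicitly to the column's type (timestamptz is assumed here).
CREATE TABLE my_table_part_20100210 (
    CHECK (time_stamp >= '2010-02-10'::timestamptz
       AND time_stamp <  '2010-02-11'::timestamptz)
) INHERITS (my_table);

-- The query uses the same explicit casts in its WHERE clause.
SELECT *
FROM my_table
WHERE time_stamp >= '2010-02-10'::timestamptz
  AND time_stamp <  '2010-02-11'::timestamptz
ORDER BY id DESC
LIMIT 100;
```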
Answered by skaurus
I had the same problem and it boiled down to two reasons in my case:
I had an indexed column of type timestamp WITH time zone, but the partition constraint on that column used type timestamp WITHOUT time zone.
After fixing the constraints, an ANALYZE of all child tables was needed.
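The two fixes above might look roughly like this for a single child table. This is a sketch under assumptions: my_table_part_a is taken from the question, while the constraint name and the date range are hypothetical and would need to match the real partition.

```sql
-- Inspect the existing CHECK constraints on a child table to see
-- which type their bounds were written with.
SELECT conname, pg_get_constraintdef(oid)
FROM pg_constraint
WHERE conrelid = 'my_table_part_a'::regclass
  AND contype = 'c';

-- Replace the constraint with one whose bounds use the column's type
-- (constraint name and range are hypothetical).
ALTER TABLE my_table_part_a
    DROP CONSTRAINT my_table_part_a_time_stamp_check;

ALTER TABLE my_table_part_a
    ADD CONSTRAINT my_table_part_a_time_stamp_check
    CHECK (time_stamp >= TIMESTAMP WITH TIME ZONE '2010-02-10 00:00:00+00'
       AND time_stamp <  TIMESTAMP WITH TIME ZONE '2010-02-11 00:00:00+00');

-- Refresh planner statistics; repeat for every child table.
ANALYZE my_table_part_a;
```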
Edit: another bit of knowledge. It's important to remember that constraint exclusion (which allows PG to skip scanning some tables based on your partitioning criteria) doesn't work with, quote, "non-immutable function such as CURRENT_TIMESTAMP".
I had queries that used CURRENT_DATE, and that was part of my problem.
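The difference can be checked with EXPLAIN. The sketch below reuses my_table and time_stamp from the question and assumes the inheritance-based partitioning with constraint exclusion discussed on this page; the literal date is only an example.

```sql
SET constraint_exclusion = on;

-- Constant bound: the planner can compare it with each child's CHECK
-- constraint at plan time and leave out partitions that cannot match.
EXPLAIN
SELECT count(*) FROM my_table
WHERE time_stamp >= '2010-02-10';

-- CURRENT_DATE is not immutable, so its value is not available for this
-- comparison at plan time; expect every partition to appear in the plan.
EXPLAIN
SELECT count(*) FROM my_table
WHERE time_stamp >= CURRENT_DATE;
```

One possible workaround, if the two plans differ as expected, is to compute the date in the application and send it to the server as a literal.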