Optimize SELECT query with ORDER BY, OFFSET and LIMIT in PostgreSQL

Note: this is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/13543004/
Asked by Tan Nguyen
This is my table schema:
    Column    |          Type          |                      Modifiers
--------------+------------------------+------------------------------------------------------
 id           | integer                | not null default nextval('message_id_seq'::regclass)
 date_created | bigint                 |
 content      | text                   |
 user_name    | character varying(128) |
 user_id      | character varying(128) |
 user_type    | character varying(8)   |
 user_ip      | character varying(128) |
 user_avatar  | character varying(128) |
 chatbox_id   | integer                | not null
Indexes:
    "message_pkey" PRIMARY KEY, btree (id)
    "idx_message_chatbox_id" btree (chatbox_id)
    "indx_date_created" btree (date_created)
Foreign-key constraints:
    "message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE
This is the query:
SELECT *
FROM message
WHERE chatbox_id = $1
ORDER BY date_created
OFFSET 0
LIMIT 20;
($1 will be replaced by the actual ID)
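For reference, a minimal psql sketch of how such a parameterized query can be prepared and executed (the statement name recent_messages and the id 25065 are illustrative, not from the original post):

-- Prepare the query once; $1 is bound at execution time.
PREPARE recent_messages(integer) AS
    SELECT *
    FROM message
    WHERE chatbox_id = $1
    ORDER BY date_created
    OFFSET 0
    LIMIT 20;

-- Fetch the first 20 messages for chatbox 25065 in date_created order.
EXECUTE recent_messages(25065);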
It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM, and then the whole system goes down. I have to temporarily back up all the current messages and truncate that table. I am not sure what is going on, because everything was fine when I had about 2 million records.
I am using PostgreSQL Server 9.1.5 with default options.
Update: the output of EXPLAIN ANALYZE
 Limit  (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
   ->  Index Scan Backward using indx_date_created on message  (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
         Filter: (chatbox_id = 25065)
 Total runtime: 0.376 ms
(4 rows)
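When the slowdown reappears, a buffer-level breakdown of the same query can show how much of the work hit disk. A sketch using the BUFFERS option of EXPLAIN (available since PostgreSQL 9.0; 25065 is the id from the plan above):

-- Report shared-buffer hits and disk reads alongside the timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
OFFSET 0
LIMIT 20;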
Update: server specification
Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB
Final solution
Finally I can go above 3 million messages. I had to optimize the PostgreSQL configuration as wildplasser suggested and also create a new index as A.H. suggested.
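The exact settings are not quoted in the thread; as an assumption, a starting point for a dedicated 12 GB machine like the one above might look like the postgresql.conf sketch below (illustrative values, not the poster's actual configuration):

# postgresql.conf -- illustrative values only
shared_buffers = 2GB           # the 9.1 default is only 32MB
work_mem = 16MB                # per-sort memory; the 1MB default forces on-disk sorts
effective_cache_size = 8GB     # planner hint about the size of the OS page cache
maintenance_work_mem = 512MB   # speeds up ANALYZE and CREATE INDEX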
Answered by A.H.
You could try to give PostgreSQL a better index for that query. I propose something like this:
create index invent_suitable_name on message(chatbox_id, date_created);
or
create index invent_suitable_name on message(chatbox_id, date_created desc);
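With the second (descending) variant in place, a newest-first query can read the top 20 rows straight off the index without a separate sort step. A sketch, assuming newest-first is the desired order:

SELECT *
FROM message
WHERE chatbox_id = $1        -- equality match on the leading index column
ORDER BY date_created DESC   -- matches the index sort order exactly
LIMIT 20;                    -- only 20 index entries need to be visited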
Answered by Igor Romanchenko
Try adding an index on (chatbox_id, date_created). For this particular query it will give you maximum performance.
For the case when postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration postgres normally doesn't consume much RAM).
UPD: My guess at the reason for the bad performance:
At some point the table became too big for a full scan to collect accurate statistics. After another ANALYZE, PostgreSQL got bad statistics for the table. As a result it got a bad plan that consisted of:
- Index scan on chatbox_id;
- Ordering of the returned records to get the top 20.
Because of the default configs and the large number of records returned in step 1, postgres was forced to do the sorting in files on disk. As a result: bad performance.
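To confirm that a sort spilled to disk, and as one possible remedy, one could inspect the sort method in the plan and raise work_mem for the session. A sketch (25065 is the example id from the question; the values are assumptions):

-- A "Sort Method: external merge  Disk: ...kB" line in this output means
-- the sort overflowed work_mem (1MB by default) and ran in temp files.
EXPLAIN ANALYZE
SELECT *
FROM message
WHERE chatbox_id = 25065
ORDER BY date_created
LIMIT 20;

-- Give sorts more memory for this session only.
SET work_mem = '16MB';

-- Refresh the planner's statistics after the table has grown.
ANALYZE message;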
UPD2: EXPLAIN ANALYZE shows a 0.376 ms runtime and a good plan. Can you give details about a case with bad performance?