
Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/13543004/

Date: 2020-10-21 00:34:21  Source: igfitidea

Optimize SELECT query with ORDER BY, OFFSET and LIMIT of postgresql

postgresql

Asked by Tan Nguyen

This is my table schema


Column       |          Type          |                      Modifiers                      
-------------+------------------------+------------------------------------------------------
id           | integer                | not null default nextval('message_id_seq'::regclass)
date_created | bigint                 |
content      | text                   |
user_name    | character varying(128) |
user_id      | character varying(128) |
user_type    | character varying(8)   |
user_ip      | character varying(128) |
user_avatar  | character varying(128) |
chatbox_id   | integer                | not null
Indexes:
    "message_pkey" PRIMARY KEY, btree (id)
    "idx_message_chatbox_id" btree (chatbox_id)
    "indx_date_created" btree (date_created)
Foreign-key constraints:
    "message_chatbox_id_fkey" FOREIGN KEY (chatbox_id) REFERENCES chatboxes(id) ON UPDATE CASCADE ON DELETE CASCADE

This is the query


SELECT * 
FROM message 
WHERE chatbox_id = 
ORDER BY date_created 
OFFSET 0 
LIMIT 20;

($1 will be replaced by the actual ID)


It runs pretty well, but when the table reaches 3.7 million records, all SELECT queries start consuming a lot of CPU and RAM and then the whole system goes down. I have to temporarily back up all the current messages and truncate the table. I am not sure what is going on, because everything was fine when I had about 2 million records.


I am using PostgreSQL Server 9.1.5 with default options.




Update: output of EXPLAIN ANALYZE


Limit  (cost=0.00..6.50 rows=20 width=99) (actual time=0.107..0.295 rows=20 loops=1)
  ->  Index Scan Backward using indx_date_created on message  (cost=0.00..3458.77 rows=10646 width=99) (actual time=0.105..0.287 rows=20 loops=1)
        Filter: (chatbox_id = 25065)
Total runtime: 0.376 ms
(4 rows)


Update: server specification


Intel Xeon 5620 8x2.40GHz+HT
12GB DDR3 1333 ECC
SSD Intel X25-E Extreme 64GB


Final solution


Finally I can go above 3 million messages. I had to optimize the postgresql configuration as wildplasser suggested, and also create a new index as A.H. suggested.
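The answer does not show which configuration values wildplasser suggested. As a rough sketch only, tuning on a 12 GB machine with an SSD typically touches settings like these (the numbers below are illustrative assumptions, not the poster's actual configuration):

```ini
# postgresql.conf sketch -- illustrative values, not the settings from the
# question. Adjust to your own RAM, workload, and PostgreSQL version.
shared_buffers = 2GB          # a few GB on a 12GB machine, not the tiny default
effective_cache_size = 8GB    # how much the OS file cache can hold
work_mem = 16MB               # per-sort memory before spilling to disk files
random_page_cost = 2.0        # lower than the 4.0 default is reasonable on SSD
```

Raising work_mem in particular matters for this question: it decides when a sort stops being an in-memory top-N heapsort and starts spilling to disk.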


Answered by A.H.

You could try to give PostgreSQL a better index for that query. I propose something like this:


create index invent_suitable_name on message(chatbox_id, date_created);

or

或者

create index invent_suitable_name on message(chatbox_id, date_created desc);
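Why the composite index helps: a btree on (chatbox_id, date_created) keeps entries sorted by that pair, so PostgreSQL can seek to the first entry for one chatbox_id and read the next 20 in date order, with no filter step and no sort. A conceptual sketch of that lookup (illustrative data and names, not PostgreSQL internals):

```python
import bisect

# Entries kept in index order: (chatbox_id, date_created, payload).
index = sorted([
    (25065, 100, "msg-a"),
    (25065, 120, "msg-b"),
    (99999, 50,  "msg-x"),
    (25065, 90,  "msg-c"),
])

def top_n_for_chatbox(chatbox_id, n):
    # Seek to the first index entry for this chatbox_id ...
    start = bisect.bisect_left(index, (chatbox_id,))
    out = []
    # ... then scan forward: matching entries are already sorted by
    # date_created, so the first n found are the answer.
    for entry in index[start:]:
        if entry[0] != chatbox_id or len(out) == n:
            break
        out.append(entry)
    return out

print(top_n_for_chatbox(25065, 2))
# -> [(25065, 90, 'msg-c'), (25065, 100, 'msg-a')]
```

Contrast this with the single-column indexes in the schema: indx_date_created alone forces a filter over every chatbox, and idx_message_chatbox_id alone forces a sort of every matching row.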

Answered by Igor Romanchenko

Try adding an index on (chatbox_id, date_created). For this particular query it will give you maximum performance.


For the case when postgres "starts consuming a lot of CPU and RAM", try to get more details. It could be a bug (with the default configuration postgres normally doesn't consume much RAM).


UPD: My guess at the reason for the bad performance:


At some point in time the table became too big for a full scan to collect accurate statistics. After another ANALYZE, PostgreSQL got bad statistics for the table. As a result it got a bad plan that consisted of:


  1. Index scan on chatbox_id;
  2. Ordering of returned records to get top 20.

Because of the default configs and the large number of records returned in step 1, postgres was forced to do the sort in files on disk. As a result: bad performance.

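The cost difference between the two plans can be sketched in miniature: sorting every matching row (which may spill to disk files once it exceeds work_mem) versus keeping only the 20 best candidates in a small heap, which is what PostgreSQL's top-N heapsort does when the result fits in memory. Illustrative data, not a benchmark:

```python
import heapq
import random

random.seed(1)
# Stand-in for date_created values of all rows matching one chatbox_id.
date_created = [random.randrange(10**9) for _ in range(100_000)]

# Full sort: orders all 100k values just to keep 20 of them; in the
# database this is the step that can spill to disk files.
full = sorted(date_created)[:20]

# Top-N selection: a 20-element heap, one pass, tiny constant memory.
top = heapq.nsmallest(20, date_created)

print(full == top)  # prints True: same 20 oldest timestamps either way
```

With the composite index suggested above, PostgreSQL can skip even the top-N step, because the rows already come back in date_created order.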

UPD2: EXPLAIN ANALYZE shows a 0.376 ms run time and a good plan. Can you give details about a case with bad performance?
