PostgreSQL: optimizing a query with OFFSET on a large table

Notice: this page is a Chinese-English rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/34110504/

Date: 2020-10-21 02:06:13  Source: igfitidea

Optimize query with OFFSET on large table

sql, postgresql, pagination, sql-order-by, postgresql-9.5

Asked by Oto Shavadze

I have a table:

create table big_table (
id serial primary key,
-- other columns here
vote int
); 

This table is very big, approximately 70 million rows. I need to run this query:

SELECT * FROM big_table
ORDER BY vote [ASC|DESC], id [ASC|DESC]
OFFSET x LIMIT n  -- I need this for pagination

As you may know, when x is a large number, queries like this are very slow.

To optimize performance, I added indexes:

create index vote_order_asc on big_table (vote asc, id asc);

and

create index vote_order_desc on big_table (vote desc, id desc);

EXPLAIN shows that the above SELECT query uses these indexes, but it is still very slow with a large offset.

What can I do to optimize queries with OFFSET on big tables? Maybe PostgreSQL 9.5 or newer versions have some relevant features? I've searched but didn't find anything.

Answered by Erwin Brandstetter

A large OFFSET is always going to be slow. Postgres has to order all rows and count the visible ones up to your offset. To skip all previous rows directly, you could add an indexed row_number to the table (or create a MATERIALIZED VIEW including said row_number) and work with WHERE row_number > x instead of OFFSET x.

However, this approach is only sensible for read-only (or mostly read-only) data. Implementing the same for table data that can change concurrently is more challenging. You need to start by defining the desired behavior exactly.
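As a rough illustration of the row_number idea, the sketch below uses Python's built-in sqlite3 module instead of PostgreSQL so that it runs without a server. The table big_table_numbered, the sample data and both helper functions are assumptions made for this example only. In Postgres the numbering would more likely come from row_number() OVER (ORDER BY vote, id) inside a materialized view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INT NOT NULL)")
# 20 sample rows; votes repeat, so id is needed as a tie-breaker.
conn.executemany(
    "INSERT INTO big_table (id, vote) VALUES (?, ?)",
    [(i, (i * 7) % 5) for i in range(1, 21)],
)

# Hypothetical snapshot table standing in for the materialized view:
# every row stores its position in (vote, id) order.
conn.execute(
    "CREATE TABLE big_table_numbered ("
    "row_number INTEGER PRIMARY KEY, id INT NOT NULL, vote INT NOT NULL)"
)
ordered = conn.execute("SELECT id, vote FROM big_table ORDER BY vote, id").fetchall()
conn.executemany(
    "INSERT INTO big_table_numbered (row_number, id, vote) VALUES (?, ?, ?)",
    [(n, id_, vote) for n, (id_, vote) in enumerate(ordered, start=1)],
)


def page_by_offset(x, n):
    """Classic pagination: the database walks past x rows before returning n."""
    return conn.execute(
        "SELECT id, vote FROM big_table ORDER BY vote, id LIMIT ? OFFSET ?",
        (n, x),
    ).fetchall()


def page_by_row_number(x, n):
    """Same page, but found by a lookup on the indexed row_number column."""
    return conn.execute(
        "SELECT id, vote FROM big_table_numbered "
        "WHERE row_number > ? ORDER BY row_number LIMIT ?",
        (x, n),
    ).fetchall()
```

Both helpers return the same page for the same x and n as long as the snapshot is current. Once big_table changes, the stored numbers are stale until the snapshot is rebuilt, which is why the approach suits read-only or mostly read-only data.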

I suggest a different approach for pagination:

SELECT *
FROM   big_table
WHERE  (vote, id) > (vote_x, id_x)  -- ROW values
ORDER  BY vote, id  -- needs to be deterministic
LIMIT  n;

Here vote_x and id_x are taken from the last row of the previous page (for both DESC and ASC), or from the first row if navigating backwards.
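To make the paging loop concrete, here is a minimal sketch that again uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (SQLite also accepts row value comparisons). The sample data and the helper names first_page, next_page and all_pages are assumptions for illustration, not part of the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, vote INT NOT NULL)")
# 23 sample rows with repeating votes, so pages cut through groups of equal votes.
conn.executemany(
    "INSERT INTO big_table (id, vote) VALUES (?, ?)",
    [(i, (i * 7) % 5) for i in range(1, 24)],
)
conn.execute("CREATE INDEX vote_order_asc ON big_table (vote, id)")


def first_page(n):
    """The first page has no previous row, so it needs no WHERE clause."""
    return conn.execute(
        "SELECT vote, id FROM big_table ORDER BY vote, id LIMIT ?", (n,)
    ).fetchall()


def next_page(vote_x, id_x, n):
    """Fetch the n rows that sort strictly after (vote_x, id_x)."""
    return conn.execute(
        "SELECT vote, id FROM big_table "
        "WHERE (vote, id) > (?, ?) ORDER BY vote, id LIMIT ?",
        (vote_x, id_x, n),
    ).fetchall()


def all_pages(n):
    """Walk the whole table page by page, carrying the last row forward."""
    pages = []
    page = first_page(n)
    while page:
        pages.append(page)
        vote_x, id_x = page[-1]  # last row of the page just read
        page = next_page(vote_x, id_x, n)
    return pages


pages = all_pages(5)
```

The client only has to remember the last (vote, id) pair it saw, so the cost of fetching a page does not grow with the page number the way it does with OFFSET.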

Comparing row values is supported by the index you already have. The feature complies with the ISO SQL standard, but not every RDBMS supports it.

CREATE INDEX vote_order_asc ON big_table (vote, id);

Or for descending order:

SELECT *
FROM   big_table
WHERE  (vote, id) < (vote_x, id_x)  -- ROW values
ORDER  BY vote DESC, id DESC
LIMIT  n;

This can use the same index.
I suggest you declare your columns NOT NULL or acquaint yourself with the NULLS FIRST | LAST construct.

Note two things in particular:

  1. The ROW values in the WHERE clause cannot be replaced with separate member fields. WHERE (vote, id) > (vote_x, id_x) cannot be replaced with:

    WHERE  vote >= vote_x
    AND    id   > id_x

    That would rule out all rows with id <= id_x, while we only want to do that for the same vote and not for the next. The correct translation would be:

    WHERE (vote = vote_x AND id > id_x) OR vote > vote_x
    

    ... which doesn't play along with indexes as nicely, and gets increasingly complicated for more columns.

    It would be simple for a single column, obviously. That's the special case I mentioned at the outset.

  2. The technique does not work for mixed directions in ORDER BY like:

    ORDER  BY vote ASC, id DESC
    

    At least I can't think of a generic way to implement this as efficiently. If at least one of the two columns is a numeric type, you could use a functional index with an inverted value on (vote, (id * -1)) and use the same expression in ORDER BY:

    ORDER  BY vote ASC, (id * -1) ASC
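Both notes can be checked on a handful of rows. The sketch below is plain Python rather than SQL, relying on the fact that Python tuples compare lexicographically in the same way as SQL row values when no NULLs are involved. The sample rows and the cursor position are made up for illustration.

```python
# (vote, id) pairs with unique ids, sorted the way ORDER BY vote, id would sort them.
rows = sorted((id_ % 3, id_) for id_ in range(1, 13))
vote_x, id_x = 0, 9  # pretend this was the last row of the previous page

# Row value comparison: WHERE (vote, id) > (vote_x, id_x)
row_value = [r for r in rows if r > (vote_x, id_x)]

# Incorrect split: WHERE vote >= vote_x AND id > id_x
wrong = [r for r in rows if r[0] >= vote_x and r[1] > id_x]

# Correct expansion: WHERE (vote = vote_x AND id > id_x) OR vote > vote_x
expanded = [r for r in rows if (r[0] == vote_x and r[1] > id_x) or r[0] > vote_x]

# Mixed directions: ORDER BY vote ASC, id DESC is the same order as
# ORDER BY vote ASC, (id * -1) ASC, so the cursor compares (vote, -id).
mixed = sorted(rows, key=lambda r: (r[0], -r[1]))
cursor = mixed[2]
after_cursor = [r for r in mixed if (r[0], -r[1]) > (cursor[0], -cursor[1])]
```

Here row_value and expanded both contain the nine rows that follow (0, 9), while wrong keeps only (0, 12), (1, 10) and (2, 11), because it also filters on id for the higher votes. The negated key brings the mixed-direction case back to a single ascending comparison, which is what the functional index is for.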
    

Related:


Note in particular the presentation by Markus Winand I linked to:


Answered by thepiyush13

Have you tried partitioning the table?

Ease of management, improved scalability and availability, and a reduction in blocking are common reasons to partition tables. Improving query performance is not a reason to employ partitioning, though it can be a beneficial side-effect in some cases. In terms of performance, it is important to ensure that your implementation plan includes a review of query performance. Confirm that your indexes continue to appropriately support your queries after the table is partitioned, and verify that queries using the clustered and nonclustered indexes benefit from partition elimination where applicable.

http://sqlperformance.com/2013/09/sql-indexes/partitioning-benefits
