PostgreSQL OFFSET vs. ROW_NUMBER()
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3125571/
OFFSET vs. ROW_NUMBER()
Asked by zzzeek
As we know, Postgresql's OFFSET requires that it scan through all the rows up until the point it gets to where you requested, which makes it kind of useless for pagination through huge result sets, getting slower and slower as the OFFSET goes up.
PG 8.4 now supports window functions. Instead of:
SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500
You can say:
SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum FROM table) AS foo
WHERE rownum > 500 AND rownum <= 510
Does the latter approach help us at all? Or do we have to keep using identifying columns and temp tables for large pagination?
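For context, the "identifying columns" approach usually means keyset (seek) pagination: remember the ordering value of the last row served and seek past it with an index, instead of skipping rows. A minimal sketch against the question's own table and column, assuming somecol is unique and indexed, and with :last_somecol as a placeholder for the last value returned on the previous page:
-- first page
SELECT * FROM table ORDER BY somecol LIMIT 10;
-- subsequent pages: seek past the last value already returned, so an index
-- on somecol can jump straight to the start of the page instead of scanning
SELECT * FROM table WHERE somecol > :last_somecol ORDER BY somecol LIMIT 10;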
Accepted answer by zzzeek
I've constructed a test which compares OFFSET, cursors, and ROW_NUMBER(). My impression of ROW_NUMBER(), that it would be consistent in speed regardless of where you are in the result set, is correct. However, that speed is dramatically slower than either OFFSET or CURSOR, which, as I also expected, are pretty much the same in speed, both degrading the further toward the end of the result set you go.
Results:
offset(100,100): 0.016359
scroll(100,100): 0.018393
rownum(100,100): 15.535614

offset(100,480000): 1.761800
scroll(100,480000): 1.781913
rownum(100,480000): 15.158601

offset(100,999900): 3.670898
scroll(100,999900): 3.664517
rownum(100,999900): 14.581068
The test script uses sqlalchemy to set up tables and 1000000 rows of test data. It then uses a psycopg2 cursor to execute each SELECT statement and fetch results with the three different methods.
from sqlalchemy import *

metadata = MetaData()
engine = create_engine('postgresql://scott:tiger@localhost/test', echo=True)

t1 = Table('t1', metadata,
    Column('id', Integer, primary_key=True),
    Column('d1', String(50)),
    Column('d2', String(50)),
    Column('d3', String(50)),
    Column('d4', String(50)),
    Column('d5', String(50))
)

if not engine.has_table('t1'):
    conn = engine.connect()
    t1.create(conn)

    # 1000000 rows
    for i in range(100):
        conn.execute(t1.insert(), [
            dict(
                ('d%d' % col, "data data data %d %d" % (col, (i * 10000) + j))
                for col in range(1, 6)
            ) for j in xrange(1, 10001)
        ])

import time

def timeit(fn, count, *args):
    now = time.time()
    for i in xrange(count):
        fn(*args)
    total = time.time() - now
    print "%s(%s): %f" % (fn.__name__, ",".join(repr(x) for x in args), total)

# this is a raw psycopg2 connection.
conn = engine.raw_connection()

def offset(limit, offset):
    # plain LIMIT/OFFSET pagination
    cursor = conn.cursor()
    cursor.execute("select * from t1 order by id limit %d offset %d" % (limit, offset))
    cursor.fetchall()
    cursor.close()

def rownum(limit, offset):
    # window-function pagination: filter on row_number() in a subquery
    cursor = conn.cursor()
    cursor.execute("select * from (select *, "
                   "row_number() over (order by id asc) as rownum from t1) as foo "
                   "where rownum>=%d and rownum<%d" % (offset, limit + offset))
    cursor.fetchall()
    cursor.close()

def scroll(limit, offset):
    # named cursor = server-side cursor; skip ahead, then fetch one page
    cursor = conn.cursor('foo')
    cursor.execute("select * from t1 order by id")
    cursor.scroll(offset)
    cursor.fetchmany(limit)
    cursor.close()

print

timeit(offset, 10, 100, 100)
timeit(scroll, 10, 100, 100)
timeit(rownum, 10, 100, 100)

print

timeit(offset, 10, 100, 480000)
timeit(scroll, 10, 100, 480000)
timeit(rownum, 10, 100, 480000)

print

timeit(offset, 10, 100, 999900)
timeit(scroll, 10, 100, 999900)
timeit(rownum, 10, 100, 999900)
Answered by Frank Heikens
Use a CURSOR for a large result set; it will be much faster. For small result sets the LIMIT OFFSET construction works fine, but it has its limits.
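In plain SQL the cursor approach looks roughly like this (a sketch, reusing the t1 table from the test script above; the cursor name and page numbers are illustrative). MOVE skips to the requested position on the server without transferring rows, and FETCH returns one page:
BEGIN;
DECLARE pager CURSOR FOR SELECT * FROM t1 ORDER BY id;
-- skip to the start of the requested page; the skipped rows are never sent to the client
MOVE FORWARD 480000 IN pager;
-- return one page of 100 rows
FETCH FORWARD 100 FROM pager;
CLOSE pager;
COMMIT;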
ROW_NUMBER is a nice thing, but not for pagination. You end up with bad performance because of sequential scans.
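To see this for yourself, you can ask PostgreSQL for the query plan (a sketch, using the same t1 table as the test script above): the subquery has to number every row before the rownum filter is applied, so the plan scans the whole table no matter which page you request.
EXPLAIN ANALYZE
SELECT * FROM (
    SELECT *, row_number() OVER (ORDER BY id ASC) AS rownum FROM t1
) AS foo
WHERE rownum >= 480000 AND rownum < 480100;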