PostgreSQL OFFSET vs. ROW_NUMBER()
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3125571/
OFFSET vs. ROW_NUMBER()
Asked by zzzeek
As we know, Postgresql's OFFSET requires that it scan through all the rows up until the point it gets to where you requested, which makes it kind of useless for pagination through huge result sets, getting slower and slower as the OFFSET goes up.
PG 8.4 now supports window functions. Instead of:
SELECT * FROM table ORDER BY somecol LIMIT 10 OFFSET 500
You can say:
SELECT * FROM (SELECT *, ROW_NUMBER() OVER (ORDER BY somecol ASC) AS rownum FROM table) AS foo
WHERE rownum > 500 AND rownum <= 510
Does the latter approach help us at all? Or do we have to keep using identifying columns and temp tables for large pagination?
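For context, the "identifying columns" approach usually means keyset (seek) pagination: remember the ordering value of the last row served and seek past it with an index, instead of skipping rows. A minimal sketch against the question's own table and column, assuming somecol is unique and indexed, and with :last_somecol as a placeholder for the last value returned on the previous page:
-- first page
SELECT * FROM table ORDER BY somecol LIMIT 10;
-- subsequent pages: seek past the last value already returned, so an index
-- on somecol can jump straight to the start of the page instead of scanning
SELECT * FROM table WHERE somecol > :last_somecol ORDER BY somecol LIMIT 10;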
Accepted answer by zzzeek
I've constructed a test which compares OFFSET, cursors, and ROW_NUMBER(). My impression of ROW_NUMBER(), that it would be consistent in speed regardless of where you are in the result set, is correct. However, that speed is dramatically slower than either OFFSET or CURSOR, which, as I also expected, are pretty much the same in speed, both degrading the further toward the end of the result set you go.
Results:
offset(100,100): 0.016359
scroll(100,100): 0.018393
rownum(100,100): 15.535614

offset(100,480000): 1.761800
scroll(100,480000): 1.781913
rownum(100,480000): 15.158601

offset(100,999900): 3.670898
scroll(100,999900): 3.664517
rownum(100,999900): 14.581068
The test script uses sqlalchemy to set up tables and 1000000 rows of test data. It then uses a psycopg2 cursor to execute each SELECT statement and fetch results with the three different methods.
from sqlalchemy import *

metadata = MetaData()
engine = create_engine('postgresql://scott:tiger@localhost/test', echo=True)

t1 = Table('t1', metadata,
    Column('id', Integer, primary_key=True),
    Column('d1', String(50)),
    Column('d2', String(50)),
    Column('d3', String(50)),
    Column('d4', String(50)),
    Column('d5', String(50))
)

if not engine.has_table('t1'):
    conn = engine.connect()
    t1.create(conn)

    # 1000000 rows
    for i in range(100):
        conn.execute(t1.insert(), [
            dict(
                ('d%d' % col, "data data data %d %d" % (col, (i * 10000) + j))
                for col in range(1, 6)
            ) for j in xrange(1, 10001)
        ])

import time

def timeit(fn, count, *args):
    now = time.time()
    for i in xrange(count):
        fn(*args)
    total = time.time() - now
    print "%s(%s): %f" % (fn.__name__, ",".join(repr(x) for x in args), total)

# this is a raw psycopg2 connection.
conn = engine.raw_connection()

def offset(limit, offset):
    # plain LIMIT/OFFSET pagination
    cursor = conn.cursor()
    cursor.execute("select * from t1 order by id limit %d offset %d" % (limit, offset))
    cursor.fetchall()
    cursor.close()

def rownum(limit, offset):
    # window-function pagination: filter on row_number() in a subquery
    cursor = conn.cursor()
    cursor.execute("select * from (select *, "
                   "row_number() over (order by id asc) as rownum from t1) as foo "
                   "where rownum>=%d and rownum<%d" % (offset, limit + offset))
    cursor.fetchall()
    cursor.close()

def scroll(limit, offset):
    # named cursor = server-side cursor; skip ahead, then fetch one page
    cursor = conn.cursor('foo')
    cursor.execute("select * from t1 order by id")
    cursor.scroll(offset)
    cursor.fetchmany(limit)
    cursor.close()

print

timeit(offset, 10, 100, 100)
timeit(scroll, 10, 100, 100)
timeit(rownum, 10, 100, 100)

print

timeit(offset, 10, 100, 480000)
timeit(scroll, 10, 100, 480000)
timeit(rownum, 10, 100, 480000)

print

timeit(offset, 10, 100, 999900)
timeit(scroll, 10, 100, 999900)
timeit(rownum, 10, 100, 999900)
Answered by Frank Heikens
Use a CURSOR for a large result set; it will be much faster. For small result sets the LIMIT OFFSET construction works fine, but it has its limits.
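In plain SQL the cursor approach looks roughly like this (a sketch, reusing the t1 table from the test script above; the cursor name and page numbers are illustrative). MOVE skips to the requested position on the server without transferring rows, and FETCH returns one page:
BEGIN;
DECLARE pager CURSOR FOR SELECT * FROM t1 ORDER BY id;
-- skip to the start of the requested page; the skipped rows are never sent to the client
MOVE FORWARD 480000 IN pager;
-- return one page of 100 rows
FETCH FORWARD 100 FROM pager;
CLOSE pager;
COMMIT;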
ROW_NUMBER is a nice thing, but not for pagination. You end up with bad performance because of sequential scans.
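To see this for yourself, you can ask PostgreSQL for the query plan (a sketch, using the same t1 table as the test script above): the subquery has to number every row before the rownum filter is applied, so the plan scans the whole table no matter which page you request.
EXPLAIN ANALYZE
SELECT * FROM (
    SELECT *, row_number() OVER (ORDER BY id ASC) AS rownum FROM t1
) AS foo
WHERE rownum >= 480000 AND rownum < 480100;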