在 PostgreSQL 中使用“Cursors”进行分页

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/13142834/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-21 00:30:05  来源:igfitidea点击:

Using "Cursors" for paging in PostgreSQL

postgresqlpaginationpaginate

提问by Chris Dutrow

Possible Duplicate:
How to provide an API client with 1,000,000 database results?

可能的重复:
如何为 API 客户端提供 1,000,000 个数据库结果?

Wondering of the use of Cursorsis a good way to implement "paging" using PostgreSQL.

想知道Cursors的使用是使用 PostgreSQL 实现“分页”的好方法。

The use case is that we have upwards 100,000 rows that we'd like to make available to our API clients. We thought a good way to make this happen would be to allow the client to request the information in batches ( pages ). The client could request 100 rows at a time. We would return the 100 rows as well as a cursor, then when the client was ready, they could request the next 100 rows using the cursor that we sent to them.

用例是我们有超过 100,000 行可供我们的 API 客户端使用。我们认为实现这一目标的一个好方法是允许客户端分批(页面)请求信息。客户端一次可以请求 100 行。我们将返回 100 行和一个游标,然后当客户端准备好时,他们可以使用我们发送给他们的游标请求接下来的 100 行。

However, I'm a little hazy on how cursors work and exactly how and when cursors should be used:

但是,我对游标的工作方式以及应该如何以及何时使用游标有点模糊:

  • Do the cursors require that a database connection be left open?
  • Do the cursors run inside a transaction, locking resources until they are "closed"?
  • Are there any other "gotchas" that I'm not aware of?
  • Is there another, better way that this situation should be handled?
  • 游标是否要求数据库连接保持打开状态?
  • 游标是否在事务内运行,锁定资源直到它们“关闭”?
  • 还有其他我不知道的“陷阱”吗?
  • 是否有另一种更好的方法来处理这种情况?

Thanks so much!

非常感谢!

回答by Craig Ringer

Cursors are a reasonable choice for paging in smaller intranet applications that work with large data sets, but you need to be prepared to discard them after a timeout. Users like to wander off, go to lunch, go on holiday for two weeks, etc, and leave their applications running. If it's a web-based app there's even the question of what "running" is and how to tell if the user is still around.

游标是在处理大型数据集的较小 Intranet 应用程序中进行分页的合理选择,但您需要准备好在超时后丢弃它们。用户喜欢闲逛、去吃午饭、去度假两周等,然后让他们的应用程序运行。如果它是一个基于网络的应用程序,那么甚至还有一个问题是“运行”是什么以及如何判断用户是否还在。

They are not suitable for large-scale applications with high client counts and clients that come and go near-randomly like in web-based apps or web APIs. I would not recommend using cursors in your application unless you have a fairly small client count and very high request rates ... in which case sending tiny batches of rows will be very inefficient and you should think about allowing range-requests etc instead.

它们不适合具有大量客户端的大型应用程序以及像基于 Web 的应用程序或 Web API 那样几乎随机地来来往往的客户端。我不建议在你的应用程序中使用游标,除非你有一个相当小的客户端数量和非常高的请求率......在这种情况下,发送小批量的行将非常低效,你应该考虑允许范围请求等。

Cursors have several costs. If the cursor is not WITH HOLDyou must keep a transaction open. The open transaction can prevent autovacuum from doing its work properly, causing table bloat and other issues. If the cursor is declared WITH HOLDand the transaction isn't held open you have to pay the cost of materializing and storing a potentially large result set - at least, I think that's how hold cursors work. The alternative is just as bad, keeping the transaction implicitly open until the cursor is destroyed and preventing rows from being cleaned up.

游标有几个成本。如果游标不是,WITH HOLD您必须保持事务打开。打开的事务会阻止 autovacuum 正常工作,导致表膨胀和其他问题。如果声明了游标WITH HOLD并且事务没有保持打开状态,则您必须支付实现和存储可能很大的结果集的成本 - 至少,我认为这就是保持游标的工作方式。替代方案同样糟糕,保持事务隐式打开,直到游标被销毁并防止行被清理。

Additionally, if you're using cursors you can't hand connections back to a connection pool. You'll need one connection per client. That means more backend resources are used just maintaining session state, and sets a very real upper limit on the number of clients you can handle with a cursor-based approach.

此外,如果您使用游标,则无法将连接交还给连接池。每个客户端需要一个连接。这意味着更多的后端资源仅用于维护会话状态,并为您可以使用基于游标的方法处理的客户端数量设置了一个非常实际的上限。

There's also the complexity and overhead of managing a stateful, cursor-based setup as compared to a stateless connection-pooling approach with limit and offset. You need to have your application expire cursors after a timeout or you face potentially unbounded resource use on the server, and you need to keep track of which connections have which cursors for which result sets for which users....

与具有限制和偏移的无状态连接池方法相比,管理有状态的、基于游标的设置也存在复杂性和开销。您需要让您的应用程序在超时后使游标过期,或者您面临服务器上潜在的无限资源使用,并且您需要跟踪哪些连接具有哪些游标,哪些结果集为哪些用户....

In general, despite the fact that it can be quite inefficient, LIMITand OFFSETcan be the better solution. It can often be better to search the primary key rather than using OFFSET, though.

在一般情况下,尽管事实上,它可以是相当低效的,LIMIT并且OFFSET可以更好的解决方案。它往往是更好的搜索主键,而不是使用OFFSET,虽然

By the way, you were looking at the documentation for cursors in PL/pgSQL. You want normal SQL-level cursorsfor this job.

顺便说一下,您正在查看有关 PL/pgSQL 中游标的文档。您需要此作业的普通 SQL 级游标



Do the cursors require that a database connection be left open?

游标是否要求数据库连接保持打开状态?

Yes.

是的。

Do the cursors run inside a transaction, locking resources until they are "closed"?

游标是否在事务内运行,锁定资源直到它们“关闭”?

Yes unless they are WITH HOLD, in which case they consume other database resources.

是的,除非它们是WITH HOLD,在这种情况下它们会消耗其他数据库资源。

Are there any other "gotchas" that I'm not aware of?

还有其他我不知道的“陷阱”吗?

Yes, as the above should explain.

是的,正如上面应该解释的那样。

回答by Colin 't Hart

For HTTP clients, don't use cursors to implement paging. For scalability, you don't want server resources tied up between requests.

对于 HTTP 客户端,不要使用游标来实现分页。对于可伸缩性,您不希望在请求之间绑定服务器资源。

Instead, use LIMIT and OFFSET on your queries; see LIMITand OFFSETin the Pg docs.

相反,在您的查询中使用 LIMIT 和 OFFSET;请参阅LIMITOFFSET在 Pg 文档中

But make sure that the indexing on your tables will support efficient queries of this form.

但请确保表上的索引将支持这种形式的高效查询。

Design a RESTful API, so that the client can invoke the "next_url" (also passed in the response) to get the next set of rows.

设计一个 RESTful API,以便客户端可以调用“next_url”(也在响应中传递)来获取下一组行。