PHP: Best way to get result count before LIMIT was applied
Original source: http://stackoverflow.com/questions/156114/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by EvilPuppetMaster
When paging through data that comes from a DB, you need to know how many pages there will be to render the page jump controls.
Currently I do that by running the query twice: once wrapped in a count() to determine the total results, and a second time with a limit applied to get back just the results I need for the current page.
This seems inefficient. Is there a better way to determine how many results would have been returned before LIMIT was applied?
I am using PHP and Postgres.
Answered by Erwin Brandstetter
Pure SQL
Things have changed since 2008. You can use a window function to get the full count and the limited result in one query. Window functions were introduced with PostgreSQL 8.4 in 2009.
SELECT foo
, count(*) OVER() AS full_count
FROM bar
WHERE <some condition>
ORDER BY <some col>
LIMIT <pagesize>
OFFSET <offset>;
Note that this can be considerably more expensive than the same query without the total count. All rows have to be counted, and a possible shortcut taking just the top rows from a matching index may not be helpful any more.
It doesn't matter much with small tables or when full_count <= OFFSET + LIMIT. It matters for a substantially bigger full_count.
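As a rough, runnable sketch of the window-function query above: the snippet below uses Python's built-in sqlite3 module (SQLite 3.25 or later) purely as a stand-in for Postgres, since it accepts the same `count(*) OVER ()` syntax. The table contents, the `foo % 2 = 0` filter, and the helper name `fetch_page` are assumptions made up for illustration.

```python
import sqlite3

# SQLite stands in for Postgres here; the table and its data are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.executemany("INSERT INTO bar (foo) VALUES (?)", [(i,) for i in range(1, 101)])


def fetch_page(conn, page_size, offset):
    """Return (page_rows, full_count) from a single query."""
    rows = conn.execute(
        """
        SELECT foo, count(*) OVER () AS full_count
        FROM bar
        WHERE foo % 2 = 0      -- stands in for <some condition>
        ORDER BY foo
        LIMIT ? OFFSET ?
        """,
        (page_size, offset),
    ).fetchall()
    # Every returned row carries the same full_count; with no rows it is unknown.
    full_count = rows[0][1] if rows else None
    return [r[0] for r in rows], full_count


page, full_count = fetch_page(conn, page_size=10, offset=20)
print(page, full_count)
```

Here the filter leaves 50 qualifying rows, so each of the 10 returned rows reports a full_count of 50 even though LIMIT cut the result down.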
Corner case: when OFFSET is at least as great as the number of rows from the base query, no row is returned. So you also get no full_count. Possible alternative:
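One way to sidestep this corner case, sketched here as an assumption and not necessarily the alternative the answer originally pointed to, is to compute the count in its own subquery and LEFT JOIN the page onto it, so one row with the count always comes back. SQLite (via Python's sqlite3) again stands in for Postgres; the table, the `base` CTE, and `fetch_page_with_count` are hypothetical names, and the sketch assumes foo is never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER NOT NULL)")
conn.executemany("INSERT INTO bar (foo) VALUES (?)", [(i,) for i in range(1, 6)])


def fetch_page_with_count(conn, page_size, offset):
    """Return (page_rows, full_count); full_count survives an oversized OFFSET."""
    rows = conn.execute(
        """
        WITH base AS (SELECT foo FROM bar)      -- the unpaged base query
        SELECT c.full_count, p.foo
        FROM (SELECT count(*) AS full_count FROM base) AS c
        LEFT JOIN (
            SELECT foo FROM base ORDER BY foo LIMIT ? OFFSET ?
        ) AS p ON 1 = 1
        ORDER BY p.foo
        """,
        (page_size, offset),
    ).fetchall()
    full_count = rows[0][0]
    # An empty page shows up as a single row whose foo is NULL (None).
    page = [foo for _, foo in rows if foo is not None]
    return page, full_count


print(fetch_page_with_count(conn, page_size=2, offset=3))
print(fetch_page_with_count(conn, page_size=2, offset=10))
```

The price is that the base query is referenced twice, so this is not automatically cheaper than running two separate queries.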
Sequence of events in a SELECT query
( 0. CTEs are evaluated and materialized separately. In Postgres 12 or later the planner may inline those like subqueries before going to work.) Not here.
1. WHERE clause (and JOIN conditions, though none in your example) filter qualifying rows from the base table(s). The rest is based on the filtered subset.
( 2. GROUP BY and aggregate functions would go here.) Not here.
( 3. Other SELECT list expressions are evaluated, based on grouped / aggregated columns.) Not here.
4. Window functions are applied depending on the OVER clause and the frame specification of the function. The simple count(*) OVER() is based on all qualifying rows.
5. ORDER BY
( 6. DISTINCT or DISTINCT ON would go here.) Not here.
7. LIMIT / OFFSET are applied based on the established order to select rows to return.
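The ordering of these steps can be observed with a small experiment. The sketch below adds a GROUP BY (step 2, which the question's query does not have) to show that the window count runs after filtering and aggregation but before LIMIT. It uses Python's sqlite3 as a stand-in for Postgres; the `orders` table and its data are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 5), ("ann", 7), ("bob", 3), ("cid", 9), ("cid", 1), ("dan", 0)],
)

rows = conn.execute(
    """
    SELECT customer,
           count(*)         AS orders_per_customer,  -- step 2: aggregate
           count(*) OVER () AS full_count            -- step 4: window function
    FROM orders
    WHERE amount > 0                                 -- step 1: filter
    GROUP BY customer
    ORDER BY customer                                -- step 5
    LIMIT 2                                          -- step 7
    """
).fetchall()

for row in rows:
    print(row)
```

The filter removes dan, grouping leaves three customers, and full_count reports 3 on both returned rows: it counts grouped rows, not table rows, and is unaffected by LIMIT.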
LIMIT / OFFSET becomes increasingly inefficient with a growing number of rows in the table. Consider alternative approaches if you need better performance:
Alternatives to get final count
There are completely different approaches to get the count of affected rows (not the full count before OFFSET & LIMIT were applied). Postgres keeps internal bookkeeping of how many rows were affected by the last SQL command. Some clients can access that information or count rows themselves (like psql).
For instance, you can retrieve the number of affected rows in plpgsql immediately after executing an SQL command with:
GET DIAGNOSTICS integer_var = ROW_COUNT;
Or you can use pg_num_rows in PHP. Or similar functions in other clients.
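To make the distinction concrete, here is a loose analogue in Python's sqlite3 client (an assumption for illustration, not the Postgres or PHP API): `cursor.rowcount` reports rows affected by a data-modifying statement, and counting fetched rows gives the size of the returned page, neither of which is the full count before OFFSET and LIMIT. The table and values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER, seen INTEGER DEFAULT 0)")
conn.executemany("INSERT INTO bar (foo) VALUES (?)", [(i,) for i in range(10)])

# Rows affected by the last data-modifying command, as tracked by the client.
cur = conn.execute("UPDATE bar SET seen = 1 WHERE foo >= 7")
affected = cur.rowcount

# Rows actually returned for a page: the client simply counts what it fetched.
page = conn.execute("SELECT foo FROM bar ORDER BY foo LIMIT 4 OFFSET 8").fetchall()
returned = len(page)

print(affected, returned)
```

The table holds 10 rows, yet the page count is only 2 because OFFSET 8 leaves two rows to return.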
Answered by Grey Panther
As I describe on my blog, MySQL has a feature called SQL_CALC_FOUND_ROWS. This removes the need to do the query twice, but it still needs to do the query in its entirety, even if the limit clause would have allowed it to stop early.
As far as I know, there is no similar feature for PostgreSQL. One thing to watch out for when doing pagination (the most common thing for which LIMIT is used, IMHO): doing an "OFFSET 1000 LIMIT 10" means that the DB has to fetch at least 1010 rows, even if it only gives you 10. A more performant way is to remember the value of the column you are ordering by for the last row of the previous page (the 1000th in this case) and rewrite the query like this: "... WHERE order_row > value_of_1000_th LIMIT 10". The advantage is that "order_row" is most probably indexed (if not, you've got a problem). The disadvantage is that if new elements are added between page views, this can get a little out of sync (but then again, it may not be observable by visitors and can be a big performance gain).
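The "remember the last value" idea can be sketched as follows. This is a minimal illustration using Python's sqlite3 as a stand-in database; the `bar` table, the `payload` column, and the helper `next_page` are assumptions, and the sketch assumes order_row is unique and indexed (here it is the primary key).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (order_row INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO bar VALUES (?, ?)", [(i, f"row {i}") for i in range(1, 2001)]
)


def next_page(conn, last_seen, page_size=10):
    """Fetch the rows that follow last_seen without using OFFSET."""
    return conn.execute(
        """
        SELECT order_row, payload
        FROM bar
        WHERE order_row > ?      -- value remembered from the previous page
        ORDER BY order_row
        LIMIT ?
        """,
        (last_seen, page_size),
    ).fetchall()


page = next_page(conn, last_seen=1000)
last_seen = page[-1][0]          # remember this for the following request
print(page[0], last_seen)
```

If the ordering column is not unique, a tie-breaker column would be needed in both the ORDER BY and the comparison, otherwise rows sharing the boundary value could be skipped.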
Answered by Bob Somers
You could mitigate the performance penalty by not running the COUNT() query every time. Cache the number of pages for, say, 5 minutes before the query is run again. Unless you're seeing a huge number of INSERTs, that should work just fine.
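A minimal sketch of such a time-based cache, assuming a single process and an in-memory value: the class name `CachedCount`, the injected `clock`, and the fake count function are all hypothetical, and a real application would plug in its own COUNT() query.

```python
import time


class CachedCount:
    """Re-run an expensive count only after ttl_seconds have passed."""

    def __init__(self, count_fn, ttl_seconds=300, clock=time.monotonic):
        self._count_fn = count_fn
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        expired = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if expired:
            self._value = self._count_fn()   # the COUNT() query would run here
            self._fetched_at = now
        return self._value


# Demonstration with a fake clock and a fake count query.
calls = []

def expensive_count():
    calls.append(1)
    return 1234

fake_now = [0.0]
cache = CachedCount(expensive_count, ttl_seconds=300, clock=lambda: fake_now[0])

first = cache.get()      # runs the count
fake_now[0] = 299.0
second = cache.get()     # served from the cache
fake_now[0] = 300.0
third = cache.get()      # entry expired, count runs again
print(first, second, third, len(calls))
```

Injecting the clock is only there to make the expiry behaviour easy to check without waiting five minutes.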
Answered by grantwparks
Since Postgres already does a certain amount of caching, this type of method isn't as inefficient as it seems. It's definitely not doubling execution time. We have timers built into our DB layer, so I have seen the evidence.
Answered by Steve M
Seeing as you need to know for the purpose of paging, I'd suggest running the full query once, writing the data to disk as a server-side cache, then feeding that through your paging mechanism.
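One possible reading of this suggestion, sketched under several assumptions: the full result is small enough to serialize, a JSON file in a temporary directory serves as the server-side cache, and cache invalidation is left out entirely. The helpers `load_or_build` and `get_page`, the file name, and the table are hypothetical; sqlite3 stands in for the real database.

```python
import json
import os
import sqlite3
import tempfile

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bar (foo INTEGER)")
conn.executemany("INSERT INTO bar (foo) VALUES (?)", [(i,) for i in range(1, 26)])

cache_path = os.path.join(tempfile.mkdtemp(), "result_cache.json")


def load_or_build(conn, path):
    """Run the full query once, then serve later requests from the file."""
    if not os.path.exists(path):
        rows = conn.execute("SELECT foo FROM bar ORDER BY foo").fetchall()
        with open(path, "w", encoding="utf-8") as fh:
            json.dump([r[0] for r in rows], fh)
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def get_page(conn, path, page_number, page_size=10):
    """Return (rows for a 1-based page number, total number of pages)."""
    data = load_or_build(conn, path)
    start = (page_number - 1) * page_size
    total_pages = -(-len(data) // page_size)   # ceiling division
    return data[start:start + page_size], total_pages


rows, total_pages = get_page(conn, cache_path, page_number=3)
print(rows, total_pages)
```

Because the whole result is cached, the page count comes for free from the length of the cached data, at the cost of serving possibly stale rows.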
If you're running the COUNT query for the purpose of deciding whether to provide the data to the user or not (i.e. if there are > X records, give back an error), you need to stick with the COUNT approach.

