MongoDB ranged pagination

Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow.

Original question: http://stackoverflow.com/questions/9703319/
Asked by Roman
It's said that using skip() for pagination in a MongoDB collection with many records is slow and not recommended.
Ranged pagination (based on a >_id comparison) could be used instead:
db.items.find({_id: {$gt: ObjectId('4f4a3ba2751e88780b000000')}});
It's good for displaying Prev & Next buttons, but it's not very easy to implement when you want to display actual page numbers (1 ... 5 6 7 ... 124) - you need to pre-calculate which "_id" each page starts from.
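For reference, a minimal sketch of what the full "next page" query could look like under this scheme (pageSize is a hypothetical variable; the explicit sort on _id is what makes the range comparison line up with page order):

// fetch the next page: everything after the last _id already shown,
// in _id order, capped at one page worth of documents
var pageSize = 20; // hypothetical page size
db.items.find({_id: {$gt: ObjectId('4f4a3ba2751e88780b000000')}}).
         sort({_id: 1}).
         limit(pageSize);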
So I have two questions:
1) When should I start worrying about this? How many records are "too many", with a noticeable slowdown for skip()? 1,000? 1,000,000?
2) What is the best approach to show links with actual page numbers when using ranged pagination?
Answered by Sergio Tulentsev
Good question!
"How many is too many?" - that, of course, depends on your data size and performance requirements. I, personally, feel uncomfortable when I skip more than 500-1000 records.
The actual answer depends on your requirements. Here's what modern sites do (or, at least, some of them).
First, the navbar looks like this:
1 2 3 ... 457
They get the final page number from the total record count and the page size. Let's jump to page 3. That will involve some skipping from the first record. When the results arrive, you know the id of the first record on page 3.
1 2 3 4 5 ... 457
Let's skip some more and go to page 5.
1 ... 3 4 5 6 7 ... 457
You get the idea. At each point you see the first, last, and current pages, and also two pages forward and backward from the current page.
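As a rough sketch of that windowing logic (plain JavaScript; totalCount, pageSize, and currentPage are hypothetical inputs, not anything MongoDB-specific):

// build the list of page links to render:
// first, last, current, and two pages on either side of current,
// with collapsed gaps rendered as '...'
function pageWindow(currentPage, totalCount, pageSize) {
  var lastPage = Math.ceil(totalCount / pageSize);
  var pages = [];
  for (var p = 1; p <= lastPage; p++) {
    if (p === 1 || p === lastPage || Math.abs(p - currentPage) <= 2) {
      pages.push(p);
    } else if (pages[pages.length - 1] !== '...') {
      pages.push('...'); // collapse the skipped range into one ellipsis
    }
  }
  return pages;
}

pageWindow(5, 4570, 10); // => [1, '...', 3, 4, 5, 6, 7, '...', 457]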
Queries
var current_id; // id of first record on current page.

// go to page current+N
db.collection.find({_id: {$gte: current_id}}).
              skip(N * page_size).
              limit(page_size).
              sort({_id: 1});

// go to page current-N
// note that due to the nature of skipping back,
// this query will get you records in reverse order
// (last records on the page being first in the resultset).
// You should reverse them in the app.
db.collection.find({_id: {$lt: current_id}}).
              skip((N - 1) * page_size).
              limit(page_size).
              sort({_id: -1});
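To make that reversal concrete, one way to do it (a sketch; the backward page is materialized into an array and flipped back into ascending order):

// materialize the backward page, then restore ascending order
var page = db.collection.find({_id: {$lt: current_id}}).
                         skip((N - 1) * page_size).
                         limit(page_size).
                         sort({_id: -1}).
                         toArray().
                         reverse();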
Answered by Tad Marshall
It's hard to give a general answer because it depends a lot on what query (or queries) you are using to construct the set of results that are being displayed. If the results can be found using only the index and are presented in index order then db.dataset.find().limit().skip() can perform well even with a large number of skips. This is likely the easiest approach to code up. But even in that case, if you can cache page numbers and tie them to index values you can make it faster for the second and third person that wants to view page 71, for example.
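A minimal sketch of such a cache (pageStarts is a hypothetical map from page number to the _id that page starts at; any shared cache would work the same way):

var pageStarts = {}; // hypothetical cache: page number -> first _id on that page

function fetchPage(pageNum, pageSize) {
  var docs;
  if (pageStarts[pageNum] !== undefined) {
    // cache hit: jump straight to the page with a range query, no skipping
    docs = db.dataset.find({_id: {$gte: pageStarts[pageNum]}}).
                      sort({_id: 1}).
                      limit(pageSize).
                      toArray();
  } else {
    // cache miss: fall back to skip, then remember where this page starts
    docs = db.dataset.find().
                      sort({_id: 1}).
                      skip((pageNum - 1) * pageSize).
                      limit(pageSize).
                      toArray();
    if (docs.length > 0) pageStarts[pageNum] = docs[0]._id;
  }
  return docs;
}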
In a very dynamic dataset where documents will be added and removed while someone else is paging through data, such caching will become out-of-date quickly and the limit and skip method may be the only one reliable enough to give good results.
Answered by Carlos Ruiz
I recently encountered the same problem when trying to paginate a request on a field that wasn't unique, for example "FirstName". The idea of this query is to be able to implement pagination on a non-unique field without using skip().
The main problem here is querying on a field that is not unique, "FirstName", because the following will happen:
- $gt: {"FirstName": "Carlos"} -> this will skip all the records where the first name is "Carlos"
- $gte: {"FirstName": "Carlos"} -> will always return the same set of data
Therefore the solution I came up with was to make the $match portion of the query unique by combining the targeted search field with a secondary field.
Ascending order:
db.customers.aggregate([
  {$match: {$or: [
    {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]},
    {'FirstName': {$gt: 'Carlos'}}
  ]}},
  {$sort: {'FirstName': 1, '_id': 1}},
  {$limit: 10}
])
Descending order:
db.customers.aggregate([
  {$match: {$or: [
    {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]},
    {'FirstName': {$lt: 'Carlos'}}
  ]}},
  {$sort: {'FirstName': -1, '_id': 1}},
  {$limit: 10}
])
The $match part of this query basically behaves like an if statement: if FirstName is "Carlos", then the _id also needs to be greater than the given id; if FirstName is not equal to "Carlos", then FirstName needs to be greater than "Carlos".
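To page forward with this approach, a sketch (assuming docs holds the array returned by the ascending pipeline above) would take the FirstName and _id of the last document on the current page and feed them into the next query:

// anchor the next page on the last document of the current page
var last = docs[docs.length - 1];
db.customers.aggregate([
  {$match: {$or: [
    {$and: [{'FirstName': last.FirstName}, {'_id': {$gt: last._id}}]},
    {'FirstName': {$gt: last.FirstName}}
  ]}},
  {$sort: {'FirstName': 1, '_id': 1}},
  {$limit: 10}
])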
The only problem is that you cannot navigate to a specific page number (it could probably be done with some code manipulation), but other than that it solved my problem with pagination for non-unique fields, without having to use skip, which eats a lot of memory and processing power when you get towards the end of whatever dataset you are querying.