Slow pagination over tons of records in MongoDB

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/7228169/

Date: 2020-09-09 12:13:22  Source: igfitidea

mongodb

Asked by Radek Simko

I have over 300k records in one collection in Mongo.

When I run this very simple query:

db.myCollection.find().limit(5);

It takes only a few milliseconds.

But when I use skip in the query:

db.myCollection.find().skip(200000).limit(5)

It won't return anything... it runs for minutes and returns nothing.

How can I make it better?

Answer by Russell

One approach to this problem, if you have large quantities of documents and you are displaying them in sorted order (I'm not sure how useful skip is if you're not), would be to use the key you're sorting on to select the next page of results.

So if you start with

db.myCollection.find().limit(100).sort({created_date: 1});

and then extract the created date of the last document returned by the cursor into a variable max_created_date_from_last_result, you can get the next page with the far more efficient query (presuming you have an index on created_date):

db.myCollection.find({created_date : { $gt : max_created_date_from_last_result } }).limit(100).sort({created_date: 1});
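The filter-building step of this range-based approach can be isolated into a tiny helper. A minimal sketch in plain JavaScript (the buildNextPageFilter name and shape are my own, not from the original answer): given the created_date of the last document on the current page, it returns the find() filter for the next page, and an empty filter for the first page.

```javascript
// Build the find() filter for range-based pagination on created_date.
// lastCreatedDate is the created_date of the last document from the
// previous page, or null/undefined when fetching the first page.
function buildNextPageFilter(lastCreatedDate) {
  if (lastCreatedDate == null) {
    return {}; // first page: no lower bound yet
  }
  // subsequent pages: everything strictly after the last seen value
  return { created_date: { $gt: lastCreatedDate } };
}

// With the Node.js driver, usage would look roughly like:
//   const docs = await collection
//     .find(buildNextPageFilter(lastSeen))
//     .sort({ created_date: 1 })
//     .limit(100)
//     .toArray();
//   lastSeen = docs[docs.length - 1].created_date;
```

Because the filter and sort both use created_date, each page read is an index-range scan rather than a walk from the start of the collection.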

Answer by Tomasz Nurkiewicz

From the MongoDB documentation:

Paging Costs

Unfortunately skip can be (very) costly and requires the server to walk from the beginning of the collection, or index, to get to the offset/skip position before it can start returning the page of data (limit). As the page number increases skip will become slower and more cpu intensive, and possibly IO bound, with larger collections.

Range based paging provides better use of indexes but does not allow you to easily jump to a specific page.

You have to ask yourself a question: how often do you need the 40,000th page? Also see this article.

Answer by Mr. T

I found it performant to combine the two concepts together (both a skip + limit and a find + limit). The problem with skip + limit is poor performance when you have a lot of docs (especially larger docs). The problem with find + limit is that you can't jump to an arbitrary page. I want to be able to paginate without doing it sequentially.

The steps I take are:

我采取的步骤是:

  1. Create an index based on how you want to sort your docs, or just use the default _id index (which is what I used)
  2. Know the starting value, page size and the page you want to jump to
  3. Project + skip + limit the value you should start from
  4. Find + limit the page's results

It looks roughly like this if I want to get page 5432 with 16 records per page (in JavaScript):

let page = 5432;
let page_size = 16;
let skip_size = page * page_size;

let retval = await db.collection(...).find().sort({ "_id": 1 }).project({ "_id": 1 }).skip(skip_size).limit(1).toArray();
let start_id = retval[0]._id;

retval = await db.collection(...).find({ "_id": { "$gte": start_id } }).sort({ "_id": 1 }).project(...).limit(page_size).toArray();

This works because a skip on a projected index is very fast even if you are skipping millions of records (which is what I'm doing). If you run explain("executionStats"), it still shows a large number for totalDocsExamined, but because of the projection on an index, it's extremely fast (essentially, the data blobs are never examined). Then, with the value for the start of the page in hand, you can fetch the next page very quickly.
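The two-step jump above can be described by a small pure helper; a sketch under my own naming (pageJumpPlan is not from the answer), assuming ascending _id order. It only computes the skip offset and the shapes of the two queries, so it can be reasoned about without a running database:

```javascript
// Describe the two queries needed to jump to an arbitrary page:
//   1) a covered query on the _id index that skips to the page boundary,
//   2) a range query that fetches the page starting from that _id.
function pageJumpPlan(page, pageSize) {
  const skipSize = page * pageSize;
  return {
    // query 1: project only _id so the skip can stay on the index
    boundary: {
      projection: { _id: 1 },
      sort: { _id: 1 },
      skip: skipSize,
      limit: 1,
    },
    // query 2: once the boundary _id is known, fetch the page by range
    pageFilterFor: (startId) => ({ _id: { $gte: startId } }),
    pageOptions: { sort: { _id: 1 }, limit: pageSize },
  };
}
```

For page 5432 with 16 records per page, the plan skips 86912 index entries in the covered query, then reads only 16 documents in the range query.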

Answer by Kamil Dąbrowski

I combined the two answers.

The problem is that when you use skip and limit without a sort, the query just paginates in the order documents were written to the collection, so the engine may need to build a temporary sort first. It is better to use the ready-made _id index: sort by _id. That is very fast even with large collections, for example:

db.myCollection.find().skip(4000000).limit(1).sort({ "_id": 1 });

In PHP it would be:

$manager = new \MongoDB\Driver\Manager("mongodb://localhost:27017", []);
$skip  = 4000000;
$limit = 1;
$options = [
    'sort'  => ['_id' => 1],
    'limit' => $limit,
    'skip'  => $skip,
];
$where = [];
$query = new \MongoDB\Driver\Query($where, $options);
$get = $manager->executeQuery("namedb.namecollection", $query);