Slow pagination over tons of records in MongoDB

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow, original question: http://stackoverflow.com/questions/7228169/

Date: 2020-09-09 12:13:22  Source: igfitidea

mongodb

Asked by Radek Simko

I have over 300k records in one collection in Mongo.

When I run this very simple query:

db.myCollection.find().limit(5);

It takes only a few milliseconds.

But when I use skip in the query:

db.myCollection.find().skip(200000).limit(5)

It won't return anything... it runs for minutes and returns nothing.

How can I make it better?

Answer by Russell

One approach to this problem, if you have large quantities of documents and you are displaying them in sorted order (I'm not sure how useful skip is if you're not), would be to use the key you're sorting on to select the next page of results.

So if you start with

db.myCollection.find().limit(100).sort({created_date: 1});

and then extract the created date of the last document returned by the cursor into a variable max_created_date_from_last_result, you can get the next page with the far more efficient query (presuming you have an index on created_date):

db.myCollection.find({created_date : { $gt : max_created_date_from_last_result } }).limit(100).sort({created_date: 1});
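The filter-building step of this range-based approach can be isolated into a tiny helper. A minimal sketch in plain JavaScript (the buildNextPageFilter name and shape are my own, not from the original answer): given the created_date of the last document on the current page, it returns the find() filter for the next page, and an empty filter for the first page.

```javascript
// Build the find() filter for range-based pagination on created_date.
// lastCreatedDate is the created_date of the last document from the
// previous page, or null/undefined when fetching the first page.
function buildNextPageFilter(lastCreatedDate) {
  if (lastCreatedDate == null) {
    return {}; // first page: no lower bound yet
  }
  // subsequent pages: everything strictly after the last seen value
  return { created_date: { $gt: lastCreatedDate } };
}

// With the Node.js driver, usage would look roughly like:
//   const docs = await collection
//     .find(buildNextPageFilter(lastSeen))
//     .sort({ created_date: 1 })
//     .limit(100)
//     .toArray();
//   lastSeen = docs[docs.length - 1].created_date;
```

Because the filter and sort both use created_date, each page read is an index-range scan rather than a walk from the start of the collection.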

Answer by Tomasz Nurkiewicz

From the MongoDB documentation:

Paging Costs

Unfortunately skip can be (very) costly and requires the server to walk from the beginning of the collection, or index, to get to the offset/skip position before it can start returning the page of data (limit). As the page number increases skip will become slower and more cpu intensive, and possibly IO bound, with larger collections.

Range based paging provides better use of indexes but does not allow you to easily jump to a specific page.

You have to ask yourself a question: how often do you need the 40,000th page? Also see this article.

Answer by Mr. T

I found it performant to combine the two concepts together (both a skip + limit and a find + limit). The problem with skip + limit is poor performance when you have a lot of docs (especially larger docs). The problem with find + limit is that you can't jump to an arbitrary page. I want to be able to paginate without doing it sequentially.

The steps I take are:

我采取的步骤是:

  1. Create an index based on how you want to sort your docs, or just use the default _id index (which is what I used)
  2. Know the starting value, page size and the page you want to jump to
  3. Project + skip + limit the value you should start from
  4. Find + limit the page's results

It looks roughly like this if I want to get page 5432 with 16 records per page (in JavaScript):

let page = 5432;
let page_size = 16;
let skip_size = page * page_size;

let retval = await db.collection(...).find().sort({ "_id": 1 }).project({ "_id": 1 }).skip(skip_size).limit(1).toArray();
let start_id = retval[0]._id;

retval = await db.collection(...).find({ "_id": { "$gte": start_id } }).sort({ "_id": 1 }).project(...).limit(page_size).toArray();

This works because a skip on a projected index is very fast even if you are skipping millions of records (which is what I'm doing). If you run explain("executionStats"), it still shows a large number for totalDocsExamined, but because of the projection on an index, it's extremely fast (essentially, the data blobs are never examined). Then, with the value for the start of the page in hand, you can fetch the next page very quickly.
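The two-step jump above can be described by a small pure helper; a sketch under my own naming (pageJumpPlan is not from the answer), assuming ascending _id order. It only computes the skip offset and the shapes of the two queries, so it can be reasoned about without a running database:

```javascript
// Describe the two queries needed to jump to an arbitrary page:
//   1) a covered query on the _id index that skips to the page boundary,
//   2) a range query that fetches the page starting from that _id.
function pageJumpPlan(page, pageSize) {
  const skipSize = page * pageSize;
  return {
    // query 1: project only _id so the skip can stay on the index
    boundary: {
      projection: { _id: 1 },
      sort: { _id: 1 },
      skip: skipSize,
      limit: 1,
    },
    // query 2: once the boundary _id is known, fetch the page by range
    pageFilterFor: (startId) => ({ _id: { $gte: startId } }),
    pageOptions: { sort: { _id: 1 }, limit: pageSize },
  };
}
```

For page 5432 with 16 records per page, the plan skips 86912 index entries in the covered query, then reads only 16 documents in the range query.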

Answer by Kamil Dąbrowski

I combined the two answers.

The problem is that when you use skip and limit without a sort, the query just paginates in the order documents were written to the collection, so the engine may need to build a temporary sort first. It is better to use the ready-made _id index: sort by _id. That is very fast even with large collections, for example:

db.myCollection.find().skip(4000000).limit(1).sort({ "_id": 1 });

In PHP it would be:

$manager = new \MongoDB\Driver\Manager("mongodb://localhost:27017", []);
$skip  = 4000000;
$limit = 1;
$options = [
    'sort'  => ['_id' => 1],
    'limit' => $limit,
    'skip'  => $skip,
];
$where = [];
$query = new \MongoDB\Driver\Query($where, $options);
$get = $manager->executeQuery("namedb.namecollection", $query);