MongoDB Explain for Aggregation Framework

Original question: http://stackoverflow.com/questions/12702080/

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by SCB
Is there an explain function for the Aggregation framework in MongoDB? I can't see it in the documentation.
If not, is there some other way to check how a query performs within the aggregation framework?
I know with find you just do
db.collection.find().explain()
But with the aggregation framework I get an error
db.collection.aggregate(
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: { $or: [ { "Tags._id": "tag1" }, { "Tags._id": "tag2" } ] }},
    { $group: {
        _id : { id: "$_id" },
        "count": { $sum: 1 }
    }},
    { $sort: { "count": -1 }}
).explain()
Answered by Stennie
Starting with MongoDB version 3.0, simply changing the order from
collection.aggregate(...).explain()
to
collection.explain().aggregate(...)
will give you the desired results (documentation here).
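For example, with the pipeline from the question this would look as follows in the mongo shell (a sketch; the "queryPlanner" verbosity mode is an assumption, it is also the default):

// Explainable helper (MongoDB 3.0+ shell)
db.collection.explain("queryPlanner").aggregate([
    { $project: { "Tags._id": 1 } },
    { $unwind: "$Tags" },
    { $match: { $or: [ { "Tags._id": "tag1" }, { "Tags._id": "tag2" } ] } },
    { $group: { _id: { id: "$_id" }, count: { $sum: 1 } } },
    { $sort: { count: -1 } }
])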
For older versions >= 2.6, you will need to use the explain: true option for aggregation pipeline operations:
db.collection.aggregate([
    { $project : { "Tags._id" : 1 }},
    { $unwind : "$Tags" },
    { $match: { $or: [ { "Tags._id": "tag1" }, { "Tags._id": "tag2" } ] }},
    { $group: {
        _id : "$_id",
        count: { $sum: 1 }
    }},
    { $sort: { "count": -1 }}
],
{
    explain: true
}
)
An important consideration with the Aggregation Framework is that an index can only be used to fetch the initial data for a pipeline (e.g. usage of $match, $sort, $geoNear at the beginning of a pipeline) as well as subsequent $lookup and $graphLookup stages. Once data has been fetched into the aggregation pipeline for processing (e.g. passing through stages like $project, $unwind, and $group) further manipulation will be in-memory (possibly using temporary files if the allowDiskUse option is set).
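For instance, a sketch based on the question's pipeline (rewriting the $or filter as an equivalent $in is an assumption) that keeps the index-eligible $match first and opts in to temporary files for the in-memory stages:

db.collection.aggregate(
    [
        { $match: { "Tags._id": { $in: [ "tag1", "tag2" ] } } }, // index-eligible: fetches the initial data
        { $unwind: "$Tags" },                                    // from here on, processing happens in memory
        { $group: { _id: "$_id", count: { $sum: 1 } } },
        { $sort: { count: -1 } }
    ],
    { allowDiskUse: true } // lets blocking stages like $group/$sort spill to temporary files
)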
Optimizing pipelines
In general, you can optimize aggregation pipelines by:
- Starting a pipeline with a $match stage to restrict processing to relevant documents (see the sketch after this list).
- Ensuring the initial $match / $sort stages are supported by an efficient index.
- Filtering data early using $match, $limit, and $skip.
- Minimizing unnecessary stages and document manipulation (perhaps reconsidering your schema if complicated aggregation gymnastics are required).
- Taking advantage of newer aggregation operators if you have upgraded your MongoDB server. For example, MongoDB 3.4 added many new aggregation stages and expressions including support for working with arrays, strings, and facets.
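Applying the first few guidelines to the question's pipeline might look like this (a sketch; the index and the duplicated $match are assumptions about the intended query):

// Hypothetical index to support the initial $match stage
db.collection.createIndex({ "Tags._id": 1 })

db.collection.aggregate([
    { $match: { "Tags._id": { $in: [ "tag1", "tag2" ] } } },  // filter whole documents first (can use the index)
    { $project: { "Tags._id": 1 } },                          // drop unneeded fields early
    { $unwind: "$Tags" },
    { $match: { "Tags._id": { $in: [ "tag1", "tag2" ] } } },  // re-filter the unwound tags
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $sort: { count: -1 } }
])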
There are also a number of Aggregation Pipeline Optimizations that automatically happen depending on your MongoDB server version. For example, adjacent stages may be coalesced and/or reordered to improve execution without affecting the output results.
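For example, a $sort immediately followed by a $limit can be coalesced so that only the top N results need to be tracked while sorting (a sketch loosely based on the question's pipeline; the grouping key and limit value are assumptions):

db.collection.aggregate([
    { $match: { "Tags._id": { $in: [ "tag1", "tag2" ] } } },
    { $unwind: "$Tags" },
    { $group: { _id: "$Tags._id", count: { $sum: 1 } } },
    { $sort: { count: -1 } },   // the optimizer coalesces this $sort...
    { $limit: 5 }               // ...with this $limit, so only the top 5 results are kept in memory
])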
Limitations
As at MongoDB 3.4, the Aggregation Framework explain option provides information on how a pipeline is processed but does not support the same level of detail as the executionStats mode for a find() query. If you are focused on optimizing initial query execution you will likely find it beneficial to review the equivalent find().explain() query with executionStats or allPlansExecution verbosity.
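For example, the initial filter from the question's pipeline could be reviewed like this (a sketch; the $in filter mirrors the question's $or criteria):

// Detailed execution statistics for the equivalent initial query
db.collection.find(
    { "Tags._id": { $in: [ "tag1", "tag2" ] } }
).explain("executionStats")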
There are a few relevant feature requests to watch/upvote in the MongoDB issue tracker regarding more detailed execution stats to help optimize/profile aggregation pipelines.
Answered by Salvador Dali
Starting with version 2.6.x, MongoDB allows users to do explain with the aggregation framework.

All you need to do is to add explain: true
db.records.aggregate(
[ ...your pipeline...],
{ explain: true }
)
Thanks to Rafa, I know that it was possible to do this even in 2.4, but only through runCommand(). But now you can use aggregate as well.
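A sketch of that runCommand() form, using the same collection name as the snippet above:

db.runCommand({
    aggregate: "records",                    // collection name
    pipeline: [ /* ...your pipeline... */ ],
    explain: true
})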
Answered by xameeramir
The aggregation framework is a set of analytics tools within MongoDB that allows us to run various types of reports or analysis on documents in one or more collections. It is based on the idea of a pipeline: we take input from a MongoDB collection and pass the documents from that collection through one or more stages, each of which performs a different operation on its inputs. Each stage takes as input whatever the stage before it produced as output, and the inputs and outputs for all stages are a stream of documents. Each stage has a specific job that it does: it expects a specific form of document and produces a specific output, which is itself a stream of documents. At the end of the pipeline, we get access to the output.
An individual stage is a data processing unit. Each stage takes as input a stream of documents one at a time, processes each document one at a time, and produces the output stream of documents, again one at a time. Each stage provides a set of knobs or tunables that we can control to parameterize the stage to perform whatever task we're interested in doing. So a stage performs a generic, general-purpose task of some kind, and we parameterize the stage for the particular set of documents that we're working with and exactly what we would like that stage to do with those documents. These tunables typically take the form of operators that we can supply that will modify fields, perform arithmetic operations, reshape documents, or do some sort of accumulation task, as well as a variety of other things. Oftentimes, it is the case that we'll want to include the same type of stage multiple times within a single pipeline.
e.g. We may wish to perform an initial filter so that we don't have to pass the entire collection into our pipeline. But then later on, following some additional processing, we may want to filter once again using a different set of criteria. So, to recap, pipelines work with a MongoDB collection. They're composed of stages, each of which does a different data processing task on its input and produces documents as output to be passed to the next stage. And finally, at the end of the pipeline, output is produced that we can then do something with within our application. In many cases, it's necessary to include the same type of stage multiple times within an individual pipeline.
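For example (a sketch with a hypothetical posts collection and field names), the same stage type can appear twice with different criteria:

db.posts.aggregate([
    { $match: { status: "published" } },                  // initial filter: restrict what enters the pipeline
    { $unwind: "$comments" },                             // reshape: one document per comment
    { $group: { _id: "$_id", comments: { $sum: 1 } } },   // accumulate: count comments per post
    { $match: { comments: { $gte: 10 } } },               // second filter: different criteria, after processing
    { $sort: { comments: -1 } }
])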