mongodb 使用 Mongo 聚合框架进行多组操作
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11040662/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Multiple group operations using Mongo aggregation framework
提问by Allyl Isocyanate
Given a set of questions that have linked survey and category id:
给定一组关联调查和类别 ID 的问题:
> db.questions.find().toArray();
[
{
"_id" : ObjectId("4fda05bc322b1c95b531ac25"),
"id" : 1,
"name" : "Question 1",
"category_id" : 1,
"survey_id" : 1,
"score" : 5
},
{
"_id" : ObjectId("4fda05cb322b1c95b531ac26"),
"id" : 2,
"name" : "Question 2",
"category_id" : 1,
"survey_id" : 1,
"score" : 3
},
{
"_id" : ObjectId("4fda05d9322b1c95b531ac27"),
"id" : 3,
"name" : "Question 3",
"category_id" : 2,
"survey_id" : 1,
"score" : 4
},
{
"_id" : ObjectId("4fda4287322b1c95b531ac28"),
"id" : 4,
"name" : "Question 4",
"category_id" : 2,
"survey_id" : 1,
"score" : 7
}
]
I can find the category average with:
我可以找到类别平均值:
db.questions.aggregate(
{ $group : {
_id : "$category_id",
avg_score : { $avg : "$score" }
}
}
);
{
"result" : [
{
"_id" : 1,
"avg_score" : 4
},
{
"_id" : 2,
"avg_score" : 5.5
}
],
"ok" : 1
}
How can I get the average of category averages (note this is different than simply averaging all questions)?I would assume I would do multiple group operations but this fails:
如何获得类别平均值的平均值(请注意,这与简单地对所有问题求平均值不同)?我假设我会做多个组操作,但这失败了:
> db.questions.aggregate(
... { $group : {
... _id : "$category_id",
... avg_score : { $avg : "$score" },
... }},
... { $group : {
... _id : "$survey_id",
... avg_score : { $avg : "$score" },
... }}
... );
{
"errmsg" : "exception: the _id field for a group must not be undefined",
"code" : 15956,
"ok" : 0
}
>
回答by William Z
It's important to understand that the operations in the argument to aggregate() form a pipeline. This meant that the input to any element of the pipeline is the stream of documents produced by the previous element in the pipeline.
理解aggregate() 参数中的操作形成一个管道很重要。这意味着管道任何元素的输入都是管道中前一个元素生成的文档流。
In your example, your first query creates a pipeline of documents that look like this:
在您的示例中,您的第一个查询创建了一个如下所示的文档管道:
{
"_id" : 2,
"avg_score" : 5.5
},
{
"_id" : 1,
"avg_score" : 4
}
This means that the second element of the pipline is seeing a series of documents where the only keys are "_id" and "avg_score". The keys "category_id" and "score" no longer existin this document stream.
这意味着管道的第二个元素是看到一系列文档,其中唯一的键是“_id”和“avg_score”。此文档流中不再存在键“category_id”和“score” 。
If you want to further aggregate on this stream, you'll have to aggregate using the keys that are seen at this stage in the pipeline. Since you want to average the averages, you need to put in a single constant value for the _id field, so that all of the input documents get grouped into a single result.
如果您想在此流上进一步聚合,则必须使用在此阶段在管道中看到的键进行聚合。由于您想要平均平均值,您需要为 _id 字段输入一个常量值,以便所有输入文档都被分组为一个结果。
The following code produces the correct result:
以下代码产生正确的结果:
db.questions.aggregate(
{ $group : {
_id : "$category_id",
avg_score : { $avg : "$score" },
}
},
{ $group : {
_id : "all",
avg_score : { $avg : "$avg_score" },
}
}
);
When run, it produces the following output:
运行时,它会产生以下输出:
{
"result" : [
{
"_id" : "all",
"avg_score" : 4.75
}
],
"ok" : 1
}