MongoDB: Obtaining $group result with group count
Original question: http://stackoverflow.com/questions/13529323/
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute the content to the original authors on Stack Overflow.
Asked by MervS
Assuming I have a collection called "posts" (in reality it is a more complex collection, posts is too simple) with the following structure:
> db.posts.find()
{ "_id" : ObjectId("50ad8d451d41c8fc58000003"), "title" : "Lorem ipsum", "author" : "John Doe", "content" : "This is the content", "tags" : [ "SOME", "RANDOM", "TAGS" ] }
I expect this collection to grow to hundreds of thousands, perhaps millions, of documents. I need to query for posts by tags, group the results by tag, and display the results paginated. This is where the aggregation framework comes in. I plan to use the aggregate() method to query the collection:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
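To make it concrete what the two stages compute, here is a minimal sketch in plain JavaScript that emulates $unwind followed by $group on an in-memory array. It is only an illustration: `countByTag` and `samplePosts` are hypothetical names, not part of MongoDB, and the real pipeline runs on the server.

```javascript
// Plain JavaScript emulation of the $unwind + $group stages, for illustration only.
// `countByTag` is a hypothetical helper, not a MongoDB API.
function countByTag(posts) {
  const counts = new Map();
  for (const post of posts) {
    // $unwind: treat each element of `tags` as its own document;
    // posts without a `tags` array contribute nothing.
    for (const tag of post.tags || []) {
      // $group with { $sum: 1 }: add one per unwound document
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  // Shape the result like the pipeline output: { _id: { tag }, count }
  return Array.from(counts, ([tag, count]) => ({ _id: { tag }, count }));
}

// Hypothetical sample data in the shape of the "posts" collection
const samplePosts = [
  { title: "Lorem ipsum", tags: ["SOME", "RANDOM", "TAGS"] },
  { title: "Dolor sit", tags: ["RANDOM"] },
];

const tagCounts = countByTag(samplePosts);
console.log(tagCounts);
```

The length of `tagCounts` is the number the paginator needs, which is what the rest of the question is about.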
The catch is that to create the paginator I would need to know the length of the output array. I know that to do that you can do:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} },
{ "$group" : {
_id: null,
total: { $sum: 1 }
} }
]);
But that would discard the output of the previous pipeline stage (the first $group). Is there a way to combine the two operations while preserving each stage's output? I know that the output of the whole aggregate operation can be cast to an array in some language and its contents counted, but the pipeline output may exceed the 16 MB limit. Also, running the same query a second time just to obtain the count seems wasteful.
So is obtaining the document result and count at the same time possible? Any help is appreciated.
Answered by Chien-Wei Huang
- Use $project to save tag and count into tmp.
- Use $push or $addToSet to store tmp into your data list.
Code:
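db.test.aggregate([
// one document per tag
{$unwind: '$tags'},
// count the posts for each tag
{$group:{_id: '$tags', count:{$sum:1}}},
// pack tag and count into a single sub-document
{$project:{tmp:{tag:'$_id', count:'$count'}}},
// count the groups and collect them into one list
{$group:{_id:null, total:{$sum:1}, data:{$addToSet:'$tmp'}}}
])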
Output:
{
"result" : [
{
"_id" : null,
"total" : 5,
"data" : [
{
"tag" : "SOME",
"count" : 1
},
{
"tag" : "RANDOM",
"count" : 2
},
{
"tag" : "TAGS1",
"count" : 1
},
{
"tag" : "TAGS",
"count" : 1
},
{
"tag" : "SOME1",
"count" : 1
}
]
}
],
"ok" : 1
}
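Once the single result document is available on the client, a page can be cut out of its data list. The following is a minimal sketch in plain JavaScript, assuming 1-based page numbers; `pageOfTags` is a hypothetical helper and `groupDoc` simply copies the element of "result" shown above. It also assumes the whole document stays under the 16 MB limit mentioned in the question, because every group is packed into one document.

```javascript
// `pageOfTags` is a hypothetical helper, not a MongoDB API.
// `groupDoc` has the shape of one element of "result": { _id, total, data }.
function pageOfTags(groupDoc, page, pageSize) {
  const start = (page - 1) * pageSize; // pages are assumed to be 1-based
  return {
    total: groupDoc.total,
    pageCount: Math.ceil(groupDoc.total / pageSize),
    items: groupDoc.data.slice(start, start + pageSize),
  };
}

// Copy of the output document shown above
const groupDoc = {
  _id: null,
  total: 5,
  data: [
    { tag: "SOME", count: 1 },
    { tag: "RANDOM", count: 2 },
    { tag: "TAGS1", count: 1 },
    { tag: "TAGS", count: 1 },
    { tag: "SOME1", count: 1 },
  ],
};

const secondPage = pageOfTags(groupDoc, 2, 2);
console.log(secondPage);
```

Because $addToSet does not guarantee the order of the collected elements, stable pages would probably need a $sort stage followed by $push instead.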
Answered by Ross
I'm not sure you need the aggregation framework for this, other than for counting all the tags, e.g.:
db.posts.aggregate([
{ "$unwind" : "$tags" },
{ "$group" : {
_id: { tag: "$tags" },
count: { $sum: 1 }
} }
]);
To paginate through the posts for a given tag, you can just use the normal query syntax, like so:
db.posts.find({tags: "RANDOM"}).skip(10).limit(10)
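The skip and limit values in the query above correspond to the second page at ten posts per page. A minimal sketch of that arithmetic in plain JavaScript, assuming 1-based page numbers (`pageToSkipLimit` is a hypothetical helper, not a MongoDB API):

```javascript
// Hypothetical helper: convert a 1-based page number into skip/limit values.
function pageToSkipLimit(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be a positive integer");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be a positive integer");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

const { skip, limit } = pageToSkipLimit(2, 10);
// In the mongo shell the values could then be used as:
//   db.posts.find({ tags: "RANDOM" }).skip(skip).limit(limit)
console.log(skip, limit);
```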