Original question: http://stackoverflow.com/questions/21509045/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
MongoDB group by array inner-elements
Question by Gil Adirim
I have a collection of articles, and each one has an array property listing the individuals mentioned in it:
{
  _id: { $oid: "52b632a9e4f2ba13c82ccd23" },
  providerName: "The Guardian",
  url: "http://feeds.theguardian.com/c/34708/f/663860/s/3516cebc/sc/38/l/0L0Stheguardian0N0Cmusic0C20A130Cdec0C220Cwaterboys0Efishermans0Eblues0Etour0Ehammersmith/story01.htm",
  subject: "The Waterboys – review",
  class_artist: [
    "paul mccartney"
  ]
}
I've been trying, unsuccessfully, to get a list of all the individual artists (class_artist), ranked by the number of articles they've been tagged in within the past 7 days.
I've gotten as far as:
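var date = new Date();
date.setDate(date.getDate() - 7);   // cutoff: 7 days ago
db.articles.group({
  key: { class_artist: 1 },         // the key is the whole array value
  cond: { class_date: { $gt: date } },
  reduce: function (curr, result) { result.cnt++; },
  initial: { cnt: 0 }
}).sort(function (a, b) { return b.cnt - a.cnt; });  // group() returns a plain array, so sort needs a comparator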
Unfortunately, it doesn't count them by individual array values, but by whole array contents (that is, by distinct lists of artists).
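To see the difference, here is a small plain-JavaScript sketch (no MongoDB involved) contrasting a count keyed on the whole class_artist array with a count keyed on each element. The sample articles and the countBy helper are hypothetical; the flatMap step imitates what $unwind does.

```javascript
// Hypothetical sample data: three articles and the artists tagged in each.
const articles = [
  { subject: "a", class_artist: ["paul mccartney"] },
  { subject: "b", class_artist: ["paul mccartney", "ringo starr"] },
  { subject: "c", class_artist: ["ringo starr"] },
];

// Count documents per key, where keyFn returns one key per document.
function countBy(docs, keyFn) {
  const counts = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return Object.fromEntries(counts);
}

// Keyed on the array itself: ["paul mccartney", "ringo starr"] is its own group.
const byWholeArray = countBy(articles, (doc) => JSON.stringify(doc.class_artist));

// Keyed on elements: first emit one record per array element (like $unwind),
// then count per artist.
const unwound = articles.flatMap((doc) =>
  doc.class_artist.map((artist) => ({ ...doc, class_artist: artist }))
);
const byArtist = countBy(unwound, (doc) => doc.class_artist);

console.log(byWholeArray); // three groups, one per distinct list
console.log(byArtist);     // paul mccartney: 2, ringo starr: 2
```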
I tried using $unwind, but I haven't been able to make it work.
Answer by Neil Lunn
What framework are you using? This looks like the legacy group() command, which behaves like a wrapper around MapReduce rather than the aggregation framework. $unwind is not available there; it is an aggregation pipeline stage, so you need the aggregation framework to use it. Here's what you want in the mongo shell:
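db.articles.aggregate([
  { $match: { class_date: { $gte: date } } },               // last 7 days (date as set in the question)
  { $project: { _id: 0, class_artist: 1 } },                // keep only the artist array
  { $unwind: "$class_artist" },                             // one document per array element
  { $group: { _id: "$class_artist", tags: { $sum: 1 } } },  // count documents per artist
  { $project: { _id: 0, class_artist: "$_id", tags: 1 } },  // rename _id back to class_artist
  { $sort: { tags: -1 } }                                   // most-tagged first
])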
So, in effect:
- Filter by date, since you already set a var for the last 7 days
- Project only the field(s) we need (we only need one!)
- Unwind the array, so we now have a record for every array element in every document
- Group on the artist across the expanded documents
- Project back into a usable document format, since $group put the artist in _id
- Sort the results in descending order to see the most-tagged artists first
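The stages above can be imitated in plain JavaScript to see what each one does to the data. This is an in-memory sketch only, assuming documents shaped like the question's sample plus a class_date field; it is not how MongoDB executes the pipeline, and the helper name topArtists is hypothetical.

```javascript
// In-memory imitation: $match -> $project -> $unwind -> $group -> $project -> $sort
function topArtists(articles, since) {
  // $match: keep articles dated on or after the cutoff
  const recent = articles.filter((a) => a.class_date >= since);
  // $project: keep only the array field
  const projected = recent.map((a) => ({ class_artist: a.class_artist }));
  // $unwind: one record per array element
  const unwound = projected.flatMap((a) =>
    a.class_artist.map((artist) => ({ class_artist: artist }))
  );
  // $group: count records per artist (the artist acts as the group _id)
  const groups = new Map();
  for (const { class_artist } of unwound) {
    groups.set(class_artist, (groups.get(class_artist) || 0) + 1);
  }
  // $project: reshape each group into { class_artist, tags }
  const reshaped = [...groups].map(([artist, tags]) => ({ class_artist: artist, tags }));
  // $sort: most-tagged first
  return reshaped.sort((x, y) => y.tags - x.tags);
}

const cutoff = new Date();
cutoff.setDate(cutoff.getDate() - 7);

const now = new Date();
const old = new Date(now);
old.setDate(old.getDate() - 30);

// Hypothetical sample documents
const sample = [
  { class_date: now, class_artist: ["paul mccartney", "ringo starr"] },
  { class_date: now, class_artist: ["paul mccartney"] },
  { class_date: old, class_artist: ["ringo starr", "john lennon"] }, // older than 7 days, filtered out
];

console.log(topArtists(sample, cutoff)); // paul mccartney (tags: 2), then ringo starr (tags: 1)
```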
And the great thing about aggregation is you can gradually build up those stages to see what is going on.
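One way to build the stages up gradually is to run every prefix of the pipeline and look at each intermediate result. The sketch below assumes a legacy mongo shell collection, where aggregate() returns a cursor whose toArray() returns the documents directly; with the Node.js driver toArray() returns a Promise, so it would need to be awaited. The function name previewStages is hypothetical.

```javascript
// Run each prefix of the pipeline so every stage's output can be inspected in turn.
function previewStages(collection, pipeline) {
  var previews = [];
  for (var n = 1; n <= pipeline.length; n++) {
    previews.push({
      stage: Object.keys(pipeline[n - 1])[0],                      // stage just added, e.g. "$unwind"
      output: collection.aggregate(pipeline.slice(0, n)).toArray() // results after the first n stages
    });
  }
  return previews;
}
```

For example, with the answer's stage array stored in a variable named pipeline (a hypothetical name), in the mongo shell: previewStages(db.articles, pipeline).forEach(function (p) { print(p.stage); printjson(p.output); });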
Shake and bake into your own driver implementation or ODM framework as required.
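As one possible translation to a driver, here is a sketch for the official Node.js driver (package "mongodb"), where Collection.aggregate() takes the same array of stage documents and returns a cursor whose toArray() resolves to the results. The helper names buildTopArtistsPipeline and fetchTopArtists are hypothetical.

```javascript
// Build the answer's pipeline for a given cutoff date.
function buildTopArtistsPipeline(since) {
  return [
    { $match: { class_date: { $gte: since } } },
    { $project: { _id: 0, class_artist: 1 } },
    { $unwind: "$class_artist" },
    { $group: { _id: "$class_artist", tags: { $sum: 1 } } },
    { $project: { _id: 0, class_artist: "$_id", tags: 1 } },
    { $sort: { tags: -1 } },
  ];
}

// Run it against a driver collection; resolves to [{ class_artist, tags }, ...].
async function fetchTopArtists(collection, days = 7) {
  const since = new Date();
  since.setDate(since.getDate() - days);
  return collection.aggregate(buildTopArtistsPipeline(since)).toArray();
}
```

For example, with a connected MongoClient named client: const top = await fetchTopArtists(client.db("news").collection("articles")); where "news" is a placeholder database name. Keeping the pipeline in a separate builder makes it easy to reuse from an ODM that accepts raw pipelines.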