Javascript $lookup 后的聚合过滤器
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/36459983/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Aggregation filter after $lookup
提问by Phillip Bartschinski
How can I add a filter after an $lookup or is there any other method to do this?
如何在 $lookup 之后添加过滤器或者是否有其他方法可以做到这一点?
My data collection test is:
我的数据收集测试是:
{ "_id" : ObjectId("570557d4094a4514fc1291d6"), "id" : 100, "value" : "0", "contain" : [ ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d7"), "id" : 110, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d8"), "id" : 120, "value" : "1", "contain" : [ 100 ] }
{ "_id" : ObjectId("570557d4094a4514fc1291d9"), "id" : 121, "value" : "2", "contain" : [ 100, 120 ] }
I select id 100 and aggregate the childs:
我选择 id 100 并聚合孩子:
db.test.aggregate([ {
$match : {
id: 100
}
}, {
$lookup : {
from : "test",
localField : "id",
foreignField : "contain",
as : "childs"
}
}]);
I get back:
我回来了:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d9"),
"id":121,
"value":"2",
"contain":[ 100, 120 ]
}
]
}
But I want only childs that match with "value: 1"
但我只想要与“值:1”匹配的孩子
At the end I expect this result:
最后我期待这个结果:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
}
]
}
回答by Neil Lunn
The question here is actually about something different and does not need $lookup
at all. But for anyone arriving here purely from the title of "filtering after $lookup" then these are the techniques for you:
这里的问题实际上是关于不同的东西,根本不需要$lookup
。但是对于任何纯粹从“$lookup 后过滤”的标题来到这里的人来说,这些技术适合您:
MongoDB 3.6 - Sub-pipeline
MongoDB 3.6 - 子管道
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"let": { "id": "$id" },
"pipeline": [
{ "$match": {
"value": "1",
"$expr": { "$in": [ "$$id", "$contain" ] }
}}
],
"as": "childs"
}}
])
Earlier - $lookup + $unwind + $match coalescence
早些时候 - $lookup + $unwind + $match 合并
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$unwind": "$childs" },
{ "$match": { "childs.value": "1" } },
{ "$group": {
"_id": "$_id",
"id": { "$first": "$id" },
"value": { "$first": "$value" },
"contain": { "$first": "$contain" },
"childs": { "$push": "$childs" }
}}
])
If you question why would you $unwind
as opposed to using $filter
on the array, then read Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document sizefor all the detail on why this is generally necessary and far more optimal.
如果您质疑为什么要在数组$unwind
上使用而不是使用$filter
,请阅读Aggregate $lookup 匹配管道中文档的总大小超过最大文档大小的所有详细信息,了解为什么这通常是必要的并且更加优化。
For releases of MongoDB 3.6 and onwards, then the more expressive "sub-pipeline" is generally what you want to "filter" the results of the foreign collection before anything gets returned into the array at all.
对于 MongoDB 3.6 及以后的版本,更具表现力的“子管道”通常是您想要在任何东西返回到数组之前“过滤”外部集合的结果。
Back to the answer though which actually describes why the question asked needs "no join" at all....
回到答案,尽管它实际上描述了为什么提出的问题根本不需要“不加入”......
Original
原来的
Using $lookup
like this is not the most "efficient" way to do what you want here. But more on this later.
$lookup
像这样使用并不是在这里做你想做的最“有效”的方式。但稍后会详细介绍。
As a basic concept, just use $filter
on the resulting array:
作为一个基本概念,只需$filter
在结果数组上使用:
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$project": {
"id": 1,
"value": 1,
"contain": 1,
"childs": {
"$filter": {
"input": "$childs",
"as": "child",
"cond": { "$eq": [ "$$child.value", "1" ] }
}
}
}}
]);
Or use $redact
instead:
或者$redact
改用:
db.test.aggregate([
{ "$match": { "id": 100 } },
{ "$lookup": {
"from": "test",
"localField": "id",
"foreignField": "contain",
"as": "childs"
}},
{ "$redact": {
"$cond": {
"if": {
"$or": [
{ "$eq": [ "$value", "0" ] },
{ "$eq": [ "$value", "1" ] }
]
},
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
]);
Both get the same result:
两者都得到相同的结果:
{
"_id":ObjectId("570557d4094a4514fc1291d6"),
"id":100,
"value":"0",
"contain":[ ],
"childs":[ {
"_id":ObjectId("570557d4094a4514fc1291d7"),
"id":110,
"value":"1",
"contain":[ 100 ]
},
{
"_id":ObjectId("570557d4094a4514fc1291d8"),
"id":120,
"value":"1",
"contain":[ 100 ]
}
]
}
Bottom line is that $lookup
itself cannot "yet" query to only select certain data. So all "filtering" needs to happen after the $lookup
最重要的是,$lookup
它本身不能“还”查询以仅选择某些数据。所以所有的“过滤”都需要在$lookup
But really for this type of "self join" you are better off not using $lookup
at all and avoiding the overhead of an additional read and "hash-merge" entirely. Just fetch the related items and $group
instead:
但实际上对于这种类型的“自连接”,您最好根本不使用$lookup
并完全避免额外读取和“哈希合并”的开销。只需获取相关项目,$group
而是:
db.test.aggregate([
{ "$match": {
"$or": [
{ "id": 100 },
{ "contain.0": 100, "value": "1" }
]
}},
{ "$group": {
"_id": {
"$cond": {
"if": { "$eq": [ "$value", "0" ] },
"then": "$id",
"else": { "$arrayElemAt": [ "$contain", 0 ] }
}
},
"value": { "$first": { "$literal": "0"} },
"childs": {
"$push": {
"$cond": {
"if": { "$ne": [ "$value", "0" ] },
"then": "$$ROOT",
"else": null
}
}
}
}},
{ "$project": {
"value": 1,
"childs": {
"$filter": {
"input": "$childs",
"as": "child",
"cond": { "$ne": [ "$$child", null ] }
}
}
}}
])
Which only comes out a little different because I deliberately removed the extraneous fields. Add them in yourself if you really want to:
这只是有点不同,因为我故意删除了无关的字段。如果您真的想将它们添加到自己中:
{
"_id" : 100,
"value" : "0",
"childs" : [
{
"_id" : ObjectId("570557d4094a4514fc1291d7"),
"id" : 110,
"value" : "1",
"contain" : [ 100 ]
},
{
"_id" : ObjectId("570557d4094a4514fc1291d8"),
"id" : 120,
"value" : "1",
"contain" : [ 100 ]
}
]
}
So the only real issue here is "filtering" any null
result from the array, created when the current document was the parent
in processing items to $push
.
所以这里唯一真正的问题是“过滤”null
数组中的任何结果,当当前文档是parent
处理项目时创建的$push
。
What you also seem to be missing here is that the result you are looking for does not need aggregation or "sub-queries" at all. The structure that you have concluded or possibly found elsewhere is "designed" so that you can get a "node" and all of it's "children" in a single query request.
您在这里似乎还缺少的是,您正在寻找的结果根本不需要聚合或“子查询”。您已经得出结论或可能在其他地方找到的结构是“设计好的”,以便您可以在单个查询请求中获得一个“节点”及其所有“子节点”。
That means just the "query" is all that is really needed, and the data collection ( which is all that is happening since no content is really being "reduced" ) is just a function of iterating the cursor result:
这意味着只有“查询”才是真正需要的,而数据收集(这是所有正在发生的事情,因为没有真正“减少”内容)只是迭代游标结果的函数:
var result = {};
db.test.find({
"$or": [
{ "id": 100 },
{ "contain.0": 100, "value": "1" }
]
}).sort({ "contain.0": 1 }).forEach(function(doc) {
if ( doc.id == 100 ) {
result = doc;
result.childs = []
} else {
result.childs.push(doc)
}
})
printjson(result);
This does exactly the same thing:
这完全相同:
{
"_id" : ObjectId("570557d4094a4514fc1291d6"),
"id" : 100,
"value" : "0",
"contain" : [ ],
"childs" : [
{
"_id" : ObjectId("570557d4094a4514fc1291d7"),
"id" : 110,
"value" : "1",
"contain" : [
100
]
},
{
"_id" : ObjectId("570557d4094a4514fc1291d8"),
"id" : 120,
"value" : "1",
"contain" : [
100
]
}
]
}
And serves as proof that all you really need to do here is issue the "single" query to select both the parent and children. The returned data is just the same, and all you are doing on either server or client is "massaging" into another collected format.
并证明您在这里真正需要做的就是发出“单一”查询以选择父项和子项。返回的数据是一样的,你在服务器或客户端上所做的一切都是“按摩”成另一种收集的格式。
This is one of those cases where you can get "caught up" in thinking of how you did things in a "relational" database, and not realize that since the way the data is stored has "changed", you no longer need to use the same approach.
这是其中一种情况,您可能会“陷入”思考如何在“关系”数据库中做事,而没有意识到由于数据存储方式已经“改变”,您不再需要使用同样的方法。
That is exactly what the point of the documentation example "Model Tree Structures with Child References"in it's structure, where it makes it easy to select parents and children within one query.
这正是文档示例“带有子引用的模型树结构”在其结构中的意义所在,它可以轻松地在一个查询中选择父项和子项。