mongodb 仅返回嵌套数组中匹配的子文档元素
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/36229123/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Return only matched sub-document elements within a nested array
提问by Vico
The main collection is retailer, which contains an array for stores. Each store contains an array of offers (you can buy in this store). This offers array has an array of sizes. (See example below)
主要集合是零售商,它包含一个商店数组。每家商店都包含一系列优惠(您可以在这家商店购买)。这提供了数组具有大小的数组。(见下例)
Now I try to find all offers, which are available in the size L
.
现在,我尝试查找大小为L
.
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"XS",
"S",
"M"
]
},
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
}
I've try this query: db.getCollection('retailers').find({'stores.offers.size': 'L'})
我试过这个查询: db.getCollection('retailers').find({'stores.offers.size': 'L'})
I expect some Output like that:
我期待一些这样的输出:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size": [
"S",
"L",
"XL"
]
}
]
}
}
But the Output of my Query contains also the non matching offer with size
XS,X and M.
但是我的查询的输出还包含与size
XS、X 和 M不匹配的报价。
How I can force MongoDB to return only the offers, which matched my query?
如何强制 MongoDB 仅返回与我的查询匹配的优惠?
Greetings and thanks.
问候和感谢。
回答by Blakes Seven
So the query you have actually selects the "document" just like it should. But what you are looking for is to "filter the arrays" contained so that the elements returned only match the condition of the query.
因此,您的查询实际上选择了“文档”,就像它应该的那样。但是您正在寻找的是“过滤数组”,以便返回的元素仅与查询条件匹配。
The real answer is of course that unless you are really saving a lot of bandwidth by filtering out such detail then you should not even try, or at least beyond the first positional match.
真正的答案当然是,除非您真的通过过滤掉这些细节来节省大量带宽,否则您甚至不应该尝试,或者至少不要尝试超过第一个位置匹配。
MongoDB has a positional $
operatorwhich will return an array element at the matched index from a query condition. However, this only returns the "first" matched index of the "outer" most array element.
MongoDB 有一个位置$
运算符,它将从查询条件中返回匹配索引处的数组元素。但是,这仅返回“外部”最数组元素的“第一个”匹配索引。
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
)
In this case, it means the "stores"
array position only. So if there were multiple "stores" entries, then only "one" of the elements that contained your matched condition would be returned. But, that does nothing for the inner array of "offers"
, and as such every "offer" within the matchd "stores"
array would still be returned.
在这种情况下,它"stores"
仅表示阵列位置。因此,如果有多个“商店”条目,则只会返回包含匹配条件的元素中的“一个”。但是,这对 的内部数组没有任何作用,因此仍然会返回"offers"
匹配"stores"
数组中的每个“报价” 。
MongoDB has no way of "filtering" this in a standard query, so the following does not work:
MongoDB 无法在标准查询中“过滤”它,因此以下不起作用:
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$.offers.$': 1 }
)
The only tools MongoDB actually has to do this level of manipulation is with the aggregation framework. But the analysis should show you why you "probably" should not do this, and instead just filter the array in code.
MongoDB 实际上必须执行此级别操作的唯一工具是使用聚合框架。但是分析应该告诉你为什么你“可能”不应该这样做,而只是在代码中过滤数组。
In order of how you can achieve this per version.
按照每个版本如何实现这一点的顺序。
First with MongoDB 3.2.xwith using the $filter
operation:
首先使用MongoDB 3.2.x使用$filter
操作:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$filter": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$filter": {
"input": "$$store.offers",
"as": "offer",
"cond": {
"$setIsSubset": [ ["L"], "$$offer.size" ]
}
}
}
}
}
},
"as": "store",
"cond": { "$ne": [ "$$store.offers", [] ]}
}
}
}}
])
Then with MongoDB 2.6.xand above with $map
and $setDifference
:
然后在MongoDB 2.6.x及更高版本中使用$map
和$setDifference
:
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$project": {
"stores": {
"$setDifference": [
{ "$map": {
"input": {
"$map": {
"input": "$stores",
"as": "store",
"in": {
"_id": "$$store._id",
"offers": {
"$setDifference": [
{ "$map": {
"input": "$$store.offers",
"as": "offer",
"in": {
"$cond": {
"if": { "$setIsSubset": [ ["L"], "$$offer.size" ] },
"then": "$$offer",
"else": false
}
}
}},
[false]
]
}
}
}
},
"as": "store",
"in": {
"$cond": {
"if": { "$ne": [ "$$store.offers", [] ] },
"then": "$$store",
"else": false
}
}
}},
[false]
]
}
}}
])
And finally in any version above MongoDB 2.2.xwhere the aggregation framework was introduced.
最后在MongoDB 2.2.x以上的任何版本中引入了聚合框架。
db.getCollection('retailers').aggregate([
{ "$match": { "stores.offers.size": "L" } },
{ "$unwind": "$stores" },
{ "$unwind": "$stores.offers" },
{ "$match": { "stores.offers.size": "L" } },
{ "$group": {
"_id": {
"_id": "$_id",
"storeId": "$stores._id",
},
"offers": { "$push": "$stores.offers" }
}},
{ "$group": {
"_id": "$_id._id",
"stores": {
"$push": {
"_id": "$_id.storeId",
"offers": "$offers"
}
}
}}
])
Lets break down the explanations.
让我们分解解释。
MongoDB 3.2.x and greater
MongoDB 3.2.x 及更高版本
So generally speaking, $filter
is the way to go here since it is designed with the purpose in mind. Since there are multiple levels of the array, you need to apply this at each level. So first you are diving into each "offers"
within "stores"
to examime and $filter
that content.
所以一般来说,$filter
这是去这里的方式,因为它的设计考虑到了目的。由于数组有多个级别,因此您需要在每个级别应用它。所以首先你要深入到每一个"offers"
内部"stores"
来检查和$filter
那个内容。
The simple comparison here is "Does the "size"
array contain the element I am looking for". In this logical context, the short thing to do is use the $setIsSubset
operation to compare an array ("set") of ["L"]
to the target array. Where that condition is true
( it contains "L" ) then the array element for "offers"
is retained and returned in the result.
这里的简单比较是“"size"
数组是否包含我正在寻找的元素”。在此逻辑上下文中,要做的简短事情是使用该$setIsSubset
操作将一个数组(“set”)["L"]
与目标数组进行比较。如果条件是true
(它包含 "L" ),"offers"
则保留并在结果中返回数组元素。
In the higher level $filter
, you are then looking to see if the result from that previous $filter
returned an empty array []
for "offers"
. If it is not empty, then the element is returned or otherwise it is removed.
在较高的水平$filter
,你再看看是否从以前的结果$filter
返回一个空数组[]
的"offers"
。如果它不为空,则返回该元素,否则将其删除。
MongoDB 2.6.x
MongoDB 2.6.x
This is very similar to the modern process except that since there is no $filter
in this version you can use $map
to inspect each element and then use $setDifference
to filter out any elements that were returned as false
.
这与现代过程非常相似,只是因为$filter
在这个版本中没有你可以$map
用来检查每个元素,然后$setDifference
用来过滤掉任何以false
.
So $map
is going to return the whole array, but the $cond
operation just decides whether to return the element or instead a false
value. In the comparison of $setDifference
to a single element "set" of [false]
all false
elements in the returned array would be removed.
So$map
将返回整个数组,但$cond
操作只是决定是返回元素还是返回false
值。在与返回数组$setDifference
中[false]
所有false
元素的单个元素“集合”的比较中,将被删除。
In all other ways, the logic is the same as above.
在所有其他方面,逻辑与上面相同。
MongoDB 2.2.x and up
MongoDB 2.2.x 及更高版本
So below MongoDB 2.6 the only tool for working with arrays is $unwind
, and for this purpose alone you should notuse the aggregation framework "just" for this purpose.
所以下面的MongoDB 2.6使用数组的唯一工具是$unwind
,并为此单独,你应该不使用聚合框架“只是”为了这个目的。
The process indeed appears simple, by simply "taking apart" each array, filtering out the things you don't need then putting it back together. The main care is in the "two" $group
stages, with the "first" to re-build the inner array, and the next to re-build the outer array. There are distinct _id
values at all levels, so these just need to be included at every level of grouping.
这个过程确实看起来很简单,只需“拆开”每个数组,过滤掉不需要的东西,然后将它们重新组合在一起。主要关心在“两个”$group
阶段,“第一个”重新构建内部阵列,然后重新构建外部阵列。_id
所有级别都有不同的值,因此只需将这些值包含在每个分组级别中。
But the problem is that $unwind
is very costly. Though it does have purpose still, it's main usage intent is not to do this sort of filtering per document. In fact in modern releases it's only usage should be when an element of the array(s) needs to become part of the "grouping key" itself.
但问题是,$unwind
是非常昂贵的。虽然它仍然有目的,但它的主要用途不是对每个文档进行这种过滤。事实上,在现代版本中,只有当数组的元素需要成为“分组键”本身的一部分时才应该使用。
Conclusion
结论
So it's not a simple process to get matches at multiple levels of an array like this, and in fact it can be extremely costlyif implemented incorrectly.
因此,像这样在数组的多个级别上获得匹配并不是一个简单的过程,事实上,如果实施不正确,代价可能会非常高。
Only the two modern listings should ever be used for this purpose, as they employ a "single" pipeline stage in addition to the "query" $match
in order to do the "filtering". The resulting effect is little more overhead than the standard forms of .find()
.
为此目的只应使用两个现代列表,因为除了“查询”之外,它们还使用“单个”管道阶段$match
来执行“过滤”。由此产生的效果比标准形式的.find()
.
In general though, those listings still have an amount of complexity to them, and indeed unless you are really drastically reducing the content returned by such filtering in a way that makes a significant improvement in bandwidth used between the server and client, then you are better of filtering the result of the initial query and basic projection.
但总的来说,这些列表仍然有一定的复杂性,实际上,除非您真的以显着改善服务器和客户端之间使用的带宽的方式大幅减少此类过滤返回的内容,否则您会更好过滤初始查询和基本投影的结果。
db.getCollection('retailers').find(
{ 'stores.offers.size': 'L'},
{ 'stores.$': 1 }
).forEach(function(doc) {
// Technically this is only "one" store. So omit the projection
// if you wanted more than "one" match
doc.stores = doc.stores.filter(function(store) {
store.offers = store.offers.filter(function(offer) {
return offer.size.indexOf("L") != -1;
});
return store.offers.length != 0;
});
printjson(doc);
})
So working with the returned object "post" query processing is far less obtuse than using the aggregation pipeline to do this. And as stated the only "real" diffrerence would be that you are discarding the other elements on the "server" as opposed to removing them "per document" when received, which may save a little bandwidth.
因此,使用返回的对象“发布”查询处理远没有使用聚合管道来执行此操作那么笨拙。如前所述,唯一的“真正”差异是您丢弃了“服务器”上的其他元素,而不是在收到时“按文档”删除它们,这可能会节省一点带宽。
But unless you are doing this in a modern release with only$match
and $project
, then the "cost" of processing on the server will greatly outweigh the "gain" of reducing that network overhead by stripping the unmatched elements first.
但是,除非您在仅使用$match
and的现代版本中执行此操作,否则$project
在服务器上处理的“成本”将大大超过通过首先剥离不匹配元素来减少网络开销的“收益”。
In all cases, you get the same result:
在所有情况下,您都会得到相同的结果:
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"stores" : [
{
"_id" : ObjectId("56f277b5279871c20b8b4783"),
"offers" : [
{
"_id" : ObjectId("56f277b1279871c20b8b4567"),
"size" : [
"S",
"L",
"XL"
]
}
]
}
]
}
回答by profesor79
as your array is embeded we cannot use $elemMatch, instead you can use aggregation framework to get your results:
由于您的数组被嵌入,我们不能使用 $elemMatch,您可以使用聚合框架来获取结果:
db.retailers.aggregate([
{$match:{"stores.offers.size": 'L'}}, //just precondition can be skipped
{$unwind:"$stores"},
{$unwind:"$stores.offers"},
{$match:{"stores.offers.size": 'L'}},
{$group:{
_id:{id:"$_id", "storesId":"$stores._id"},
"offers":{$push:"$stores.offers"}
}},
{$group:{
_id:"$_id.id",
stores:{$push:{_id:"$_id.storesId","offers":"$offers"}}
}}
]).pretty()
what this query does is unwinds arrays (twice), then matches size and then reshapes the document to previous form. You can remove $group steps and see how it prints. Have a fun!
这个查询的作用是展开数组(两次),然后匹配大小,然后将文档重塑为以前的形式。您可以删除 $group 步骤并查看它的打印方式。玩得开心!