MongoDB: multiple join conditions using the $lookup operator
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/37086387/
Asked by user6148078
I have the following two collections:
// collection1:
{
user1: 1,
user2: 2,
percent: 0.56
}
// collection2:
{
user1: 1,
user2: 2,
percent: 0.3
}
I want to join these two collections on user1 and user2.
How can I write a pipeline in order to get a result like this:
{
user1: 1,
user2: 2,
percent1: 0.56,
percent2: 0.3
}
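For reference, here is a minimal plain-JavaScript sketch of the join being asked for, run on in-memory arrays rather than in MongoDB: a pair of documents matches only when both user1 and user2 are equal (an inner join), and each side's percent is kept under its own name. The function name joinOnUsers is hypothetical; the sample data is the two documents above.

```javascript
// Hypothetical in-memory reference for the requested join:
// match documents on BOTH user1 and user2, keep each side's percent separately.
function joinOnUsers(coll1, coll2) {
  const out = [];
  for (const a of coll1) {
    for (const b of coll2) {
      if (a.user1 === b.user1 && a.user2 === b.user2) {
        out.push({ user1: a.user1, user2: a.user2, percent1: a.percent, percent2: b.percent });
      }
    }
  }
  return out;
}

const collection1 = [{ user1: 1, user2: 2, percent: 0.56 }];
const collection2 = [{ user1: 1, user2: 2, percent: 0.3 }];
// Logs one joined document: user1 1, user2 2, percent1 0.56, percent2 0.3
console.log(joinOnUsers(collection1, collection2));
```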
Answer by styvane
We can use multiple join conditions with the $lookup aggregation pipeline stage in version 3.6 and newer.
We need to assign the fields' values to variables using the optional let field; you can then access those variables in the stages of the pipeline field, which specifies the pipeline to run on the joined collection.
Note that in the $match stage, we use the $expr evaluation query operator to compare the fields' values.
The last stage in the pipeline is the $replaceRoot aggregation pipeline stage, where we simply merge the $lookup result with part of the $$ROOT document using the $mergeObjects operator.
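db.collection1.aggregate([
  {
    $lookup: {
      from: "collection2",
      // expose the input document's fields as variables for the sub-pipeline
      let: {
        firstUser: "$user1",
        secondUser: "$user2"
      },
      pipeline: [
        {
          $match: {
            // $expr compares the joined document's fields with the let variables
            $expr: {
              $and: [
                { $eq: ["$user1", "$$firstUser"] },
                { $eq: ["$user2", "$$secondUser"] }
              ]
            }
          }
        }
      ],
      as: "result"
    }
  },
  {
    // merge the first matching collection2 document with this
    // document's percent, exposed as percent1
    $replaceRoot: {
      newRoot: {
        $mergeObjects: [
          { $arrayElemAt: ["$result", 0] },
          { percent1: "$$ROOT.percent" }
        ]
      }
    }
  }
])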
This pipeline yields something that looks like this:
{
"_id" : ObjectId("59e1ad7d36f42d8960c06022"),
"user1" : 1,
"user2" : 2,
"percent" : 0.3,
"percent1" : 0.56
}
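The let + $match/$expr pattern generalizes to any number of join fields. A minimal sketch, assuming a hypothetical helper buildMultiFieldLookup (not a MongoDB API) and assuming the join fields have the same names in both collections; it only builds the stage document, which you would then pass to aggregate() in the mongo shell or a driver.

```javascript
// Hypothetical helper (not part of MongoDB): builds a $lookup stage that joins
// on every field in `fields`, assuming the names match in both collections.
// Requires MongoDB 3.6+ (let/pipeline syntax and $expr).
function buildMultiFieldLookup(from, fields, as) {
  const vars = {};
  const conditions = [];
  fields.forEach((field, i) => {
    // let variable names must start with a lowercase letter: use v0, v1, ...
    const name = "v" + i;
    vars[name] = "$" + field; // value taken from the input document
    // "$field" is the joined document's field, "$$name" the input document's value
    conditions.push({ $eq: ["$" + field, "$$" + name] });
  });
  return {
    $lookup: {
      from: from,
      let: vars,
      pipeline: [{ $match: { $expr: { $and: conditions } } }],
      as: as,
    },
  };
}

// A two-field join on user1 and user2, like the pipeline above.
const stage = buildMultiFieldLookup("collection2", ["user1", "user2"], "result");
console.log(JSON.stringify(stage));
```

For example, db.collection1.aggregate([stage]) in the mongo shell would run the generated stage; further stages such as $replaceRoot can be appended to the array as usual.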
If you are not on version 3.6+, you can first join on one of the fields, say "user1", and then unwind the array of matching documents using the $unwind aggregation pipeline stage. The next stage in the pipeline is $redact, where you use the $$KEEP and $$PRUNE system variables to filter out the documents in which the value of "user2" from the "joined" collection differs from the one in the input document. You can then reshape your documents in a $project stage.
db.collection1.aggregate([
{ "$lookup": {
"from": "collection2",
"localField": "user1",
"foreignField": "user1",
"as": "collection2_doc"
}},
{ "$unwind": "$collection2_doc" },
{ "$redact": {
"$cond": [
{ "$eq": [ "$user2", "$collection2_doc.user2" ] },
"$$KEEP",
"$$PRUNE"
]
}},
{ "$project": {
"user1": 1,
"user2": 1,
"percent1": "$percent",
"percent2": "$collection2_doc.percent"
}}
])
which produces:
{
"_id" : ObjectId("572daa87cc52a841bb292beb"),
"user1" : 1,
"user2" : 2,
"percent1" : 0.56,
"percent2" : 0.3
}
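To illustrate why $unwind is needed before $redact, the sketch below emulates the pre-3.6 pipeline on in-memory arrays: $lookup on user1 alone can attach several candidate documents, $unwind turns each candidate into its own document, and the $redact condition then keeps only those whose user2 also matches. The function name lookupUnwindRedact and the extra collection2 document { user1: 1, user2: 5, percent: 0.9 } are hypothetical, added only to show a candidate being pruned.

```javascript
// Hypothetical in-memory emulation of the pre-3.6 pipeline above
// ($lookup on user1 -> $unwind -> $redact on user2 -> $project).
function lookupUnwindRedact(coll1, coll2) {
  return coll1
    // $lookup on user1 only: attach every collection2 document with the same user1
    .map(doc => ({ ...doc, collection2_doc: coll2.filter(c2 => c2.user1 === doc.user1) }))
    // $unwind: one document per array element (documents with an empty array are dropped)
    .flatMap(doc => doc.collection2_doc.map(c2 => ({ ...doc, collection2_doc: c2 })))
    // $redact: $$KEEP if user2 matches too, otherwise $$PRUNE
    .filter(doc => doc.user2 === doc.collection2_doc.user2)
    // $project: reshape into the requested output
    .map(doc => ({
      user1: doc.user1,
      user2: doc.user2,
      percent1: doc.percent,
      percent2: doc.collection2_doc.percent,
    }));
}

// Sample data; the second collection2 document is hypothetical and gets pruned.
const collection1 = [{ user1: 1, user2: 2, percent: 0.56 }];
const collection2 = [
  { user1: 1, user2: 2, percent: 0.3 },
  { user1: 1, user2: 5, percent: 0.9 },
];
console.log(lookupUnwindRedact(collection1, collection2));
```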
If the documents in your collections have the same structure and you find yourself performing this operation often, then you should consider merging the two collections into one, or inserting the documents from those collections into a new collection.
db.collection3.insertMany(
db.collection1.find({}, {"_id": 0})
.toArray()
.concat(db.collection2.find({}, {"_id": 0}).toArray())
)
Then $group your documents by "user1" and "user2":
db.collection3.aggregate([
{ "$group": {
"_id": { "user1": "$user1", "user2": "$user2" },
"percent": { "$push": "$percent" }
}}
])
which yields:
{ "_id" : { "user1" : 1, "user2" : 2 }, "percent" : [ 0.56, 0.3 ] }
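The $group stage above treats the pair (user1, user2) as a single compound key. A minimal in-memory sketch of that grouping, with a hypothetical function name groupByUsers; note that MongoDB does not guarantee the order of $group output documents, while this sketch simply returns groups in first-seen order.

```javascript
// Hypothetical in-memory emulation of the $group stage above:
// group by the compound key { user1, user2 } and $push each percent.
function groupByUsers(docs) {
  const groups = new Map();
  for (const { user1, user2, percent } of docs) {
    const key = JSON.stringify([user1, user2]); // compound grouping key
    if (!groups.has(key)) groups.set(key, { _id: { user1, user2 }, percent: [] });
    groups.get(key).percent.push(percent); // like the $push accumulator
  }
  return [...groups.values()];
}

// collection3: the collection1 document followed by the collection2 document
const collection3 = [
  { user1: 1, user2: 2, percent: 0.56 },
  { user1: 1, user2: 2, percent: 0.3 },
];
console.log(JSON.stringify(groupByUsers(collection3)));
// [{"_id":{"user1":1,"user2":2},"percent":[0.56,0.3]}]
```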
Answer by Andrew Nessin
If you're trying to model your data, and came here to check if mongodb can perform joins on multiple fields before deciding to do so, please read on.
While MongoDB can perform joins, you also have the freedom to model data according to your application access pattern. If the data is as simple as presented in the question, we can simply maintain a single collection that looks like this:
{
user1: 1,
user2: 2,
percent1: 0.56,
percent2: 0.3
}
Now you can perform on this collection all the operations you would otherwise have performed by joining. Why are we trying to avoid joins? Because, before MongoDB 5.1, $lookup could not use a sharded collection as its from collection (docs), which can stop you from scaling out when needed. Normalizing data (having separate tables/collections) works very well in SQL, but in Mongo, avoiding joins offers advantages without significant drawbacks in most cases. Use normalization in MongoDB only when you have no other choice. From the docs:
In general, use normalized data models:
- when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
- to represent more complex many-to-many relationships.
- to model large hierarchical data sets.
Check here to read more about embedding and why you would choose it over normalization.
Answer by Xavier Guihot
Starting with Mongo 4.4, we can achieve this type of "join" with the new $unionWith aggregation stage coupled with a classic $group stage:
// > db.collection1.find()
// { "user1" : 1, "user2" : 2, "percent" : 0.56 }
// { "user1" : 4, "user2" : 3, "percent" : 0.14 }
// > db.collection2.find()
// { "user1" : 1, "user2" : 2, "percent" : 0.3 }
// { "user1" : 2, "user2" : 3, "percent" : 0.25 }
db.collection1.aggregate([
{ $set: { percent1: "$percent" } },
{ $unionWith: {
coll: "collection2",
pipeline: [{ $set: { percent2: "$percent" } }]
}},
{ $group: {
_id: { user1: "$user1", user2: "$user2" },
percents: { $mergeObjects: { percent1: "$percent1", percent2: "$percent2" } }
}}
])
// { _id: { user1: 1, user2: 2 }, percents: { percent1: 0.56, percent2: 0.3 } }
// { _id: { user1: 2, user2: 3 }, percents: { percent2: 0.25 } }
// { _id: { user1: 4, user2: 3 }, percents: { percent1: 0.14 } }
This:
- Starts with a union of both collections into the pipeline via the new $unionWith stage:
  - We first copy percent from collection1 into percent1 (using a $set stage).
  - Within the $unionWith stage, we specify a pipeline on collection2 in order to also copy percent, this time into percent2.
  - This way, we can differentiate the percentage field's origin.
- Continues with a $group stage that:
  - Groups records based on user1 and user2.
  - Accumulates percentages via a $mergeObjects operation. Using $first: "$percent1" and $first: "$percent2" wouldn't work, since this could potentially take null first (for elements from the other collection), whereas $mergeObjects discards null values.
If you need a different output format, you can add a downstream $project stage.
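The $unionWith + $group pipeline can be emulated on in-memory arrays to see how the $mergeObjects accumulator fills in whichever of percent1/percent2 each group actually has. This is a sketch only: unionAndGroup is a hypothetical name, the sample data is the one shown in the comments of the answer, and, as in MongoDB, the order of the groups should not be relied on.

```javascript
// Hypothetical in-memory emulation of the $unionWith + $group pipeline above.
function unionAndGroup(coll1, coll2) {
  // $set + $unionWith: copy percent into percent1 or percent2 depending on origin
  const union = [
    ...coll1.map(d => ({ ...d, percent1: d.percent })),
    ...coll2.map(d => ({ ...d, percent2: d.percent })),
  ];
  const groups = new Map();
  for (const d of union) {
    const key = JSON.stringify([d.user1, d.user2]); // _id: { user1, user2 }
    if (!groups.has(key)) {
      groups.set(key, { _id: { user1: d.user1, user2: d.user2 }, percents: {} });
    }
    // $mergeObjects: a field missing on this document is simply not merged
    const part = {};
    if (d.percent1 !== undefined) part.percent1 = d.percent1;
    if (d.percent2 !== undefined) part.percent2 = d.percent2;
    Object.assign(groups.get(key).percents, part);
  }
  return [...groups.values()];
}

const collection1 = [
  { user1: 1, user2: 2, percent: 0.56 },
  { user1: 4, user2: 3, percent: 0.14 },
];
const collection2 = [
  { user1: 1, user2: 2, percent: 0.3 },
  { user1: 2, user2: 3, percent: 0.25 },
];
console.log(JSON.stringify(unionAndGroup(collection1, collection2)));
```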
Answer by sbharti
You can do multiple-field matches using the $match and $project pipeline stages (see the detailed answer here: mongoDB Join on multiple fields).
db.collection1.aggregate([
{"$lookup": {
"from": "collection2",
"localField": "user1",
"foreignField": "user1",
"as": "c2"
}},
{"$unwind": "$c2"},
{"$project": {
"user2Eq": {"$eq": ["$user2", "$c2.user2"]},
"user1": 1, "user2": 1,
"percent1": "$percent", "percent2": "$c2.percent"
}},
{"$match": {"user2Eq": true}},
{"$project": {
"user2Eq": 0
}}
])
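On MongoDB 3.6 and newer, $expr lets $match compare two fields of the same document directly, so the temporary user2Eq flag and the final $project that removes it can be dropped. A sketch under that assumption, keeping the collection and field names from the answer above:

```javascript
// Sketch for MongoDB 3.6+: $match with $expr compares user2 against the
// unwound c2.user2 directly, so no temporary flag field is needed.
const pipeline = [
  // join on user1 only
  { $lookup: { from: "collection2", localField: "user1", foreignField: "user1", as: "c2" } },
  // one document per joined collection2 document
  { $unwind: "$c2" },
  // keep only pairs where user2 matches as well
  { $match: { $expr: { $eq: ["$user2", "$c2.user2"] } } },
  // reshape the output
  { $project: { user1: 1, user2: 1, percent1: "$percent", percent2: "$c2.percent" } },
];
// In the mongo shell: db.collection1.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```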