mongodb 使用 $lookup 运算符的多个连接条件

Question

提问by user6148078

Have the two following collections:

有以下两个集合：

// collection1:
{
  user1: 1,
  user2: 2,
  percent: 0.56
}

// collection2:
{
  user1: 1,
  user2: 2,
  percent: 0.3
}

I want to join these two collections on user1and user2.

我想在user1和上加入这两个集合user2。

How can I write a pipeline in order to get a result like this:

我如何编写管道以获得这样的结果：

{
  user1: 1,
  user2: 2,
  percent1: 0.56,
  percent2: 0.3
}

Answer 1

回答by styvane

We can do multiple join conditions with the $lookupaggregation pipeline operator in version 3.6 and newer.

我们可以使用$lookup3.6 及更高版本中的聚合管道运算符执行多个连接条件。

We need to assign the fields's values to variable using the letoptional field; you then access those variables in the pipelinefield stages where you specify the pipeline to run on the collections.

我们需要使用let可选字段将字段的值分配给变量；然后pipeline，您可以在指定要在集合上运行的管道的字段阶段访问这些变量。

Note that in the $matchstage, we use the $exprevaluation query operator to compare the fields's value.

请注意，在该$match阶段，我们使用$expr评估查询运算符来比较字段的值。

The last stage in the pipeline is the $replaceRootaggregation pipeline stage where we simply merge the $lookupresult with part of the $$ROOTdocument using the $mergeObjectsoperator.

管道的最后一个阶段是$replaceRoot聚合管道阶段，我们简单地使用运算符将$lookup结果与部分$$ROOT文档合并$mergeObjects。

db.collection2.aggregate([
       {
          $lookup: {
             from: "collection1",
             let: {
                firstUser: "$user1",
                secondUser: "$user2"
             },
             pipeline: [
                {
                   $match: {
                      $expr: {
                         $and: [
                            {
                               $eq: [
                                  "$user1",
                                  "$$firstUser"
                               ]
                            },
                            {
                               $eq: [
                                  "$user2",
                                  "$$secondUser"
                               ]
                            }
                         ]
                      }
                   }
                }
             ],
             as: "result"
          }
       },
       {
          $replaceRoot: {
             newRoot: {
                $mergeObjects:[
                   {
                      $arrayElemAt: [
                         "$result",
                         0
                      ]
                   },
                   {
                      percent1: "$$ROOT.percent1"
                   }
                ]
             }
          }
       }
    ]
)

This pipeline yields something that look like this:

该管道产生如下所示的内容：

{
    "_id" : ObjectId("59e1ad7d36f42d8960c06022"),
    "user1" : 1,
    "user2" : 2,
    "percent" : 0.3,
    "percent1" : 0.56
}

If you are not on version 3.6+, you can first join using one of your field let say "user1" then from there you unwind the array of the matching document using the $unwindaggregation pipeline operator. The next stage in the pipeline is the $redactstage where you filter out those documents where the value of "user2" from the "joined" collection and the input document are not equal using the $$KEEPand $$PRUNEsystem variables. You can then reshape your document in $projectstage.

如果您使用的不是 3.6+ 版本，则可以先使用其中一个字段加入，例如“user1”，然后从那里使用$unwind聚合管道运算符展开匹配文档的数组。管道中的下一个阶段是$redact使用$$KEEP和$$PRUNE系统变量过滤掉那些来自“joined”集合的“user2”值与输入文档不相等的文档的阶段。然后，您可以在$project舞台上重塑您的文档。

db.collection1.aggregate([
    { "$lookup": { 
        "from": "collection2", 
        "localField": "user1", 
        "foreignField": "user1", 
        "as": "collection2_doc"
    }}, 
    { "$unwind": "$collection2_doc" },
    { "$redact": { 
        "$cond": [
            { "$eq": [ "$user2", "$collection2_doc.user2" ] }, 
            "$$KEEP", 
            "$$PRUNE"
        ]
    }}, 
    { "$project": { 
        "user1": 1, 
        "user2": 1, 
        "percent1": "$percent", 
        "percent2": "$collection2_doc.percent"
    }}
])

which produces:

它产生：

{
    "_id" : ObjectId("572daa87cc52a841bb292beb"),
    "user1" : 1,
    "user2" : 2,
    "percent1" : 0.56,
    "percent2" : 0.3
}

If the documents in your collections have the same structure and you find yourself performing this operation often, then you should consider to merge the two collections into one or insert the documents in those collections into a new collection.

如果您的集合中的文档具有相同的结构，并且您发现自己经常执行此操作，那么您应该考虑将两个集合合并为一个或将这些集合中的文档插入到一个新集合中。

db.collection3.insertMany(
    db.collection1.find({}, {"_id": 0})
    .toArray()
    .concat(db.collection2.find({}, {"_id": 0}).toArray())
)

Then $groupyour documents by "user1" and "user2"

然后$group你的文件由“user1”和“user2”

db.collection3.aggregate([
    { "$group": {
        "_id": { "user1": "$user1", "user2": "$user2" }, 
        "percent": { "$push": "$percent" }
    }}
])

which yields:

产生：

{ "_id" : { "user1" : 1, "user2" : 2 }, "percent" : [ 0.56, 0.3 ] }

Answer 2

回答by Andrew Nessin

If you're trying to model your data, and came here to check if mongodb can perform joins on multiple fields before deciding to do so, please read on.

如果您正在尝试对数据建模，并在决定这样做之前来这里检查 mongodb 是否可以对多个字段执行连接，请继续阅读。

While MongoDB can perform joins, you also have the freedom to model data according to your application access pattern. If the data is as simple as presented in the question, we can simply maintain a single collection that looks like this:

虽然 MongoDB 可以执行连接，但您也可以根据应用程序访问模式自由地对数据建模。如果数据像问题中呈现的一样简单，我们可以简单地维护一个如下所示的集合：

{
    user1: 1,
    user2: 2,
    percent1: 0.56,
    percent2: 0.3
}

Now you can perform all the operations on this collection you would have performed by joining. Why are we trying to avoid joins? Because they are not supported by sharded collections (docs), which will stop you from scaling out when needed. Normalizing data (having separate tables/collections) works very well in SQL, but when it comes to Mongo, avoiding joins can offer advantages without consequences in most cases. Use normalization in MongoDB only when you have no other choice. From the docs:

现在，您可以在此集合上执行通过加入执行的所有操作。为什么我们要避免连接？因为它们不受分片集合 ( docs) 的支持，这将阻止您在需要时扩展。规范化数据（具有单独的表/集合）在 SQL 中非常有效，但是当涉及到 Mongo 时，在大多数情况下避免连接可以提供优势而不会产生任何后果。仅当您别无选择时才在 MongoDB 中使用规范化。从文档：

In general, use normalized data models:
when embedding would result in duplication of data but would not provide sufficient read performance advantages to outweigh the implications of the duplication.
to represent more complex many-to-many relationships.
to model large hierarchical data sets.

一般来说，使用规范化的数据模型：
当嵌入会导致数据重复但不会提供足够的读取性能优势来抵消重复的影响时。
表示更复杂的多对多关系。
对大型分层数据集进行建模。

Check hereto read more about embedding and why you would choose it over normalization.

点击此处阅读更多关于嵌入以及为什么选择它而不是标准化的信息。

Answer 3

回答by Xavier Guihot

Starting Mongo 4.4, we can achieve this type of "join" with the new $unionWithaggregation stage coupled with a classic $groupstage:

开始Mongo 4.4，我们可以通过新的$unionWith聚合阶段加上经典$group阶段来实现这种类型的“连接” ：

// > db.collection1.find()
//   { "user1" : 1, "user2" : 2, "percent" : 0.56 }
//   { "user1" : 4, "user2" : 3, "percent" : 0.14 }
// > db.collection2.find()
//   { "user1" : 1, "user2" : 2, "percent" : 0.3  }
//   { "user1" : 2, "user2" : 3, "percent" : 0.25 }
db.collection1.aggregate([
  { $set: { percent1: "$percent" } },
  { $unionWith: {
    coll: "collection2",
    pipeline: [{ $set: { percent2: "$percent" } }]
  }},
  { $group: {
    _id: { user1: "$user1", user2: "$user2" },
    percents: { $mergeObjects: { percent1: "$percent1", percent2: "$percent2" } }
  }}
])
// { _id: { user1: 1, user2: 2 }, percents: { percent1: 0.56, percent2: 0.3 } }
// { _id: { user1: 2, user2: 3 }, percents: { percent2: 0.25 } }
// { _id: { user1: 4, user2: 3 }, percents: { percent1: 0.14 } }

This:

这个：

Starts with a union of both collections into the pipeline via the new $unionWithstage:
- We first rename percentfrom collection1to percent1(using a $setstage)
- Within the $unionWithstage, we specify a pipelineon the collection2in order to also rename percentthis time to percent2.
- This way, we can differentiate the percentage field's origin.
Continues with a $groupstage that:
- Groups records based on user1and user2
- Accumulate percentages via a $mergeObjectsoperation. Using $first: "$percent1"and $first: "$percent2"wouldn't work since this could potentially take nullfirst (for elements from the other collection). Whereas $mergeObjectsdiscards nullvalues.

从通过新$unionWith阶段将两个集合并入管道开始：
- 我们首先percent从collection1to重命名percent1（使用$set阶段）
- 在$unionWith阶段内，我们在pipeline上指定 acollection2以便也将percent这次重命名为percent2。
- 这样，我们就可以区分百分比字段的来源。
继续一个$group阶段：
- 分组记录基于user1和user2
- 通过$mergeObjects操作累积百分比。使用$first: "$percent1"and$first: "$percent2"将不起作用，因为这可能会null首先（对于来自其他集合的元素）。而$mergeObjects丢弃null值。

If you need a different output format, you can add a downstream $projectstage.

如果需要不同的输出格式，可以添加下游$project阶段。

Answer 4

回答by sbharti

You can do mutiple field matches using $matchand $projectpipelines. (see detailed answer here - mongoDB Join on multiple fields)

您可以使用$match和$project管道进行多字段匹配。（请参阅此处的详细答案 - mongoDB Join on multiple fields）

db.collection1.aggregate([
                    {"$lookup": {
                    "from": "collection2",
                    "localField": "user1",
                    "foreignField": "user1",
                    "as": "c2"
                    }},
                    {"$unwind": "$c2"},

                    {"$project": {
                    "user2Eq": {"$eq": ["$user2", "$c2.user2"]},
                    "user1": 1, "user2": 1, 
                    "percent1": "$percent", "percent2": "$c2.percent"
                    }},

                    {"$match": {
                    {"user2Eq": {"$eq": True}}
                    }},

                    {"$project": {
                    "user2Eq": 0
                    }}

                    ])

mongodb 使用 $lookup 运算符的多个连接条件

提问by user6148078

回答by styvane

回答by Andrew Nessin

回答by Xavier Guihot

回答by sbharti

相关推荐

最近更新

标签

mongodb 使用 $lookup 运算符的多个连接条件

提问by user6148078

回答by styvane

回答by Andrew Nessin

回答by Xavier Guihot

回答by sbharti

相关推荐

将用户控件添加到 wpf 窗口

具有 3 个级别的 MongoDB 嵌套查找

当属性值大于一定数量时触发 WPF

如何 $project ObjectId 到 mongodb 聚合中的字符串值？

相关推荐

最近更新

标签