Get "data from collection b not in collection a" in a MongoDB shell query

Question

Asked by Raman

I have two MongoDB collections that share a common _id. Using the mongo shell, I want to find all documents in one collection that do not have a matching _id in the other collection.

Example:

> db.Test.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 })
> db.Test.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 })
> db.Test.insert({ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 })
> db.Test.insert({ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 })
> db.Test.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "foo" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "foo" : 2 }
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }
> db.Test2.insert({ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 });
> db.Test2.insert({ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 });
> db.Test2.find()
{ "_id" : ObjectId("4f08a75f306b428fb9d8bb2e"), "bar" : 1 }
{ "_id" : ObjectId("4f08a766306b428fb9d8bb2f"), "bar" : 2 }

Now I want a query (or queries) that returns the two documents in Test whose _id values do not match any document in Test2:

{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }

I've tried various combinations of $not, $ne, $or, $in but just can't get the right combination and syntax. Also, I don't mind if db.Test2.find({}, {"_id": 1}) is executed first and saved to some variable, which is then used in a second query (though I can't get that to work either).

Update: Zachary's answer pointing to $nin answered the key part of the question. For example, this works:

> db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"), ObjectId("4f08a766306b428fb9d8bb2f")]}})
{ "_id" : ObjectId("4f08a767306b428fb9d8bb30"), "foo" : 3 }
{ "_id" : ObjectId("4f08a769306b428fb9d8bb31"), "foo" : 4 }

But (acknowledging that this is not scalable, and trying to do it anyway because it's not an issue in this situation) I still can't combine the two queries together in the shell. This is the closest I can get, which is obviously less than ideal:

vals = db.Test2.find({}, {"_id": 1}).toArray()
db.Test.find({"_id": {"$nin": [ObjectId(vals[0]._id), ObjectId(vals[1]._id)]}})

Is there a way to return just the values in the find command so that vals can be used directly as the array input to $nin?

Answer 1

Accepted answer by Zachary Anker

You will have to save the _ids from collection A so that you can exclude them when querying collection B, but you can do it using $nin. See Advanced Queries for all of the MongoDB operators.

Your final query, using the example you gave, would look something like:

db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"),
                               ObjectId("4f08a766306b428fb9d8bb2f")]}})

Note that this approach won't scale. If you need a solution that scales, you should be setting a flag in collections A and B indicating if the _id is in the other collection and then query off of that instead.

Updated for second part:

The second part is impossible. MongoDB does not support joins or any sort of cross querying between collections in a single query. Querying from one collection, saving the results and then querying from the second is your only choice unless you embed the data in the documents themselves, as I mentioned earlier.

Answer 2

Answer by Nikos Tsagkas

In MongoDB 3.2, the following code seems to work:

db.collectionb.aggregate([
  {
    $lookup: {
      from: "collectiona",
      localField: "collectionb_fk",
      foreignField: "collectiona_fk",
      as: "matched_docs"
    }
  },
  {
    $match: { "matched_docs": { $eq: [] } }
  }
]);

Based on this example: https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#use-lookup-with-an-array
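
Adapted to the Test and Test2 collections from the question, the same idea could be sketched roughly as follows. This is a minimal sketch rather than a tested recipe: the helper name antiJoinPipeline is made up for illustration, and it assumes a server that supports $lookup (MongoDB 3.2 or later).

```javascript
// Hypothetical helper (not part of MongoDB): builds an aggregation pipeline
// that keeps only the documents with no counterpart in another collection.
function antiJoinPipeline(otherCollection, localField, foreignField) {
  return [
    {
      // Left outer join: every match from the other collection
      // is collected into the matched_docs array.
      $lookup: {
        from: otherCollection,
        localField: localField,
        foreignField: foreignField,
        as: "matched_docs"
      }
    },
    // Keep only the documents for which the join found nothing.
    { $match: { matched_docs: { $eq: [] } } }
  ];
}

var pipeline = antiJoinPipeline("Test2", "_id", "_id");

// In the mongo shell, assuming the Test and Test2 collections from the question:
//   db.Test.aggregate(pipeline)
```

Each document that comes back also carries the (empty) matched_docs array added by $lookup.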

Answer 3

Answer by Nikos Tsagkas

Answering your follow-up. I'd use map().

Given this:

> b1 = {i: 1}
> db.b.save(b1)
> db.b.save({i: 2})
> db.a.save({_id: b1._id})

All you need is:

> vals = db.a.find({}, {_id: 1}).map(function(a){return a._id;})
> db.b.find({_id: {$nin: vals}})

which returns

{ "_id" : ObjectId("4f08c60d6b5e49fa3f6b46c1"), "i" : 2 }
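
Applied to the Test and Test2 collections from the question, the map() step could be wrapped in a small helper. This is only a sketch: ninFilter is a made-up name, and the sample documents use plain strings in place of ObjectId values so that the snippet also runs outside the shell.

```javascript
// Hypothetical helper: turns projected documents into an {_id: {$nin: [...]}} filter.
function ninFilter(docs) {
  var ids = docs.map(function (doc) { return doc._id; });
  return { _id: { $nin: ids } };
}

// Stand-in for db.Test2.find({}, {_id: 1}).toArray();
// strings replace the ObjectId values here.
var test2Docs = [
  { _id: "4f08a75f306b428fb9d8bb2e" },
  { _id: "4f08a766306b428fb9d8bb2f" }
];
var filter = ninFilter(test2Docs);

// In the mongo shell:
//   db.Test.find(ninFilter(db.Test2.find({}, {_id: 1}).toArray()))
```

For small collections, db.Test2.distinct("_id") should also give the _id values as a plain array, so db.Test.find({_id: {$nin: db.Test2.distinct("_id")}}) may be worth trying as well.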

Answer 4

Answer by pablo.vix

I wrote a script that marks every document in the second collection that is referenced from the first collection, and then processed the unmarked documents in the second collection.

var first = db.firstCollection.aggregate([ {'$unwind':'$secondCollectionField'} ])

while (first.hasNext()) {
  var doc = first.next();
  db.secondCollection.update(
    { _id: doc.secondCollectionField },
    { $set: { firstCollectionField: doc._id } }
  );
}

...then process the documents in the second collection that have no mark:

db.secondCollection.find({"firstCollectionField":{$exists:false}})
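
For larger collections, the per-document update() calls could perhaps be batched instead. The sketch below only builds the list of operations: markOps is a made-up name, the sample ids are invented, and it assumes bulkWrite() is available (MongoDB 3.2 or later) and that the input documents have the shape produced by the $unwind above.

```javascript
// Hypothetical helper: one updateOne operation per unwound document.
function markOps(unwoundDocs) {
  return unwoundDocs.map(function (doc) {
    return {
      updateOne: {
        filter: { _id: doc.secondCollectionField },
        update: { $set: { firstCollectionField: doc._id } }
      }
    };
  });
}

// Sample input standing in for the unwound aggregate results; the ids are invented.
var ops = markOps([
  { _id: "a1", secondCollectionField: "b1" },
  { _id: "a1", secondCollectionField: "b2" }
]);

// In the mongo shell, in place of the while loop:
//   db.secondCollection.bulkWrite(markOps(first.toArray()))
```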
