MongoDB: check if every element in an array matches a condition

Question

Asked by Wex

I have a collection of documents:

date: Date
users: [
  { user: 1, group: 1 },
  { user: 5, group: 2 }
]

date: Date
users: [
  { user: 1, group: 1 },
  { user: 3, group: 2 }
]

I would like to query against this collection to find all documents where every user id in my array of users is in another array, [1, 5, 7]. In this example, only the first document matches.

The best solution I've been able to find is to do:

$where: function() { 
  var ids = [1, 5, 7];
  return this.users.every(function(u) { 
    return ids.indexOf(u.user) !== -1;
  });
}
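
As a quick sanity check, the same predicate can be exercised outside MongoDB against the two sample documents. This is a plain JavaScript sketch, not a query; `docs`, `everyUserAllowed` and `matched` are hypothetical names, and the dates are placeholders.

```javascript
// Sample documents from the question (dates are placeholders).
const docs = [
  { date: new Date(), users: [{ user: 1, group: 1 }, { user: 5, group: 2 }] },
  { date: new Date(), users: [{ user: 1, group: 1 }, { user: 3, group: 2 }] }
];

// Same test as the $where function: every user id must be in `ids`.
const everyUserAllowed = (doc, ids) =>
  doc.users.every((u) => ids.indexOf(u.user) !== -1);

const matched = docs.filter((d) => everyUserAllowed(d, [1, 5, 7]));
console.log(matched.length); // 1 (only the first document)
```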

Unfortunately, this seems to hurt performance, as stated in the $where docs:

$where evaluates JavaScript and cannot take advantage of indexes.

How can I improve this query?

Answer 1

Answered by Asya Kamsky

The query you want is this:

db.collection.find({"users":{"$not":{"$elemMatch":{"user":{$nin:[1,5,7]}}}}})

In other words: find all documents that have no element whose user is outside the list 1, 5, 7.
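
The double negation is the usual rewrite of "every element satisfies P" as "no element fails P". Here is a plain JavaScript sketch of that equivalence (not MongoDB code; `noneOutside` and `allInside` are hypothetical helpers). Note the last case: an empty `users` array passes both forms, and as far as I know the `$not` form also matches documents that have no `users` field at all, so check that against your data.

```javascript
const allowed = [1, 5, 7];

// "No element is outside the list": mirrors $not + $elemMatch + $nin.
const noneOutside = (users) =>
  !users.some((u) => allowed.indexOf(u.user) === -1);

// "Every element is inside the list": mirrors the original $where.
const allInside = (users) =>
  users.every((u) => allowed.indexOf(u.user) !== -1);

const first = [{ user: 1, group: 1 }, { user: 5, group: 2 }];
const second = [{ user: 1, group: 1 }, { user: 3, group: 2 }];

console.log(noneOutside(first), allInside(first));   // true true
console.log(noneOutside(second), allInside(second)); // false false
console.log(noneOutside([]), allInside([]));         // true true
```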

Answer 2

Answered by Neil Lunn

I don't know about better, but there are a few different ways to approach this, depending on the version of MongoDB you have available.

I'm not sure whether this is your intention, but the query as shown matches the first example document because, as your logic is implemented, every element in that document's array must be contained in the sample array.

So if you actually wanted the document to contain all of those elements, then the $all operator would be the obvious choice:

db.collection.find({ "users.user": { "$all": [ 1, 5, 7 ] } })

But working on the presumption that your logic is what you intend, you can at least, as suggested, "filter" those results by combining with the $in operator so that fewer documents are subject to your $where condition in evaluated JavaScript:

db.collection.find({
    "users.user": { "$in": [ 1, 5, 7 ] },
    "$where": function() { 
        var ids = [1, 5, 7];
        return this.users.every(function(u) { 
            return ids.indexOf(u.user) !== -1;
        });
    }
})

This way you get to use an index. The number actually scanned will be multiplied by the number of elements in the arrays of the matched documents, but it is still better than going without the additional filter.
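
To make the two stages concrete, here is a small plain JavaScript sketch of what each clause contributes (hypothetical sample data, not MongoDB code). The `$in` clause is a necessary condition only: it keeps any document with at least one listed user, and the `$where` function then does the real test.

```javascript
const ids = [1, 5, 7];
const docs = [
  { users: [{ user: 1 }, { user: 5 }] }, // all inside the list
  { users: [{ user: 1 }, { user: 3 }] }, // one inside, one outside
  { users: [{ user: 2 }, { user: 3 }] }, // none inside
  { users: [] }                          // empty array
];

const inList = (u) => ids.indexOf(u.user) !== -1;

// Stage 1: what the "$in" clause selects (at least one user in the list).
const candidates = docs.filter((d) => d.users.some(inList));

// Stage 2: what the $where function keeps (every user in the list).
const matches = candidates.filter((d) => d.users.every(inList));

console.log(candidates.length); // 2
console.log(matches.length);    // 1
```

One side effect of the prefilter: a document with an empty `users` array would pass the `every` test on its own, but it never reaches that stage because it has no element for `$in` to match.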

Or you might even consider a logical abstraction that combines the $and operator with $or, and possibly the $size operator, depending on your actual array conditions:

db.collection.find({
    "$or": [
        { "users.user": { "$all": [ 1, 5, 7 ] } },
        { "users.user": { "$all": [ 1, 5 ] } },
        { "users.user": { "$all": [ 1, 7 ] } },
        { "users": { "$size": 1 }, "users.user": 1 },
        { "users": { "$size": 1 }, "users.user": 5 },
        { "users": { "$size": 1 }, "users.user": 7 }
    ]
})

So this generates all of the possible permutations of your matching condition, but again performance will likely vary depending on the version you have installed.

NOTE: This is actually a complete fail in this case, as it does something entirely different and in fact results in a logical $in.
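
A plain JavaScript sketch of why the enumeration fails (hypothetical helpers emulating the operators, not MongoDB code): `$all` only requires that the listed ids are present, so a document with an extra, disallowed user still satisfies one of the `$or` branches. The list of branches above also leaves out the [5, 7] combination.

```javascript
// Emulates { "users.user": { "$all": wanted } }: every wanted id is present.
const matchesAll = (doc, wanted) =>
  wanted.every((id) => doc.users.some((u) => u.user === id));

// Target condition: every user in the document is in the list.
const everyUserIn = (doc, ids) =>
  doc.users.every((u) => ids.indexOf(u.user) !== -1);

const doc = { users: [{ user: 1 }, { user: 5 }, { user: 9 }] };

console.log(matchesAll(doc, [1, 5]));     // true: an $or branch matches
console.log(everyUserIn(doc, [1, 5, 7])); // false: user 9 is not allowed
```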

The alternatives use the aggregation framework. Your mileage may vary as to which is most efficient, depending on the number of documents in your collection. One approach with MongoDB 2.6 and upwards:

db.problem.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": { 
        "users.user": { "$in": [ 1, 5, 7 ] } 
    }},

    // Keep your original document and a copy of the array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1,
    }},

    // Unwind the array copy
    { "$unwind": "$users" },

    // Just keeping the "user" element value
    { "$group": {
        "_id": "$_id",
        "users": { "$push": "$users.user" }
    }},

    // Compare to see if all elements are a member of the desired match
    { "$project": {
        "match": { "$setEquals": [
            { "$setIntersection": [ "$users", [ 1, 5, 7 ] ] },
            "$users"
        ]}
    }},

    // Filter out any documents that did not match
    { "$match": { "match": true } },

    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])

So that approach uses some newly introduced set operators in order to compare the contents, though of course you need to restructure the array in order to make the comparison.

As pointed out, there is a direct operator for this, $setIsSubset, which does the equivalent of the combined operators above in a single operator:

db.collection.aggregate([
    { "$match": { 
        "users.user": { "$in": [ 1,5,7 ] } 
    }},
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1,
    }},
    { "$unwind": "$users" },
    { "$group": {
        "_id": "$_id",
        "users": { "$push": "$users.user" }
    }},
    { "$project": {
        "match": { "$setIsSubset": [ "$users", [ 1, 5, 7 ] ] }
    }},
    { "$match": { "match": true } },
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])
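
To see why the two pipelines agree, here is a plain JavaScript sketch with rough stand-ins for the set operators (hypothetical helpers that only handle arrays of primitives; they are not the server implementations). The intersection of `users` with the allowed list equals `users` as a set exactly when `users` is a subset of the allowed list.

```javascript
const allowed = [1, 5, 7];

// Rough stand-in for $setIntersection on two arrays of primitives.
const setIntersection = (a, b) => [...new Set(a)].filter((x) => b.includes(x));

// Rough stand-in for $setEquals: same distinct members, order ignored.
const setEquals = (a, b) => {
  const sa = new Set(a);
  const sb = new Set(b);
  return sa.size === sb.size && [...sa].every((x) => sb.has(x));
};

// Rough stand-in for $setIsSubset: every member of a is in b.
const setIsSubset = (a, b) => a.every((x) => b.includes(x));

const twoOperators = (users) => setEquals(setIntersection(users, allowed), users);
const oneOperator = (users) => setIsSubset(users, allowed);

console.log(twoOperators([1, 5]), oneOperator([1, 5]));       // true true
console.log(twoOperators([1, 3]), oneOperator([1, 3]));       // false false
console.log(twoOperators([1, 1, 5]), oneOperator([1, 1, 5])); // true true
```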

Or with a different approach while still taking advantage of the $size operator from MongoDB 2.6:

db.collection.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": { 
        "users.user": { "$in": [ 1, 5, 7 ] } 
    }},

    // Keep your original document and a copy of the array
    // and a note of its current size
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1,
        "size": { "$size": "$users" }
    }},

    // Unwind the array copy
    { "$unwind": "$users" },

    // Filter array contents that do not match
    { "$match": { 
        "users.user": { "$in": [ 1, 5, 7 ] } 
    }},

    // Count the array elements that did match
    { "$group": {
        "_id": "$_id",
        "size": { "$first": "$size" },
        "count": { "$sum": 1 }
    }},

    // Compare the original size to the matched count
    { "$project": { 
        "match": { "$eq": [ "$size", "$count" ] } 
    }},

    // Filter out documents that were not the same
    { "$match": { "match": true } },

    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])
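
The idea behind this pipeline, reduced to plain JavaScript (a sketch with a hypothetical helper name, not MongoDB code): record the array length, count the elements that survive the `$in` filter, and keep the document only when the two numbers agree.

```javascript
const ids = [1, 5, 7];

// size:  original array length ($size, taken before $unwind)
// count: elements surviving the "$in" filter ($sum after the second $match)
const sizeEqualsCount = (users) => {
  const size = users.length;
  const count = users.filter((u) => ids.includes(u.user)).length;
  return size === count;
};

console.log(sizeEqualsCount([{ user: 1 }, { user: 5 }])); // true
console.log(sizeEqualsCount([{ user: 1 }, { user: 3 }])); // false
```

Because each element is counted individually, repeated user ids in the array do not affect the result.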

This can of course still be done in versions prior to 2.6, though it is a little more long-winded:

db.collection.aggregate([
    // Match documents that "could" meet the conditions
    { "$match": { 
        "users.user": { "$in": [ 1, 5, 7 ] } 
    }},

    // Keep your original document and a copy of the array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "date": "$date",
            "users": "$users"
        },
        "users": 1,
    }},

    // Unwind the array copy
    { "$unwind": "$users" },

    // Group it back to get its original size
    { "$group": { 
        "_id": "$_id",
        "users": { "$push": "$users" },
        "size": { "$sum": 1 }
    }},

    // Unwind the array copy again
    { "$unwind": "$users" },

    // Filter array contents that do not match
    { "$match": { 
        "users.user": { "$in": [ 1, 5, 7 ] } 
    }},

    // Count the array elements that did match
    { "$group": {
        "_id": "$_id",
        "size": { "$first": "$size" },
        "count": { "$sum": 1 }
    }},

    // Compare the original size to the matched count
    { "$project": { 
        "match": { "$eq": [ "$size", "$count" ] } 
    }},

    // Filter out documents that were not the same
    { "$match": { "match": true } },

    // Return the original document form
    { "$project": {
        "_id": "$_id._id",
        "date": "$_id.date",
        "users": "$_id.users"
    }}
])

That generally rounds out the different ways; try them out and see what works best for you. In all likelihood the simple combination of $in with your existing form is going to be the best one. But in all cases, make sure you have an index that can be selected:

db.collection.ensureIndex({ "users.user": 1 })

This is going to give you the best performance as long as you are accessing that field in some way, as all the examples here do.

Verdict

I was intrigued by this, so I ultimately contrived a test case to see what had the best performance. First, some test data generation:

var batch = [];
for ( var n = 1; n <= 10000; n++ ) {
    var elements = Math.floor(Math.random()*10)+1;

    var obj = { date: new Date(), users: [] };
    for ( var x = 0; x < elements; x++ ) {
        var user = Math.floor(Math.random()*10)+1,
            group = Math.floor(Math.random()*10)+1;

        obj.users.push({ user: user, group: group });
    }

    batch.push( obj );

    if ( n % 500 == 0 ) {
        db.problem.insert( batch );
        batch = [];
    }

}

With 10000 documents in a collection, each with a random array of 1..10 elements holding random values of 1..10, I came to a match count of 430 documents (reduced from the 7749 matched by $in), with the following results (avg):

JavaScript with $in clause: 420ms
Aggregate with $size: 395ms
Aggregate with group array count: 650ms
Aggregate with two set operators: 275ms
Aggregate with $setIsSubset: 250ms

Over the samples taken, all but the last two had a peak variance of approximately 100ms faster, and the last two both exhibited a 220ms response. The largest variations were in the JavaScript query, which also exhibited results 100ms slower.

But the point here is that this is relative to hardware, which on my laptop under a VM is not particularly great, but it gives an idea.

So the aggregate, and specifically the MongoDB 2.6.1 version with set operators, clearly wins on performance, with an additional slight gain coming from $setIsSubset as a single operator.

This is particularly interesting given that (as indicated by the 2.4-compatible method) the largest cost in this process is the $unwind statement (over 100ms avg), so with the $in selection having a mean of around 32ms, the rest of the pipeline stages execute in less than 100ms on average. That gives a relative idea of aggregation versus JavaScript performance.

Answer 3

Answered by Mark Bryk

I just spent a substantial portion of my day trying to implement Asya's solution above with object-comparisons rather than strict equality. So I figured I'd share it here.

Let's say you expanded your question from userIds to full users. You want to find all documents where every item in the users array is present in another users array: [{user: 1, group: 3}, {user: 2, group: 5},...]

This won't work: db.collection.find({"users":{"$not":{"$elemMatch":{"$nin":[{user: 1, group: 3},{user: 2, group: 5},...]}}}}) because $nin only works for strict equality. So we need to find a different way of expressing "not in array" for arrays of objects. And using $where would slow down the query too much.

Solution:

db.collection.find({
 "users": {
   "$not": {
     "$elemMatch": {
       // if all of the OR-blocks are true, element is not in array
       "$and": [{
         // each OR-block == true if element != that user
         "$or": [
           { "user": { "$ne": 1 } },
           { "group": { "$ne": 3 } }
         ]
       }, {
         "$or": [
           { "user": { "$ne": 2 } },
           { "group": { "$ne": 5 } }
         ]
       }
       // more users...
       ]
     }
   }
 }
})

To round out the logic: $elemMatch matches all documents that have a user not in the array. So $not will match all documents where every user is in the array.
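
Writing the OR-blocks by hand gets tedious for longer lists, so the filter can be generated instead. A minimal sketch, assuming each allowed entry is a flat object of primitive values; `buildAllUsersInFilter` is a hypothetical helper name, and the field name `users` is taken from the question.

```javascript
// Builds the filter for "every element of `users` equals one of `allowed`".
// Assumes each allowed entry is a flat object of primitive field values.
function buildAllUsersInFilter(allowed) {
  const differsFromEach = allowed.map((entry) => ({
    // An element differs from `entry` if at least one field differs.
    $or: Object.keys(entry).map((key) => ({ [key]: { $ne: entry[key] } }))
  }));
  // No element may differ from every allowed entry.
  return { users: { $not: { $elemMatch: { $and: differsFromEach } } } };
}

const filter = buildAllUsersInFilter([
  { user: 1, group: 3 },
  { user: 2, group: 5 }
]);
console.log(JSON.stringify(filter, null, 2));
```

The resulting object would be passed as db.collection.find(filter). An empty `allowed` list produces an empty $and array, which I believe MongoDB rejects, so guard against that case before querying.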
