MongoDB: How to remove duplicate entries from an array?

Note: this page is a side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/9862255/

Date: 2020-09-09 12:33:55  Source: igfitidea

How to remove duplicate entries from an array?

Tags: mongodb, duplicates

Asked by P K

In the following example, "Algorithms in C++" is present twice.

The $unset modifier can remove a particular field, but how do I remove an entry from a field?

{
  "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), 
  "favorites" : {
    "books" : [
      "Algorithms in C++",    
      "The Art of Computer Programming", 
      "Graph Theory",      
      "Algorithms in C++"
    ]
  }, 
  "name" : "robert"
}

Accepted answer by Baba

What you have to do is use map-reduce to detect and count the duplicate tags, then use $set to replace the entire books array based on { "_id" : ObjectId("4f6cd3c47156522f4f45b26f"),

This has been discussed several times here. Please see:

Removing duplicate records using MapReduce

Fast way to find duplicates on indexed column in mongodb

http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce

http://www.mongodb.org/display/DOCS/MapReduce

How to remove duplicate record in MongoDB by MapReduce?
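
As a minimal sketch of the second half of this answer (replacing the whole array with $set), the snippet below swaps the map-reduce step for a simple client-side pass over each document. That swap is an assumption made for brevity, not what the linked posts describe. The helper name buildDedupeUpdate is hypothetical, the users collection name is borrowed from the other answers, and the array entries are assumed to be plain strings.

```javascript
// Hypothetical helper: given one document shaped like the example in the
// question, return the filter and $set update that would replace
// favorites.books with a duplicate-free copy (first occurrence wins).
function buildDedupeUpdate(doc) {
  const books = (doc.favorites && doc.favorites.books) || [];
  const unique = [...new Set(books)];
  return {
    filter: { _id: doc._id },
    update: { $set: { "favorites.books": unique } },
    // true only when at least one duplicate was dropped
    changed: unique.length !== books.length,
  };
}

// Possible use in the mongo shell (not executed here):
// db.users.find().forEach(function (doc) {
//   const op = buildDedupeUpdate(doc);
//   if (op.changed) db.users.updateOne(op.filter, op.update);
// });
```

Documents without duplicates are skipped, so only the affected documents are rewritten.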

Answer by kynan

As of MongoDB 2.2 you can use the aggregation framework with an $unwind, $group and $project stage to achieve this:

db.users.aggregate([{$unwind: '$favorites.books'},
                    {$group: {_id: '$_id',
                              books: {$addToSet: '$favorites.books'},
                              name: {$first: '$name'}}},
                    {$project: {'favorites.books': '$books', name: '$name'}}
                   ])

Note the need for the $project stage to rename the favorites field, since $group aggregate fields cannot be nested.
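
To make the data flow of that pipeline concrete, here is a plain JavaScript emulation of the three stages on an in-memory array. It is only an illustration, not MongoDB code: the function names unwindBooks and groupById are hypothetical, every document is assumed to have a favorites.books array, and unlike $addToSet (whose output order is unspecified) this sketch happens to keep first-occurrence order.

```javascript
// Emulates {$unwind: '$favorites.books'}: one output row per array element.
function unwindBooks(docs) {
  return docs.flatMap((doc) =>
    doc.favorites.books.map((book) => ({ _id: doc._id, name: doc.name, book }))
  );
}

// Emulates the $group stage ($addToSet on the book, $first on the name),
// then the $project stage that nests the set back under favorites.books.
function groupById(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row._id)) {
      groups.set(row._id, { _id: row._id, books: new Set(), name: row.name });
    }
    groups.get(row._id).books.add(row.book);
  }
  return [...groups.values()].map((g) => ({
    _id: g._id,
    favorites: { books: [...g.books] },
    name: g.name,
  }));
}

const docs = [{
  _id: 1,
  favorites: { books: ["Algorithms in C++", "The Art of Computer Programming", "Graph Theory", "Algorithms in C++"] },
  name: "robert",
}];
const result = groupById(unwindBooks(docs));
console.log(JSON.stringify(result));
```

The emulation also shows why name has to be carried through explicitly: after the unwind step, every row is a separate record, and the group step only keeps the fields it is told to keep.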

Answer by Dennis Golomazov

The easiest solution is to use $setUnion (Mongo 2.6+; the $addFields stage shown here requires Mongo 3.4+). Note that $setUnion does not guarantee the order of the resulting array:

db.users.aggregate([
    {'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])
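
An aggregate call only returns the deduplicated documents; it does not change what is stored. The following is a minimal sketch of writing the result back, assuming MongoDB 4.2 or later, where update commands accept an aggregation pipeline. The helper name dedupeBooksInPlace is hypothetical, and it takes any object that exposes updateMany (for example db.users in the shell).

```javascript
// Hypothetical helper: rewrite favorites.books in place on every document
// where that field is an array, using the same $setUnion trick as above.
function dedupeBooksInPlace(collection) {
  return collection.updateMany(
    // skip documents where the field is missing or not an array
    { "favorites.books": { $type: "array" } },
    // passing an array as the update makes it an aggregation pipeline (4.2+)
    [{ $set: { "favorites.books": { $setUnion: ["$favorites.books", []] } } }]
  );
}

// Possible use in the mongo shell (not executed here):
// dedupeBooksInPlace(db.users)
```

As with the aggregate version, the order of the stored array after this update is not guaranteed.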

Another (lengthier) version, based on the idea from @kynan's answer, that preserves all the other fields without explicitly specifying them (Mongo 3.4+; the $mergeObjects operator requires Mongo 3.6+):

> db.users.aggregate([
    {'$unwind': {
        'path': '$favorites.books',
        // output the document even if its list of books is empty
        'preserveNullAndEmptyArrays': true
    }},
    {'$group': {
        '_id': '$_id',
        'books': {'$addToSet': '$favorites.books'},
        // arbitrary name that doesn't exist on any document
        '_other_fields': {'$first': '$$ROOT'},
    }},
    {
      // the field, in the resulting document, has the value from the last document merged for the field. (c) docs
      // so the new deduped array value will be used
      '$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
    },
    // this stage wouldn't be necessary if the field wasn't nested
    {'$addFields': {'favorites.books': '$books'}},
    {'$project': {'_other_fields': 0, 'books': 0}}
])

{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" : { "books" : [ "The Art of Computer Programming", "Graph Theory", "Algorithms in C++" ] } }

Answer by Xavier Guihot

Starting in Mongo 4.4, the $function aggregation operator allows applying a custom JavaScript function to implement behaviour not supported by the MongoDB Query Language.

For instance, in order to remove duplicates from an array:

// {
//   "favorites" : { "books" : [
//     "Algorithms in C++",
//     "The Art of Computer Programming",
//     "Graph Theory",
//     "Algorithms in C++"
//   ]},
//   "name" : "robert"
// }
db.collection.aggregate([
  { $set:
    { "favorites.books":
      { $function: {
          body: function(books) { return books.filter((v, i, a) => a.indexOf(v) === i) },
          args: ["$favorites.books"],
          lang: "js"
      }}
    }
  }
])
// {
//   "favorites" : { "books" : [
//     "Algorithms in C++",
//     "The Art of Computer Programming",
//     "Graph Theory"
//   ]},
//   "name" : "robert"
// }

This has the advantages of:

  • keeping the original order of the array (if that's not a requirement, then prefer @Dennis Golomazov's $setUnion answer)
  • being more efficient than a combination of expensive $unwind and $group stages.

$function takes 3 parameters:

  • body, which is the function to apply, whose parameter is the array to modify.
  • args, which contains the fields from the record that the body function takes as parameters. In our case, "$favorites.books".
  • lang, which is the language in which the body function is written. Only js is currently available.
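
Because body is ordinary JavaScript, its logic can be checked outside MongoDB. The sketch below runs the same filter in plain JavaScript (for example under Node.js) and compares it with a Set-based variant. The function names dedupeKeepOrder and dedupeWithSet are hypothetical, and the Set variant is shown only as plain JavaScript, not as something verified inside $function.

```javascript
// Same logic as the `body` function above: keep an element only when its
// index is the first index at which that value occurs.
function dedupeKeepOrder(books) {
  return books.filter((v, i, a) => a.indexOf(v) === i);
}

// Alternative for primitive values: a Set keeps insertion order, so the
// first occurrence of each title also wins here.
function dedupeWithSet(books) {
  return [...new Set(books)];
}

const books = [
  "Algorithms in C++",
  "The Art of Computer Programming",
  "Graph Theory",
  "Algorithms in C++",
];

console.log(dedupeKeepOrder(books));
// [ 'Algorithms in C++', 'The Art of Computer Programming', 'Graph Theory' ]
console.log(dedupeWithSet(books));
// [ 'Algorithms in C++', 'The Art of Computer Programming', 'Graph Theory' ]
```

The indexOf approach rescans the array for every element, so it is quadratic in the array length; for short arrays such as a list of favorite books that cost is unlikely to matter.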