How to remove duplicate entries from an array in MongoDB?
Source: http://stackoverflow.com/questions/9862255/
This page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by P K
In the following example, "Algorithms in C++" is present twice.
The $unset modifier can remove a particular field, but how do I remove an entry from an array field?
{
"_id" : ObjectId("4f6cd3c47156522f4f45b26f"),
"favorites" : {
"books" : [
"Algorithms in C++",
"The Art of Computer Programming",
"Graph Theory",
"Algorithms in C++"
]
},
"name" : "robert"
}
Accepted answer by Baba
What you have to do is use map-reduce to detect and count the duplicate entries, then use $set to replace the entire books array on the document with "_id" : ObjectId("4f6cd3c47156522f4f45b26f").
This has been discussed several times here; please see:
Removing duplicate records using MapReduce
Fast way to find duplicates on indexed column in mongodb
http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce
http://www.mongodb.org/display/DOCS/MapReduce
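The "detect duplicates, then $set the whole array" idea can be sketched without map-reduce for a single document. This is only a minimal illustration, not the answer author's code: the helper name buildDedupUpdate is hypothetical, and the commented shell loop assumes a collection called users.

```javascript
// Hypothetical helper: returns the filter/update pair that would rewrite
// favorites.books without duplicates, or null when nothing needs to change.
function buildDedupUpdate(doc) {
  const books = (doc.favorites && doc.favorites.books) || [];
  const unique = [...new Set(books)]; // a Set keeps first-seen order
  if (unique.length === books.length) return null; // no duplicates found
  return {
    filter: { _id: doc._id },
    update: { $set: { "favorites.books": unique } },
  };
}

// In the mongo shell this could be applied as follows
// (assumption: the documents live in a `users` collection):
// db.users.find().forEach(doc => {
//   const op = buildDedupUpdate(doc);
//   if (op) db.users.updateOne(op.filter, op.update);
// });
```

Only documents that actually contain duplicates are written back, which keeps the number of updates small.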
Answer by kynan
As of MongoDB 2.2 you can use the aggregation framework with an $unwind, $group and $project stage to achieve this:
db.users.aggregate([{$unwind: '$favorites.books'},
{$group: {_id: '$_id',
books: {$addToSet: '$favorites.books'},
name: {$first: '$name'}}},
{$project: {'favorites.books': '$books', name: '$name'}}
])
Note the need for the $project stage to rename the favorites field, since $group aggregate fields cannot be nested.
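To make the three stages concrete, here is a rough plain-JavaScript imitation of what $unwind, $group with $addToSet, and $project do to the sample document. It is a sketch for intuition only: the function names are made up, the _id values are assumed to be primitives (a real ObjectId would need a string key), and a JavaScript Set keeps insertion order whereas $addToSet makes no ordering promise.

```javascript
// $unwind: one row per array element; documents with a missing or empty
// array produce no rows (the default $unwind behaviour).
function unwindBooks(docs) {
  return docs.flatMap(doc =>
    ((doc.favorites || {}).books || []).map(book => ({
      _id: doc._id,
      name: doc.name,
      book,
    }))
  );
}

// $group with $addToSet and $first: collect the distinct books per _id and
// keep the first name seen for that _id. The result field is flat (`books`).
function groupBooksById(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row._id)) {
      groups.set(row._id, { _id: row._id, name: row.name, books: new Set() });
    }
    groups.get(row._id).books.add(row.book);
  }
  return [...groups.values()];
}

// $project: move the flat `books` result back under `favorites`.
function projectFavorites(groups) {
  return groups.map(g => ({
    _id: g._id,
    name: g.name,
    favorites: { books: [...g.books] },
  }));
}
```

Calling projectFavorites(groupBooksById(unwindBooks(docs))) on an array of documents mirrors the pipeline order. Like the aggregation itself, this only computes a result and does not modify the stored documents.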
Answer by Dennis Golomazov
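The easiest solution is to use $setUnion (Mongo 2.6+; note that the $addFields stage used below requires Mongo 3.4+, and that $setUnion does not guarantee the order of the resulting array):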
db.users.aggregate([
{'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])
Another (more lengthy) version that is based on the idea from @kynan's answer, but preserves all the other fields without explicitly specifying them (Mongo 3.4+):
> db.users.aggregate([
{'$unwind': {
'path': '$favorites.books',
// output the document even if its list of books is empty
'preserveNullAndEmptyArrays': true
}},
{'$group': {
'_id': '$_id',
'books': {'$addToSet': '$favorites.books'},
// arbitrary name that doesn't exist on any document
'_other_fields': {'$first': '$$ROOT'},
}},
{
// the field, in the resulting document, has the value from the last document merged for the field. (c) docs
// so the new deduped array value will be used
'$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
},
// this stage wouldn't be necessary if the field wasn't nested
{'$addFields': {'favorites.books': '$books'}},
{'$project': {'_other_fields': 0, 'books': 0}}
])
{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" :
{ "books" : [ "The Art of Computer Programming", "Graph Theory", "Algorithms in C++" ] } }
Answer by Xavier Guihot
Starting in Mongo 4.4, the $function aggregation operator allows applying a custom JavaScript function to implement behaviour not supported by the MongoDB Query Language.
For instance, in order to remove duplicates from an array:
// {
// "favorites" : { "books" : [
// "Algorithms in C++",
// "The Art of Computer Programming",
// "Graph Theory",
// "Algorithms in C++"
// ]},
// "name" : "robert"
// }
db.collection.aggregate(
{ $set:
{ "favorites.books":
{ $function: {
body: function(books) { return books.filter((v, i, a) => a.indexOf(v) === i) },
args: ["$favorites.books"],
lang: "js"
}}
}
}
)
// {
// "favorites" : { "books" : [
// "Algorithms in C++",
// "The Art of Computer Programming",
// "Graph Theory"
// ]},
// "name" : "robert"
// }
This has the advantages of:
- keeping the original order of the array (if that's not a requirement, then prefer @Dennis Golomazov's $setUnion answer)
- being more efficient than a combination of expensive $unwind and $group stages.
$function takes 3 parameters:
- body, which is the function to apply, whose parameter is the array to modify.
- args, which contains the fields from the record that the body function takes as parameter. In our case "$favorites.books".
- lang, which is the language in which the body function is written. Only js is currently available.
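Because body is ordinary JavaScript, it can be tried outside MongoDB first. The sketch below runs the same filter/indexOf logic in isolation next to a Set-based variant; the names dedupeKeepOrder and dedupeWithSet are illustrative only. Both rely on strict equality, so they suit arrays of strings or numbers; embedded documents would likely need a different comparison.

```javascript
// Same logic as the $function body above: keep an element only if this is
// the first index at which it occurs. Quadratic in the array length.
function dedupeKeepOrder(books) {
  return books.filter((value, index, array) => array.indexOf(value) === index);
}

// Alternative sketch: a Set also preserves first-seen order and avoids the
// repeated indexOf scans.
function dedupeWithSet(books) {
  return [...new Set(books)];
}

const sampleBooks = [
  "Algorithms in C++",
  "The Art of Computer Programming",
  "Graph Theory",
  "Algorithms in C++",
];

console.log(dedupeKeepOrder(sampleBooks));
console.log(dedupeWithSet(sampleBooks));
```

Either function could be passed as body; for short arrays the difference in cost is negligible.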