Original URL: http://stackoverflow.com/questions/26984799/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Find duplicate records in MongoDB
Question by Chris
How would I find duplicate fields in a Mongo collection?
I'd like to check if any of the "name" fields are duplicates.
{
"name" : "ksqn291",
"__v" : 0,
"_id" : ObjectId("540f346c3e7fc1054ffa7086"),
"channel" : "Sales"
}
Many thanks!
Answer by anhlc
Use aggregation on name and get the names with count > 1:
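db.collection.aggregate([
  // Count the documents for each distinct "name"
  {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
  // Keep non-null names that occur more than once
  {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
  // Output only the duplicated name
  {"$project": {"name" : "$_id", "_id" : 0} }
])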
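To make the stages concrete, here is a minimal JavaScript sketch that reproduces the same $group, $match and $project logic on an in-memory array. The function name findDuplicateNames and the sample documents are illustrative assumptions; it runs in plain Node.js and is not how MongoDB itself executes the pipeline.

```javascript
// In-memory sketch: a plain array stands in for the collection (no MongoDB needed).
function findDuplicateNames(docs) {
  // $group: { _id: "$name", count: { $sum: 1 } }
  const counts = new Map();
  for (const doc of docs) {
    // A missing "name" groups under null, as it does in $group
    const key = doc.name === undefined ? null : doc.name;
    counts.set(key, (counts.get(key) || 0) + 1);
  }

  // $match: { _id: { $ne: null }, count: { $gt: 1 } }, then
  // $project: { name: "$_id", _id: 0 }
  const result = [];
  for (const [name, count] of counts) {
    if (name !== null && count > 1) result.push({ name });
  }
  return result;
}

const docs = [
  { _id: 1, name: "ksqn291", channel: "Sales" },
  { _id: 2, name: "ksqn291", channel: "Support" },
  { _id: 3, name: "other", channel: "Sales" },
  { _id: 4, channel: "Sales" }, // no name field
  { _id: 5, channel: "Sales" }, // no name field
];

console.log(JSON.stringify(findDuplicateNames(docs))); // [{"name":"ksqn291"}]
```

Documents 4 and 5 form a null group with a count of 2; the "$ne": null condition is what keeps that group out of the result. Note that $group does not order its output, whereas this sketch returns names in first-seen order.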
To sort the results from most to least duplicated:
db.collection.aggregate([
  {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
  {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
  // Sort while "count" still exists, i.e. before $project removes it
  {"$sort": {"count" : -1} },
  {"$project": {"name" : "$_id", "_id" : 0} }
])
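The sorted variant can be sketched the same way, again as an in-memory illustration with a hypothetical duplicateNamesByFrequency function and made-up data. The $sort stage has to come before $project, because the projection drops the count field it sorts on.

```javascript
// In-memory sketch: $group, $match, $sort { count: -1 }, then $project.
function duplicateNamesByFrequency(docs) {
  // $group: { _id: "$name", count: { $sum: 1 } }
  const counts = new Map();
  for (const doc of docs) {
    const key = doc.name === undefined ? null : doc.name;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .filter(([name, count]) => name !== null && count > 1) // $match
    .sort((a, b) => b[1] - a[1]) // $sort: { count: -1 }, while count still exists
    .map(([name]) => ({ name })); // $project: { name: "$_id", _id: 0 }
}

const docs = [
  { name: "a" }, { name: "b" }, { name: "b" },
  { name: "b" }, { name: "a" }, { name: "c" },
];

console.log(JSON.stringify(duplicateNamesByFrequency(docs)));
// [{"name":"b"},{"name":"a"}]
```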
To check a field other than "name", change "$name" to "$column_name", where column_name is the field you want to check.
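Because the mongo shell is a JavaScript environment, one option is to build the pipeline from a field name instead of editing "$name" by hand. The helper duplicatePipeline below is hypothetical; it assumes a top-level field name without dots and only returns the pipeline array, which you would then pass to db.collection.aggregate(...).

```javascript
// Hypothetical helper: builds the sorted duplicate-finding pipeline for any
// top-level field. It only constructs the array; it does not run a query.
function duplicatePipeline(field) {
  const ref = "$" + field; // e.g. "channel" -> "$channel"
  return [
    { $group: { _id: ref, count: { $sum: 1 } } },
    { $match: { _id: { $ne: null }, count: { $gt: 1 } } },
    { $sort: { count: -1 } },
    // Name the output field after the checked field, e.g. { channel: "$_id" }
    { $project: { [field]: "$_id", _id: 0 } },
  ];
}

// In the shell (assuming a collection named "collection"):
//   db.collection.aggregate(duplicatePipeline("channel"))
console.log(JSON.stringify(duplicatePipeline("channel")[0]));
// {"$group":{"_id":"$channel","count":{"$sum":1}}}
```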
Answer by BatScream
You can find the list of duplicate names using the following aggregate pipeline:
- Group all the records that have the same name.
- Match those groups that have more than 1 record.
- Then group again to project all the duplicate names as an array.
The Code:
db.collection.aggregate([
{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},
{$match:{"count":{$gt:1}}},
{$project:{"name":1,"_id":0}},
{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},
{$project:{"_id":0,"duplicateNames":1}}
])
Output:
{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }
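The same three steps can be traced on an in-memory array. This is a plain JavaScript sketch with a hypothetical collectDuplicateNames function and sample names borrowed from the output above; it assumes every document has a name, mirrors the stage logic, and does not connect to MongoDB.

```javascript
// In-memory sketch of the pipeline above. Assumes every document has a "name".
function collectDuplicateNames(docs) {
  // $group: { _id: "$name", name: { $first: "$name" }, count: { $sum: 1 } }
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }

  // $match: { count: { $gt: 1 } }, then the final
  // $group: { _id: null, duplicateNames: { $push: "$name" } }
  const duplicateNames = [];
  for (const [name, count] of counts) {
    if (count > 1) duplicateNames.push(name);
  }
  // $project: { _id: 0, duplicateNames: 1 }
  return { duplicateNames };
}

const docs = [
  { name: "ksqn291" },
  { name: "ksqn29123213Test" },
  { name: "unique" },
  { name: "ksqn291" },
  { name: "ksqn29123213Test" },
];

console.log(JSON.stringify(collectDuplicateNames(docs)));
// {"duplicateNames":["ksqn291","ksqn29123213Test"]}
```

Collapsing everything into one document is convenient for small results, but that single document is still subject to MongoDB's 16 MB BSON document size limit, so with a very large number of duplicate names the one-document-per-name output of the first answer scales better.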
Answer by Juanín
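The answer anhlc gave can be very inefficient if you have a large collection and the name attribute is present in only some of the documents, because every document, including those without a name, is passed to $group before the null group is filtered out.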
To improve efficiency, you can add a $match stage at the start of the pipeline so that only documents with a non-null name are grouped:
db.collection.aggregate([
  // Skip documents whose name is missing or null before grouping
  {"$match": {"name" :{ "$ne" : null } } },
  {"$group" : {"_id": "$name", "count": { "$sum": 1 } } },
  {"$match": {"count" : {"$gt": 1} } },
  {"$project": {"name" : "$_id", "_id" : 0} }
])
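The saving comes from how many documents reach the grouping step. This in-memory JavaScript sketch (hypothetical function names, synthetic data) runs both orders on an array where most documents lack a name: both return the same duplicates, but filtering first groups far fewer documents. In a real collection, a $match at the start of the pipeline can also use an index on name, if one exists.

```javascript
// Shared $group step: count documents per name (missing name -> null).
function groupCounts(docs) {
  const counts = new Map();
  for (const doc of docs) {
    const key = doc.name === undefined ? null : doc.name;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Filter after grouping (first answer): every document is grouped.
function filterAfter(docs) {
  const counts = groupCounts(docs);
  const names = [...counts].filter(([n, c]) => n !== null && c > 1).map(([n]) => n);
  return { names, grouped: docs.length };
}

// Filter before grouping (this answer): { name: { $ne: null } } runs first.
function filterBefore(docs) {
  const withName = docs.filter((d) => d.name !== undefined && d.name !== null);
  const counts = groupCounts(withName);
  const names = [...counts].filter(([, c]) => c > 1).map(([n]) => n);
  return { names, grouped: withName.length };
}

// Two documents share the name "x"; 1000 documents have no name field at all.
const docs = [{ name: "x" }, { name: "x" }, { name: "y" }];
for (let i = 0; i < 1000; i++) docs.push({ other: i });

console.log(JSON.stringify(filterAfter(docs)));  // {"names":["x"],"grouped":1003}
console.log(JSON.stringify(filterBefore(docs))); // {"names":["x"],"grouped":3}
```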
Answer by Aman shrivastava
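db.collectionName.aggregate([
  { $group: {
      _id: { Name: "$name" },          // one group per distinct name
      uniqueId: { $addToSet: "$_id" }, // collect the _id of every document in the group
      count: { "$sum": 1 }             // number of documents in the group
    }
  },
  { $match: {
      count: { "$gt": 1 }              // keep only names that occur more than once
    }
  }
]);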
First, the $group stage groups the documents by the field(s) you want to check.
It also collects each document's _id into uniqueId and counts the documents in each group. If the count is greater than 1, that value is duplicated in the collection, and the $match stage keeps only those groups.
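Keeping the _id values is useful when you want to inspect or clean up the duplicates afterwards. The sketch below reproduces the grouping on an in-memory array: duplicateGroups is a hypothetical name, every document is assumed to have a name, and the final filter tests the count field that $group produces. In MongoDB the order of elements in an $addToSet array is unspecified, while this sketch keeps insertion order.

```javascript
// In-memory sketch of the grouping above. Assumes every document has a "name".
function duplicateGroups(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // $group: { _id: { Name: "$name" }, ... }
    if (!groups.has(doc.name)) {
      groups.set(doc.name, { _id: { Name: doc.name }, uniqueId: new Set(), count: 0 });
    }
    const g = groups.get(doc.name);
    g.uniqueId.add(doc._id); // uniqueId: { $addToSet: "$_id" }
    g.count += 1;            // count: { $sum: 1 }
  }
  return [...groups.values()]
    .filter((g) => g.count > 1)                       // $match: { count: { $gt: 1 } }
    .map((g) => ({ ...g, uniqueId: [...g.uniqueId] })); // Set -> array for output
}

const docs = [
  { _id: 1, name: "ksqn291" },
  { _id: 2, name: "ksqn291" },
  { _id: 3, name: "other" },
];

console.log(JSON.stringify(duplicateGroups(docs)));
// [{"_id":{"Name":"ksqn291"},"uniqueId":[1,2],"count":2}]
```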