Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow original URL: http://stackoverflow.com/questions/26984799/

Date: 2020-09-08 07:50:50  Source: igfitidea

Find duplicate records in MongoDB

mongodb aggregation-framework database

Asked by Chris

How would I find duplicate fields in a mongo collection?

I'd like to check if any of the "name" fields are duplicates.

{
    "name" : "ksqn291",
    "__v" : 0,
    "_id" : ObjectId("540f346c3e7fc1054ffa7086"),
    "channel" : "Sales"
}

Many thanks!

Answered by anhlc

Use aggregation to group on name and keep the names whose count is greater than 1:

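db.collection.aggregate([
    // Count the documents that share each name.
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
    // Keep names that occur more than once; ignore null or missing names.
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
    // Return each duplicated value as "name".
    {"$project": {"name" : "$_id", "_id" : 0} }
])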

To sort the results by most to least duplicates:

db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },
    // Sort before $project, because $project drops the count field.
    {"$sort": {"count" : -1} },
    {"$project": {"name" : "$_id", "_id" : 0} }
])

To use a field other than "name", change "$name" to "$column_name", where column_name is the field you want to check.
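
The substitution described above can be wrapped in a small helper. This is only a sketch and not part of the original answer: duplicatesPipeline is a hypothetical name, and the function only builds the pipeline array (the sorted variant), which is then assumed to be passed to db.collection.aggregate().

```javascript
// Hypothetical helper (not from the original answer): builds the
// duplicate-finding pipeline for an arbitrary field name.
function duplicatesPipeline(field) {
  return [
    // Count how many documents share each value of the field.
    { $group: { _id: "$" + field, count: { $sum: 1 } } },
    // Keep values that occur more than once; skip null or missing values.
    { $match: { _id: { $ne: null }, count: { $gt: 1 } } },
    // Most duplicated values first.
    { $sort: { count: -1 } },
    // Return the value under the field's own name.
    { $project: { [field]: "$_id", _id: 0 } }
  ];
}

// Assumed usage in the mongo shell:
//   db.collection.aggregate(duplicatesPipeline("channel"))
```

Building the pipeline as data keeps the helper independent of any particular shell or driver.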

Answered by BatScream

You can find the list of duplicate names using the following aggregate pipeline:

  • Group all the records that have the same name.
  • Match the groups that contain more than one record.
  • Then group again to project all the duplicate names as an array.

The Code:

db.collection.aggregate([
{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},
{$match:{"count":{$gt:1}}},
{$project:{"name":1,"_id":0}},
{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},
{$project:{"_id":0,"duplicateNames":1}}
])

Output:

{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }
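
To see what the three steps compute without a running server, they can be emulated in plain JavaScript on an in-memory array. This is an illustrative sketch only: findDuplicateNames is a hypothetical helper rather than a MongoDB API, and the sample documents are made up.

```javascript
// Plain-JavaScript emulation of the three steps, for illustration only.
// findDuplicateNames is a hypothetical helper, not a MongoDB API.
function findDuplicateNames(docs) {
  // Step 1: group by name and count the records in each group.
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }
  // Steps 2 and 3: keep groups with more than one record and
  // collect their names into a single array.
  const duplicateNames = [];
  for (const [name, count] of counts) {
    if (count > 1) duplicateNames.push(name);
  }
  return { duplicateNames };
}

// Made-up sample data; only the names matter here.
const sample = [
  { name: "ksqn291", channel: "Sales" },
  { name: "ksqn291", channel: "Support" },
  { name: "abc", channel: "Sales" }
];
console.log(findDuplicateNames(sample));
```

In this sample only "ksqn291" occurs twice, so it is the only name collected.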

Answered by Juanín

The answer anhlc gave can be very inefficient if you have a large database and the name attribute is present in only some of the documents.

To improve efficiency, you can add a $match as the first stage of the aggregation, so that documents without a name are filtered out before grouping.

db.collection.aggregate([
    // Drop documents whose name is null or missing before grouping.
    {"$match": {"name" :{ "$ne" : null } } },
    {"$group" : {"_id": "$name", "count": { "$sum": 1 } } },
    {"$match": {"count" : {"$gt": 1} } },
    {"$project": {"name" : "$_id", "_id" : 0} }
])

Answered by Aman shrivastava

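db.collectionName.aggregate([
  // Group by name, collecting the _id of every document in the group.
  { $group: {
      _id: { Name: "$name" },
      uniqueId: { $addToSet: "$_id" },
      count: { "$sum": 1 }
    }
  },
  // Keep only groups with more than one document. The filter must use
  // the "count" field computed by $group above.
  { $match: {
      count: { "$gt": 1 }
    }
  }
]);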

First, $group groups the documents by the chosen field.

Then we collect the unique _id values of each group and count its documents. If the count is greater than 1, the field value is duplicated in the collection, and the $match stage keeps only those groups.
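
As a sketch of how the result might be read, the helper below turns result documents of the shape produced by this pipeline into one line per duplicated name. describeDuplicates is a hypothetical name, and the sample result is made up, with ids shown as plain strings instead of ObjectId values.

```javascript
// Hypothetical helper: turns result documents of the shape
// { _id: { Name }, uniqueId: [...], count } into readable lines.
function describeDuplicates(groups) {
  return groups.map(g =>
    g._id.Name + " appears " + g.count +
    " times in documents: " + g.uniqueId.join(", ")
  );
}

// Made-up result documents; ids are plain strings for illustration.
const groups = [
  { _id: { Name: "ksqn291" }, uniqueId: ["id1", "id2"], count: 2 }
];
describeDuplicates(groups).forEach(line => console.log(line));
```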