database - Find duplicate records in MongoDB

Question

Asked by Chris

How would I find duplicate fields in a mongo collection.

I'd like to check if any of the "name" fields are duplicates.

{
    "name" : "ksqn291",
    "__v" : 0,
    "_id" : ObjectId("540f346c3e7fc1054ffa7086"),
    "channel" : "Sales"
}

Many thanks!

Answer 1

Answered by anhlc

Use an aggregation that groups on name and returns each name with count > 1:

db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },               // count documents per name
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },         // skip null/missing names, keep duplicates
    {"$project": {"name" : "$_id", "_id" : 0} }                              // output just the name
])
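
As a rough illustration of what these stages compute, here is a minimal plain-JavaScript sketch that applies the same logic to documents already loaded into an array. findDuplicateNames is a hypothetical helper, not a MongoDB API; like the "$ne": null filter, it ignores documents whose name is null or missing.

```javascript
// Hypothetical helper: mirrors $group -> $match -> $project on an in-memory array.
// Not a MongoDB API; it only illustrates what each stage computes.
function findDuplicateNames(docs) {
  // $group: count the documents for each name value
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }
  // $match: drop null/missing names and keep counts greater than 1
  // $project: output only the name
  const result = [];
  for (const [name, count] of counts) {
    if (name != null && count > 1) {
      result.push({ name });
    }
  }
  return result;
}

const docs = [
  { _id: 1, name: "ksqn291" },
  { _id: 2, name: "ksqn291" },
  { _id: 3, name: "other" },
  { _id: 4 },  // no name: ignored, like the $ne: null filter
  { _id: 5 },
];
console.log(JSON.stringify(findDuplicateNames(docs))); // [{"name":"ksqn291"}]
```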

To sort the results by most to least duplicates:

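db.collection.aggregate([
    {"$group" : { "_id": "$name", "count": { "$sum": 1 } } },               // count documents per name
    {"$match": {"_id" :{ "$ne" : null } , "count" : {"$gt": 1} } },         // skip null/missing names, keep duplicates
    {"$sort": {"count" : -1} },                                              // most duplicates first
    {"$project": {"name" : "$_id", "_id" : 0} }                              // output just the name
])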
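
The same in-memory sketch with the $sort step added: groups are ordered by count, highest first, before the count is projected away. findDuplicateNamesSorted is again a hypothetical helper. MongoDB's $sort may return groups with equal counts in any order, so don't rely on the order of ties.

```javascript
// Hypothetical helper: $group -> $match -> $sort {count: -1} -> $project, in memory.
function findDuplicateNamesSorted(docs) {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);
  }
  return [...counts]
    .map(([name, count]) => ({ name, count }))   // $group output: one entry per name
    .filter(g => g.name != null && g.count > 1)  // $match
    .sort((a, b) => b.count - a.count)           // $sort: most duplicates first
    .map(g => ({ name: g.name }));               // $project: drop the count
}

const docs = [
  { name: "a" }, { name: "b" }, { name: "b" }, { name: "a" }, { name: "b" },
];
console.log(JSON.stringify(findDuplicateNamesSorted(docs))); // [{"name":"b"},{"name":"a"}]
```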

To use a field other than "name", change "$name" to "$column_name" (the field name prefixed with $).
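
If you check several fields, a small function can build the pipeline instead of editing the string by hand. This is a sketch for top-level fields only; duplicatePipeline is a hypothetical helper, and the output key reuses the field name rather than the literal "name". The returned array can be passed to db.collection.aggregate(...) in the shell.

```javascript
// Hypothetical helper: builds the sorted duplicate-finding pipeline for any top-level field.
function duplicatePipeline(field) {
  const path = "$" + field;  // "channel" -> "$channel"
  return [
    { $group: { _id: path, count: { $sum: 1 } } },          // count per value
    { $match: { _id: { $ne: null }, count: { $gt: 1 } } },  // skip null/missing, keep duplicates
    { $sort: { count: -1 } },                               // most duplicates first
    { $project: { [field]: "$_id", _id: 0 } },              // output the value under the field's own name
  ];
}

// In the mongo shell: db.collection.aggregate(duplicatePipeline("channel"))
console.log(JSON.stringify(duplicatePipeline("channel")[0]));
// {"$group":{"_id":"$channel","count":{"$sum":1}}}
```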

Answer 2

Answered by BatScream

You can find the list of duplicate names using the following aggregate pipeline:

Group all the records that have the same name.
Match the groups that contain more than one record.
Then group again to project all the duplicate names as an array.

The Code:

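db.collection.aggregate([
{$group:{"_id":"$name","name":{$first:"$name"},"count":{$sum:1}}},  // one group per name, with its count
{$match:{"count":{$gt:1}}},                                         // keep names that occur more than once
{$project:{"name":1,"_id":0}},                                      // keep only the name
{$group:{"_id":null,"duplicateNames":{$push:"$name"}}},             // collect all names into one array
{$project:{"_id":0,"duplicateNames":1}}                             // drop the null _id
])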

Output:

{ "duplicateNames" : [ "ksqn291", "ksqn29123213Test" ] }
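
The two-step grouping can be mimicked in plain JavaScript to see how the single { duplicateNames: [...] } document is built. collectDuplicateNames is a hypothetical helper; unlike the pipeline above, it also skips documents whose name is null or missing, which is an assumption about the behavior you want. In MongoDB the order of names in the array is not guaranteed, since it depends on the order in which groups leave $group.

```javascript
// Hypothetical helper: mimics $group -> $match -> $group ($push) -> $project in memory.
function collectDuplicateNames(docs) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.name == null) continue;                         // assumption: skip null/missing names
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1);  // first $group: count per name
  }
  const duplicateNames = [];
  for (const [name, count] of counts) {
    if (count > 1) duplicateNames.push(name);               // $match, then $push into one array
  }
  return { duplicateNames };                                // final $project: only the array
}

const docs = [
  { name: "ksqn291" }, { name: "ksqn291" },
  { name: "ksqn29123213Test" }, { name: "ksqn29123213Test" },
  { name: "unique" },
];
console.log(JSON.stringify(collectDuplicateNames(docs)));
// {"duplicateNames":["ksqn291","ksqn29123213Test"]}
```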

Answer 3

Answered by Juanín

The answer anhlc gave can be very inefficient if you have a large database and the name attribute is present in only some of the documents.

To improve efficiency, add a $match stage at the start of the pipeline so that documents without a name are discarded before they reach $group.

db.collection.aggregate([
    {"$match": {"name" :{ "$ne" : null } } },                  // drop documents without a name first
    {"$group" : {"_id": "$name", "count": { "$sum": 1 } } },   // count per name
    {"$match": {"count" : {"$gt": 1} } },                      // keep duplicates
    {"$project": {"name" : "$_id", "_id" : 0} }                // output just the name
])
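
In MongoDB the saving comes from the leading $match discarding documents before they reach $group (an index on name may also help that first stage). The plain-JavaScript sketch below only mirrors the stage order and reports how many documents reach the grouping step; findDuplicateNamesPrefiltered and its grouped field are hypothetical.

```javascript
// Hypothetical helper: filter first (leading $match), then group only what is left.
function findDuplicateNamesPrefiltered(docs) {
  const withName = docs.filter(doc => doc.name != null);   // $match: { name: { $ne: null } }
  const counts = new Map();
  for (const doc of withName) {
    counts.set(doc.name, (counts.get(doc.name) || 0) + 1); // $group: count per name
  }
  const names = [];
  for (const [name, count] of counts) {
    if (count > 1) names.push({ name });                   // $match count > 1, then $project
  }
  // grouped: how many documents the grouping step had to process
  return { grouped: withName.length, names };
}

const docs = [
  { name: "ksqn291" }, { name: "ksqn291" }, { channel: "Sales" },
  { channel: "Sales" }, { channel: "Sales" },
];
console.log(JSON.stringify(findDuplicateNamesPrefiltered(docs)));
// {"grouped":2,"names":[{"name":"ksqn291"}]}
```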

Answer 4

Answered by Aman shrivastava

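db.collectionName.aggregate([
{ $group:{
    _id:{Name:"$name"},            // group by the field(s) to check
    uniqueId:{$addToSet:"$_id"},   // collect the _id of every document in the group
    count:{"$sum":1}               // number of documents in the group
  }
},
{ $match:{
  count:{"$gt":1}                  // keep only groups with more than one document
 }
}
]);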

First, the $group stage groups the documents by the chosen field(s).

For each group it also collects the documents' _id values into uniqueId with $addToSet and counts them. If count is greater than 1, the value is duplicated somewhere in the collection, and the $match stage keeps only those groups.
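
The extra uniqueId array is what makes this variant useful: it lists which documents share each duplicated value. The plain-JavaScript sketch below mirrors the pipeline (with the $match on count) on an in-memory array; duplicateGroups is a hypothetical helper, and in MongoDB the order of the _id values inside a $addToSet array is not guaranteed.

```javascript
// Hypothetical helper: mirrors $group (with $addToSet and $sum) -> $match { count > 1 } in memory.
function duplicateGroups(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.name)) {
      groups.set(doc.name, { _id: { Name: doc.name }, uniqueId: new Set(), count: 0 });
    }
    const g = groups.get(doc.name);
    g.uniqueId.add(doc._id);  // $addToSet: "$_id"
    g.count += 1;             // $sum: 1
  }
  return [...groups.values()]
    .filter(g => g.count > 1)                          // $match: { count: { $gt: 1 } }
    .map(g => ({ ...g, uniqueId: [...g.uniqueId] }));  // Set -> array for output
}

const docs = [
  { _id: 1, name: "ksqn291" },
  { _id: 2, name: "ksqn291" },
  { _id: 3, name: "other" },
];
console.log(JSON.stringify(duplicateGroups(docs)));
// [{"_id":{"Name":"ksqn291"},"uniqueId":[1,2],"count":2}]
```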
