
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/10014181/

Date: 2020-09-09 12:35:28  Source: igfitidea

How to delete documents by query efficiently in mongo?

mongodb

Asked by mark

I have a query, which selects documents to be removed. Right now, I remove them manually, like this (using python):


# fields={} projects away everything but _id (old PyMongo API),
# so each iteration removes one matching document by its _id
for id in mycoll.find(query, fields={}):
  mycoll.remove(id)

This does not seem to be very efficient. Is there a better way?


EDIT


OK, I owe an apology for forgetting to mention the query details, because it matters. Here is the complete python code:


def reduce_duplicates(mydb, max_group_size):
  # 1. Count the group sizes
  res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response = True)
  # 2. For each entry from the filter scratch collection having count > max_group_size
  deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
  for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
    key = entry['_id']
    group_size = int(entry['value'])
    # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
    for id in mydb.static.find(key, limit = group_size - max_group_size, **deleteFindArgs):
      mydb.static.remove(id)
  return res['counts']['input']

So, what does it do? It reduces the number of duplicates to at most max_group_size per key value, leaving only the newest records. It works like this:


  1. MR the data to (key, count) pairs.
  2. Iterate over all the pairs with count > max_group_size
  3. Query the data by key, sorting it ascending by the timestamp (the oldest first) and limiting the result to the count - max_group_size oldest records
  4. Delete each and every found record.

As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. So, the last two steps are foreach-found-remove, and this is the important detail of my question that changes everything; I should have been more specific about it - sorry.

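The retention rule from the steps above can be sketched in plain Python, without MongoDB at all (the `_id` and `test_date` names mirror the question's fields; this is an illustrative sketch, not the pymongo call):

```python
def ids_to_delete(records, max_group_size):
    # records: one key's documents, each a dict with '_id' and 'test_date'
    if len(records) <= max_group_size:
        return []
    # oldest first, mirroring the ascending test_date sort in the query
    ordered = sorted(records, key=lambda r: r['test_date'])
    # drop everything beyond the newest max_group_size records
    return [r['_id'] for r in ordered[:len(records) - max_group_size]]

group = [
    {'_id': 1, 'test_date': '2012-01-01'},
    {'_id': 2, 'test_date': '2012-01-03'},
    {'_id': 3, 'test_date': '2012-01-02'},
]
print(ids_to_delete(group, max_group_size=1))  # → [1, 3]
```

The returned ids are exactly the ones the foreach-found-remove loop would delete for that key.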

Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do it with remove? Well, I have tried:


mydb.static.find(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])

This attempt fails miserably. Moreover, it seems to screw up mongo. Observe:


C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database

Needless to say, the foreach-found-remove approach works and yields the expected results.


Now, I hope I have provided enough context and (hopefully) have restored my lost honour.


Answered by Sergio Tulentsev

You can use a query to remove all matching documents


var query = {name: 'John'};
db.collection.remove(query);

Be wary, though: if the number of matching documents is high, your database might become less responsive. It is often advised to delete documents in smaller chunks.


Let's say, you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than 1 query that deletes all 100k documents.

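The 100-batches-of-1k idea can be sketched in Python; the batch size is illustrative, and each batch would then feed one remove/deleteMany call with a filter like `{'_id': {'$in': batch}}`:

```python
def chunks(ids, size=1000):
    # yield successive fixed-size batches of ids
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# 100k ids -> 100 batches of 1k; each batch would back one delete call
batches = list(chunks(list(range(100000)), size=1000))
print(len(batches), len(batches[0]))  # → 100 1000
```

Pausing briefly between batches (or tuning the batch size) keeps the database responsive while the deletes proceed.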

Answered by Pablo Santa Cruz

You can remove it directly using the MongoDB shell:


db.mycoll.remove({_id:'your_id_here'});

Answered by mils

Would deleteMany() be more efficient? I've recently found that remove() is quite slow for 6m documents in a 100m doc collection. Documentation at https://docs.mongodb.com/manual/reference/method/db.collection.deleteMany


db.collection.deleteMany(
   <filter>,
   {
      writeConcern: <document>,
      collation: <document>
   }
)

Answered by Manoj Singh

Run this query in the mongo shell:


db.users.remove( {"_id": ObjectId("5a5f1c472ce1070e11fde4af")});


If you are using Node.js, write this code:


User.remove({ _id: req.body.id }, function(err){...});

回答by HMagdy

I would recommend paging if you have a large number of records.


First: Get the count of data you want to delete:


-------------------------- COUNT --------------------------
var query = {"FIELD": "XYZ", 'date': {$lt: new ISODate("2019-11-10")}};
db.COL.aggregate([
    {$match: query},
    {$count: "all"}
  ])

Second: Start deleting chunk by chunk:


-------------------------- DELETE --------------------------
var query = {"FIELD": "XYZ", 'date': {$lt: new ISODate("2019-11-10")}};
var cursor = db.COL.aggregate([
    {$match:query},
    { $limit : 5 }
  ])
cursor.forEach(function (doc){
    db.COL.remove({"_id": doc._id});
});

and this should be faster:


var query = {"FIELD": "XYZ", 'date': {$lt: new ISODate("2019-11-10")}};
var ids = db.COL.find(query, {_id: 1}).limit(5);
db.COL.deleteMany({"_id": {"$in": ids.map(r => r._id)}});
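The chunk-by-chunk approach above naturally becomes a loop that repeats until nothing matches. A minimal Python sketch of that control flow (the list stands in for the set of matching documents; the per-batch delete is a placeholder for one deleteMany call):

```python
def delete_in_batches(matching_ids, batch_size):
    # repeat a limited delete until nothing matches; each iteration
    # stands in for one deleteMany on a batch of at most batch_size ids
    rounds = 0
    while matching_ids:
        del matching_ids[:batch_size]
        rounds += 1
    return rounds

print(delete_in_batches(list(range(12)), batch_size=5))  # → 3
```

In the real loop, the per-round re-query (find with limit) plays the role of the slice here, so documents inserted mid-run are also picked up.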