mongodb: Is there any way to recover recently deleted documents in MongoDB?
Disclaimer: this page is a translation of a popular StackOverflow question and answer, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms, include the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/25802786/
Is there any way to recover recently deleted documents in MongoDB?
Asked by trex
I removed some documents by mistake in my last query. Is there any way to roll back that last query on my mongo collection?
Here is my last query:
db.foo.remove({ "name" : "some_x_name"})
Is there any rollback/undo option? Can I get my data back?
Answered by Adam Comerford
There is no rollback option (rollback has a different meaning in a MongoDB context), and strictly speaking there is no supported way to get these documents back - the precautions you can/should take are covered in the comments. With that said, however, if you are running a replica set, even a single node replica set, then you have an oplog. With an oplog that covers when the documents were inserted, you may be able to recover them.
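Before going any further it is worth checking how much history your oplog actually holds: rs.printReplicationInfo() in the shell reports the time range it covers. Outside the shell, a minimal pymongo sketch of the same check (my own illustration, not part of the original answer; the connection string is a placeholder) could look like this:

# Sketch: report the time window currently covered by the oplog (assumes pymongo).
from pymongo import MongoClient, ASCENDING, DESCENDING

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # placeholder URI
oplog = client.local["oplog.rs"]

# oldest and newest entries, in natural (insertion) order
first = oplog.find_one(sort=[("$natural", ASCENDING)])
last = oplog.find_one(sort=[("$natural", DESCENDING)])
if first and last:
    print("oplog spans", first["ts"].as_datetime(), "to", last["ts"].as_datetime())
else:
    print("no oplog found - is this node part of a replica set?")

If the inserts you need fall inside that window, the recovery below has a chance of working.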
The easiest way to illustrate this is with an example. I will use a simplified example with just 100 deleted documents that need to be restored. To go beyond this (a huge number of documents, or perhaps you wish to restore only selectively, etc.) you will either want to change the code to iterate over a cursor or write this using your language of choice outside the MongoDB shell. The basic logic remains the same.
First, let's create our example collection foo in the database dropTest. We will insert 100 documents without a name field and 100 documents with an identical name field so that they can be mistakenly removed later:
use dropTest;
for(i=0; i < 100; i++){db.foo.insert({_id : i})};
for(i=100; i < 200; i++){db.foo.insert({_id : i, name : "some_x_name"})};
Now, let's simulate the accidental removal of our 100 name documents:
> db.foo.remove({ "name" : "some_x_name"})
WriteResult({ "nRemoved" : 100 })
Because we are running in a replica set, we still have a record of these documents in the oplog (being inserted) and thankfully those inserts have not (yet) fallen off the end of the oplog (the oplog is a capped collection, remember). Let's see if we can find them:
use local;
db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}).count();
100
The count looks correct, we seem to have our documents still. I know from experience that the only piece of the oplog entry we will need here is the o field, so let's add a projection to only return that (output snipped for brevity, but you get the idea):
db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}, {"o" : 1});
{ "o" : { "_id" : 100, "name" : "some_x_name" } }
{ "o" : { "_id" : 101, "name" : "some_x_name" } }
{ "o" : { "_id" : 102, "name" : "some_x_name" } }
{ "o" : { "_id" : 103, "name" : "some_x_name" } }
{ "o" : { "_id" : 104, "name" : "some_x_name" } }
To re-insert those documents, we can just store them in an array, then iterate over the array and insert the relevant pieces. First, let's create our array:
var deletedDocs = db.oplog.rs.find({op : "i", ns : "dropTest.foo", "o.name" : "some_x_name"}, {"o" : 1}).toArray();
> deletedDocs.length
100
Next we remind ourselves that we only have 100 docs in the collection now, then loop over the 100 inserts, and finally revalidate our counts:
use dropTest;
db.foo.count();
100
// simple for loop to re-insert the relevant elements
for (var i = 0; i < deletedDocs.length; i++) {
db.foo.insert({_id : deletedDocs[i].o._id, name : deletedDocs[i].o.name});
}
// check total and name counts again
db.foo.count();
200
db.foo.count({name : "some_x_name"})
100
And there you have it, with some caveats:
- This is not meant to be a true restoration strategy - look at backups (MMS, other) and delayed secondaries for that, as mentioned in the comments
- It's not going to be particularly quick to query the documents out of the oplog (any oplog query is a table scan) on a large, busy system
- The documents may age out of the oplog at any time (you can, of course, make a copy of the oplog for later use to give you more time)
- Depending on your workload you might have to de-dupe the results before re-inserting them
- Larger sets of documents will be too large for an array as demonstrated, so you will need to iterate over a cursor instead (see the sketch after this list)
- The format of the oplog is considered internal and may change at any time (without notice), so use at your own risk
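For the cursor and de-duplication points above, here is a rough sketch of the same restore done outside the shell with pymongo (my own illustration rather than part of the original answer; the connection string is a placeholder and dropTest.foo is the namespace from the example above):

# Sketch: cursor-based restore of the deleted documents found in the oplog (assumes pymongo).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # placeholder URI
oplog = client.local["oplog.rs"]
target = client.dropTest.foo  # the collection the documents were removed from

query = {"op": "i", "ns": "dropTest.foo", "o.name": "some_x_name"}

restored = 0
# iterate with a cursor instead of materialising an array, so very large
# result sets do not have to fit in memory
for entry in oplog.find(query, {"o": 1}):
    doc = entry["o"]
    # replace_one with upsert de-dupes on _id, so re-running the script or
    # hitting the same insert twice in the oplog is harmless
    target.replace_one({"_id": doc["_id"]}, doc, upsert=True)
    restored += 1

print("restored (or re-applied)", restored, "documents")

Note that a blind upsert like this will also overwrite any document that still exists with its originally inserted version, so filter the oplog query further if that matters for your data.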
Answered by Yazad
While I understand this is a bit old, I wanted to share something that I researched in this area that may be useful to others with a similar problem.
The fact is that MongoDB does not physically delete data immediately - it only marks it for deletion. This behaviour is, however, version specific, and there is currently no documentation or standardization of it that would let a third-party tool developer (or someone in desperate need) build a tool or write a simple script that works reliably across versions. I opened a ticket for this - https://jira.mongodb.org/browse/DOCS-5151.
I did explore one option which is at a much lower level and may need fine tuning based on the version of MongoDB used. Understandably this is too low-level for most people's liking; however, it works and can be handy when all else fails.
My approach involves directly working with the binary in the file and using a Python script (or commands) to identify, read and unpack (BSON) the deleted data.
My approach is inspired by this GitHub project (I am NOT the developer of this project). Here on my blog I have tried to simplify the script and extract a specific deleted record from a raw MongoDB data file.
Currently a record is marked for deletion with "\xee" at the start of the record. This is what a deleted record looks like in the raw db file:
'\xee\xee\xee\xee\x07_id\x00U\x19\xa6g\x9f\xdf\x19\xc1\xads\xdb\xa8\x02name\x00\x04\x00\x00\x00AAA\x00\x01marks\x00\x00\x00\x00\x00\x00@\x9f@\x00'
I replaced the first block (the marker overwrites the 4-byte little-endian length prefix that every BSON document starts with) with the size of the record, which I identified earlier based on other records.
y="3\x00\x00\x00"+x[20804:20800+51]
Finally, using the BSON package (which comes with pymongo), I decoded the binary into a readable object.
bson.decode_all(y)
[{u'_id': ObjectId('5519a6679fdf19c1ad73dba8'), u'name': u'AAA', u'marks': 2000.0}]
This BSON is now a Python object and can be dumped into a recovery collection or simply logged somewhere.
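Putting those fragments together, a very rough sketch of the whole idea (my own illustration, not the original script: the \xee marker and record layout are internal, MMAPv1-era on-disk details that may differ in your version, the file path is a placeholder, and 51 is the record length found by inspecting neighbouring records as described above):

# Sketch: recover one deleted record from a copy of a raw data file.
# Everything here depends on internal, version-specific storage details.
import struct
import bson  # ships with pymongo

RECORD_LEN = 51            # BSON document length found by inspecting other records
DATA_FILE = "dropTest.0"   # placeholder path to a COPY of the raw data file

with open(DATA_FILE, "rb") as f:
    raw = f.read()

# the deletion marker overwrites the 4-byte little-endian length prefix
# that every BSON document starts with
offset = raw.find(b"\xee\xee\xee\xee")
if offset != -1:
    body = raw[offset + 4: offset + RECORD_LEN]       # bytes after the marker
    candidate = struct.pack("<i", RECORD_LEN) + body  # rebuild the length prefix
    print(bson.decode_all(candidate))

In practice you would scan for every occurrence of the marker and try a range of candidate lengths, keeping whatever decode_all accepts as valid BSON.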
Needless to say, this or any other recovery technique should ideally be done in a staging area, on a backup copy of the database files.