Reduce MongoDB database file size

Question

Asked by Meuble

I've got a MongoDB database that was once large (>3GB). Since then, documents have been deleted and I was expecting the size of the database files to decrease accordingly.

But since MongoDB keeps allocated space, the files are still large.

I read here and there that the admin command mongod --repair is used to free the unused space, but I don't have enough space on the disk to run this command.

Do you know a way to free up the unused space?

Answer 1

Answer by Gates VP

UPDATE: with the compact command and WiredTiger, it looks like the extra disk space will actually be released to the OS.

UPDATE: as of v1.9+ there is a compact command.

This command will perform a compaction "in-line". It will still need some extra space, but not as much.

MongoDB compresses the files by:

copying the files to a new location
looping through the documents and re-ordering / re-solving them
replacing the original files with the new files

You can do this "compression" by running mongod --repair or by connecting directly and running db.repairDatabase().

In either case you need the space somewhere to copy the files. Now I don't know why you don't have enough space to perform a compress, however, you do have some options if you have another computer with more space.

Export the database to another computer with Mongo installed (using mongoexport) and then import that same database (using mongoimport). This will result in a new database that is more compact. Now you can stop the original mongod, replace its files with the new database files, and you're good to go.
Stop the current mongod and copy the database files to a bigger computer and run the repair on that computer. You can then move the new database files back to the original computer.

There is not currently a good way to "compact in place" using Mongo. And Mongo can definitely suck up a lot of space.

The best strategy right now for compaction is to run a Master-Slave setup. You can then compact the Slave, let it catch up, and switch them over. I know, it's still a little hairy. Maybe the Mongo team will come up with better in-place compaction, but I don't think it's high on their list. Drive space is currently assumed to be cheap (and it usually is).

Answer 2

Answer by user435943

I had the same problem, and solved by simply doing this at the command line:

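# Dump the database to ./dump/databasename
mongodump -d databasename
# Drop the database so its data files are deleted
echo 'db.dropDatabase()' | mongo databasename
# Restore from the dump into newly written data files
mongorestore dump/databasename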

Answer 3

Answer by awaage

It looks like Mongo v1.9+ has support for in-place compaction!

> db.runCommand( { compact : 'mycollectionname' } )

See the docs here: http://docs.mongodb.org/manual/reference/command/compact/

"Unlike repairDatabase, the compact command does not require double disk space to do its work. It does require a small amount of additional space while working. Additionally, compact is faster."

Answer 4

Answer by OzzyCzech

Compact all collections in the current database

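// Run in the mongo shell while connected to the database you want to compact.
db.getCollectionNames().forEach(function (collectionName) {
    print('Compacting: ' + collectionName);
    // Print the command's reply so a failed compact ({ ok: 0 }) is visible.
    printjson(db.runCommand({ compact: collectionName }));
});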
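As a sketch of a more defensive version of the loop above, the function below collects every collection whose `compact` reply is not `ok: 1` instead of only printing names. It relies solely on `db.getCollectionNames()` and `db.runCommand()`, the same calls the loop already uses; the name `compactAllCollections` is hypothetical, not a shell built-in. It is written as a plain function so the logic can be tried against a stand-in object without a live server.

```javascript
// Hypothetical helper, not a shell built-in: runs `compact` on every
// collection from db.getCollectionNames() and records the outcome.
// `db` is the shell's database object, or any object with the same two methods.
function compactAllCollections(db) {
  var result = { compacted: [], failed: [] };
  db.getCollectionNames().forEach(function (collectionName) {
    var reply;
    try {
      reply = db.runCommand({ compact: collectionName });
    } catch (err) {
      // Some shells throw on a failed command instead of returning { ok: 0 }.
      reply = { ok: 0, errmsg: String(err) };
    }
    if (reply && reply.ok === 1) {
      result.compacted.push(collectionName);
    } else {
      result.failed.push({
        name: collectionName,
        errmsg: reply && reply.errmsg ? reply.errmsg : "no error message"
      });
    }
  });
  return result;
}
```

In the mongo shell you might call it as `printjson(compactAllCollections(db))` and then check the `failed` list.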

Answer 5

Answer by David J.

If you need to run a full repair, use the --repairpath option. Point it to a disk with more available space.

For example, on my Mac I've used:

mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair

Update: Per MongoDB Core Server Ticket 4266, you may need to add --nojournal to avoid an error:

mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair --nojournal

Answer 6

Answer by Salvador Dali

Starting with version 2.8 of Mongo (released as 3.0), you can use compression. With the WiredTiger engine you get 3 levels of compression; mmap (the default in 2.6) does not provide compression:

None
snappy (default)
zlib
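One way to choose a compressor for a single collection is the options document of `db.createCollection()`, which accepts `storageEngine.wiredTiger.configString`. The sketch below assumes WiredTiger's `block_compressor` setting and accepts only the three values listed above; the helper name `wiredTigerCompressionOptions` and the collection name in the usage comment are hypothetical. The compressor is fixed when a collection is created, so this does not recompress existing collections.

```javascript
// Hypothetical helper: builds the options document for db.createCollection()
// that selects a WiredTiger block compressor for one new collection.
// Only the three compressors listed above are accepted here.
function wiredTigerCompressionOptions(compressor) {
  var allowed = ["none", "snappy", "zlib"];
  if (allowed.indexOf(compressor) === -1) {
    throw new Error("unsupported compressor: " + compressor);
  }
  return {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=" + compressor }
    }
  };
}

// Usage in the mongo shell (collection name is a placeholder):
// db.createCollection("mycollection", wiredTigerCompressionOptions("zlib"));
```

A server-wide default can instead be set when starting mongod, with its `--wiredTigerCollectionBlockCompressor` option.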

Here is an example of how much space you can save on 16 GB of data:

[Figure: disk space used by 16 GB of data with each compression option; image not preserved]

Data is taken from this article.

Answer 7

Answer by Karthickkumar Nagaraj

There are two ways to solve this, depending on the storage engine.

1. MMAPv1 engine:

command: db.repairDatabase()

NOTE: repairDatabase requires free disk space equal to the size of your current data set plus 2 gigabytes. If the volume that holds dbpath lacks sufficient space, you can mount a separate volume and use it for the repair. When you mount a separate volume for repairDatabase, you must run the repair from the command line (mongod --repair) and use the --repairpath switch to specify the folder in which to store temporary repair files. For example, if the database is 120 GB, you need 120 + 2 = 122 GB free, so the disk must hold (120 * 2) + 2 = 242 GB in total.
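The arithmetic in the note above can be written down as a small function. It is a sketch that assumes the rule exactly as stated (free space = data set size + 2 GB) and that the repair runs on the same volume as dbpath; with --repairpath pointing at another volume, the free-space figure would presumably apply to that volume instead. The function name `repairDatabaseSpaceGB` is hypothetical.

```javascript
// Hypothetical helper applying the rule quoted above:
// repairDatabase needs free space = current data set size + 2 GB,
// and the volume must hold the existing files plus that free space.
function repairDatabaseSpaceGB(dataSizeGB) {
  if (typeof dataSizeGB !== "number" || dataSizeGB < 0) {
    throw new Error("dataSizeGB must be a non-negative number");
  }
  var freeNeededGB = dataSizeGB + 2;
  return {
    freeNeededGB: freeNeededGB,              // must be free before starting
    totalVolumeGB: dataSizeGB + freeNeededGB // existing files + free space
  };
}
```

For the 120 GB example, `repairDatabaseSpaceGB(120)` gives 122 GB free and 242 GB in total.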

Another way is to do it per collection, with the command: db.runCommand({compact: 'collectionName'})

2. WiredTiger: it resolves this automatically.

Answer 8

Answer by kevinadi

There has been considerable confusion over space reclamation in MongoDB, and some recommended practices are downright dangerous in certain deployment types. More details below:

TL;DR: repairDatabase attempts to salvage data from a standalone MongoDB deployment that is trying to recover from disk corruption. If it recovers space, that is purely a side effect. Recovering space should never be the primary reason for running repairDatabase.

Recover space in a standalone node

WiredTiger: For a standalone node with WiredTiger, running compact will release space to the OS, with one caveat: the compact command on WiredTiger in MongoDB 3.0.x was affected by this bug: SERVER-21833, which was fixed in MongoDB 3.2.3. Prior to this version, compact on WiredTiger could silently fail.

MMAPv1: Due to the way MMAPv1 works, there is no safe and supported method to recover space using the MMAPv1 storage engine. compact in MMAPv1 will defragment the data files, potentially making more space available for new documents, but it will not release space back to the OS.
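The rules in the two paragraphs above can be captured in a small decision function. This is a sketch that assumes the engine name is the one reported by `db.serverStatus().storageEngine.name` (`"wiredTiger"` or `"mmapv1"`) and that the version is a plain `x.y.z` string such as `db.version()` returns; the function name is hypothetical. For WiredTiger, `true` only means the version is past the SERVER-21833 fix, not that a particular compact succeeded.

```javascript
// Hypothetical helper encoding the rules above for a standalone node:
// - mmapv1: compact defragments but never returns space to the OS.
// - wiredTiger: compact returns space to the OS, but before 3.2.3
//   it could silently fail (SERVER-21833).
function compactReliablyFreesDiskSpace(engine, version) {
  if (engine === "mmapv1") return false;
  if (engine !== "wiredTiger") throw new Error("unknown storage engine: " + engine);
  var parts = version.split(".").map(Number);
  var fixedIn = [3, 2, 3];
  for (var i = 0; i < fixedIn.length; i++) {
    var v = parts[i] || 0; // a missing component counts as 0
    if (v !== fixedIn[i]) return v > fixedIn[i];
  }
  return true; // exactly 3.2.3
}

// Usage in the mongo shell:
// compactReliablyFreesDiskSpace(db.serverStatus().storageEngine.name, db.version());
```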

You may be able to run repairDatabase if you fully understand the consequences of this potentially dangerous command (see below), since repairDatabase essentially rewrites the whole database by discarding corrupt documents. As a side effect, this will create new MMAPv1 data files without any fragmentation and release space back to the OS.

For a less adventurous method, running mongodump and mongorestore may be possible as well in an MMAPv1 deployment, subject to the size of your deployment.

Recover space in a replica set

For replica set configurations, the best and the safest method to recover space is to perform an initial sync, for both WiredTiger and MMAPv1.

If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries, then finally step down the primary and perform an initial sync on it. The rolling initial sync method is the safest way to perform replica set maintenance, and as a bonus it involves no downtime.

Please note that the feasibility of a rolling initial sync also depends on the size of your deployment. For extremely large deployments, an initial sync may not be feasible, so your options are more limited. If WiredTiger is used, you may be able to take one secondary out of the set, start it as a standalone, run compact on it, and rejoin it to the set.
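The rolling initial sync order described above (every secondary first, then the primary after it steps down) can be sketched as a function that only produces a plan. It assumes the `members` array returned by `rs.status()` and reads just each member's `name` and `stateStr`; the helper name `rollingInitialSyncPlan` is hypothetical, and it performs no sync itself. Members in other states, such as arbiters (which hold no data) or recovering nodes, are left out of the plan and need separate attention.

```javascript
// Hypothetical helper: turns the `members` array from rs.status() into the
// order of a rolling initial sync. Only `name` and `stateStr` are read.
// Members that are neither PRIMARY nor SECONDARY are skipped.
function rollingInitialSyncPlan(members) {
  var secondaries = members.filter(function (m) { return m.stateStr === "SECONDARY"; });
  var primary = members.filter(function (m) { return m.stateStr === "PRIMARY"; })[0];
  var steps = secondaries.map(function (m) { return "initial sync " + m.name; });
  if (primary) {
    steps.push("step down " + primary.name);    // e.g. rs.stepDown() on the primary
    steps.push("initial sync " + primary.name); // it is a secondary by now
  }
  return steps;
}
```

In the shell this might be used as `rollingInitialSyncPlan(rs.status().members).forEach(print)`, waiting for each resynced member to return to SECONDARY before moving on to the next.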

Regarding `repairDatabase`

Please don't run repairDatabase on replica set nodes. This is very dangerous, as mentioned on the repairDatabase page and described in more detail below.

The name repairDatabase is a bit misleading, since the command doesn't attempt to repair anything. The command was intended to be used when there is disk corruption on a standalone node, which could lead to corrupt documents.

The repairDatabase command could be more accurately described as "salvage database". That is, it recreates the database by discarding corrupt documents, in an attempt to get the database into a state where you can start it and salvage intact documents from it.

In MMAPv1 deployments, this rebuilding of the database files releases space to the OS as a side effect. Releasing space to the OS was never the purpose.

Consequences of `repairDatabase` on a replica set

In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run repairDatabase on a replica set node, there is a chance that the node contains undetected corruption, and repairDatabase will dutifully remove the corrupt documents for you.

Predictably, this leaves that node with a different dataset from the rest of the set. If an update happens to hit that single document, the whole set could crash.

To make matters worse, it is entirely possible that this situation could stay dormant for a long time, only to strike suddenly with no apparent reason.

Answer 9

Answer by VISHAL KUMAWAT

In case a large chunk of data is deleted from a collection and the collection never uses the deleted space for new documents, this space needs to be returned to the operating system so that it can be used by other databases or collections. You will need to run a compact or repair operation in order to defragment the disk space and regain the usable free space.

The behavior of the compaction process depends on the MongoDB storage engine, as follows:

db.runCommand({ compact: "collection-name" })

MMAPv1

The compaction operation defragments data files and indexes. However, it does not release space to the operating system. The operation is still useful to defragment and create more contiguous space for reuse by MongoDB. However, it is of no use when free disk space is very low.

Up to 2 GB of additional disk space is required during the compaction operation.

A database level lock is held during the compaction operation.

WiredTiger

The WiredTiger engine compresses data by default, so it consumes less disk space than MMAPv1.

The compact process releases the free space to the operating system. Minimal disk space is required to run the compact operation. WiredTiger also blocks all operations on the database, as it needs a database-level lock.

For the MMAPv1 engine, compact does not return the space to the operating system. You need to run a repair operation to release the unused space.

db.runCommand({repairDatabase: 1})

Answer 10

Answer by Hett

MongoDB 3.0 and higher has a new storage engine, WiredTiger. In my case, switching engines reduced disk usage from 100 GB to 25 GB.
