Reducing MongoDB database file size
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2966687/
Asked by Meuble
I've got a MongoDB database that was once large (>3GB). Since then, documents have been deleted and I was expecting the size of the database files to decrease accordingly.
But since MongoDB keeps allocated space, the files are still large.
I read here and there that the admin command mongod --repair is used to free the unused space, but I don't have enough space on the disk to run this command.
Do you know a way I can free up the unused space?
Answered by Gates VP
UPDATE: with the compact command and WiredTiger, it looks like the extra disk space will actually be released to the OS.
UPDATE: as of v1.9+ there is a compact command.
This command will perform a compaction "in-line". It will still need some extra space, but not as much.
MongoDB compresses the files by:
- copying the files to a new location
- looping through the documents and re-ordering / re-solving them
- replacing the original files with the new files
You can do this "compression" by running mongod --repair or by connecting directly and running db.repairDatabase().
In either case you need space somewhere to copy the files. I don't know why you don't have enough space to perform a compression; however, you do have some options if you have another computer with more space.
- Export the database to another computer with Mongo installed (using mongoexport) and then import that same database (using mongoimport). This will result in a new database that is more compressed. Now you can stop the original mongod, replace its files with the new database files, and you're good to go.
- Stop the current mongod, copy the database files to a bigger computer, and run the repair on that computer. You can then move the new database files back to the original computer.
There is not currently a good way to "compact in place" using Mongo. And Mongo can definitely suck up a lot of space.
The best strategy right now for compaction is to run a Master-Slave setup. You can then compact the Slave, let it catch up, and switch them over. I know, it's still a little hairy. Maybe the Mongo team will come up with better in-place compaction, but I don't think it's high on their list. Drive space is currently assumed to be cheap (and it usually is).
Answered by user435943
I had the same problem, and solved it by simply doing this at the command line:
mongodump -d databasename
echo 'db.dropDatabase()' | mongo databasename
mongorestore dump/databasename
Answered by awaage
It looks like Mongo v1.9+ has support for compacting in place!
> db.runCommand( { compact : 'mycollectionname' } )
See the docs here: http://docs.mongodb.org/manual/reference/command/compact/
"Unlike repairDatabase, the compact command does not require double disk space to do its work. It does require a small amount of additional space while working. Additionally, compact is faster."
Answered by OzzyCzech
Compact all collections in the current database:
db.getCollectionNames().forEach(function (collectionName) {
  print('Compacting: ' + collectionName);
  db.runCommand({ compact: collectionName });
});
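To check whether compacting actually gave anything back, one option is to record each collection's storage size before and after. This is only a minimal sketch under assumptions: compactAndReport is a hypothetical helper name, db is assumed to be the mongo shell's database object, and runCommand is assumed to return a result document (with an ok field) rather than throw.

```javascript
// Hypothetical helper (not part of MongoDB): compact every collection and
// report the change in storage size. `db` is assumed to be the mongo shell
// database object.
function compactAndReport(db) {
  return db.getCollectionNames().map(function (name) {
    // collStats reports storageSize in bytes
    var before = db.runCommand({ collStats: name }).storageSize;
    var result = db.runCommand({ compact: name });
    var after = db.runCommand({ collStats: name }).storageSize;
    return {
      collection: name,
      ok: result.ok === 1,
      bytesBefore: before,
      bytesAfter: after,
      bytesFreed: before - after
    };
  });
}
```

In the shell this could be called as compactAndReport(db).forEach(printjson). On MMAPv1 the freed figure may well stay at zero, since compact there does not shrink the data files.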
Answered by David J.
If you need to run a full repair, use the --repairpath option. Point it to a disk with more available space.
For example, on my Mac I've used:
mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair
Update: Per MongoDB Core Server Ticket 4266, you may need to add --nojournal to avoid an error:
mongod --config /usr/local/etc/mongod.conf --repair --repairpath /Volumes/X/mongo_repair --nojournal
Answered by Salvador Dali
Starting with version 2.8 of Mongo (released as 3.0), you can use compression. The WiredTiger engine gives you 3 levels of compression; mmap (the default in 2.6) does not provide compression.
Here is an example of how much space you will be able to save for 16 GB of data (figure not preserved; the data is taken from this article).
Answered by Karthickkumar Nagaraj
There are two ways to solve this, depending on the storage engine.
1. MMAP() engine:
command: db.repairDatabase()
NOTE: repairDatabase requires free disk space equal to the size of your current data set plus 2 gigabytes. If the volume that holds dbpath lacks sufficient space, you can mount a separate volume and use that for the repair. When mounting a separate volume for repairDatabase, you must run the repair from the command line and use the --repairpath switch to specify the folder in which to store temporary repair files. For example, a 120 GB database needs 120 + 2 = 122 GB of free space, so the existing data plus the repair copy add up to (120*2)+2 = 242 GB of disk.
Another way is to do it per collection, command: db.runCommand({compact: 'collectionName'})
2. WiredTiger: it is resolved automatically by the engine itself.
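The disk-space rule from the repairDatabase note above can be written out as a small calculation. This is a sketch only; the two function names are hypothetical, and the figures follow the rule as quoted in this answer (data set size plus 2 GB of free space).

```javascript
// Free space repairDatabase needs on MMAPv1, per the note above:
// the size of the current data set plus 2 GB.
function repairFreeSpaceGB(dataSetGB) {
  return dataSetGB + 2;
}

// Total disk footprint while the repair runs:
// the existing data plus the free space the repair needs.
function repairTotalDiskGB(dataSetGB) {
  return dataSetGB + repairFreeSpaceGB(dataSetGB);
}
```

For the 120 GB example, repairFreeSpaceGB(120) gives the free space to provision (on the dbpath volume or on the --repairpath volume), and repairTotalDiskGB(120) gives the combined footprint.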
Answered by kevinadi
There has been considerable confusion over space reclamation in MongoDB, and some recommended practices are downright dangerous in certain deployment types. More details below:
TL;DR: repairDatabase attempts to salvage data from a standalone MongoDB deployment that is trying to recover from disk corruption. If it recovers space, it is purely a side effect. Recovering space should never be the primary consideration for running repairDatabase.
Recover space in a standalone node
WiredTiger: For a standalone node with WiredTiger, running compact will release space to the OS, with one caveat: the compact command on WiredTiger on MongoDB 3.0.x was affected by the bug SERVER-21833, which was fixed in MongoDB 3.2.3. Prior to this version, compact on WiredTiger could silently fail.
MMAPv1: Due to the way MMAPv1 works, there is no safe and supported method to recover space using the MMAPv1 storage engine. compact in MMAPv1 will defragment the data files, potentially making more space available for new documents, but it will not release space back to the OS.
You may be able to run repairDatabase if you fully understand the consequences of this potentially dangerous command (see below), since repairDatabase essentially rewrites the whole database by discarding corrupt documents. As a side effect, this will create new MMAPv1 data files without any fragmentation and release space back to the OS.
For a less adventurous method, running mongodump and mongorestore may be possible as well in an MMAPv1 deployment, subject to the size of your deployment.
Recover space in a replica set
For replica set configurations, the best and the safest method to recover space is to perform an initial sync, for both WiredTiger and MMAPv1.
If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries, before finally stepping down the primary and performing an initial sync on it. The rolling initial sync is the safest method to perform replica set maintenance, and as a bonus it involves no downtime.
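The ordering described above (secondaries first, primary last) can be sketched as a small helper that works on the members array returned by rs.status(). The helper name rollingSyncOrder is hypothetical; it only decides the order and does not perform any sync itself.

```javascript
// Hypothetical helper: given the `members` array from rs.status(), return
// member names in rolling initial sync order: secondaries first, primary last.
// Members in any other state (for example arbiters) are left out.
function rollingSyncOrder(members) {
  var secondaries = members.filter(function (m) {
    return m.stateStr === "SECONDARY";
  });
  var primaries = members.filter(function (m) {
    return m.stateStr === "PRIMARY";
  });
  return secondaries.concat(primaries).map(function (m) {
    return m.name;
  });
}
```

In the shell this could be called as rollingSyncOrder(rs.status().members). The primary still has to be stepped down (for example with rs.stepDown()) before it is resynced.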
Please note that the feasibility of doing a rolling initial sync also depends on the size of your deployment. For extremely large deployments, it may not be feasible to do an initial sync, and thus your options are somewhat more limited. If WiredTiger is used, you may be able to take one secondary out of the set, start it as a standalone, run compact on it, and rejoin it to the set.
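As a rough summary of the recommendations so far, the choice can be written as a lookup on storage engine and deployment type. This is a hedged sketch, not an official decision procedure: suggestReclaimMethod is a hypothetical name, and it assumes the engine string is the value reported by db.serverStatus().storageEngine.name.

```javascript
// Hypothetical helper summarising the recommendations above.
// engine: assumed to be db.serverStatus().storageEngine.name,
//         e.g. "wiredTiger" or "mmapv1"
// inReplicaSet: true if the node is a member of a replica set
function suggestReclaimMethod(engine, inReplicaSet) {
  if (inReplicaSet) {
    // Safest option for both engines
    return "initial sync (rolling, secondaries first)";
  }
  if (engine === "wiredTiger") {
    return "compact";
  }
  if (engine === "mmapv1") {
    // compact only defragments on MMAPv1; it does not release space
    return "mongodump + mongorestore";
  }
  return "unknown storage engine";
}
```

For very large replica sets where an initial sync is not feasible, the WiredTiger workaround described above (compacting a secondary started as a standalone) is the exception to this table.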
Regarding repairDatabase
Please don't run repairDatabase on replica set nodes. This is very dangerous, as mentioned in the repairDatabase page and described in more detail below.
The name repairDatabase is a bit misleading, since the command doesn't attempt to repair anything. The command was intended to be used when there's disk corruption on a standalone node, which could lead to corrupt documents.
The repairDatabase command could be more accurately described as "salvage database". That is, it recreates the databases by discarding corrupt documents in an attempt to get the database into a state where you can start it and salvage intact documents from it.
In MMAPv1 deployments, this rebuilding of the database files releases space to the OS as a side effect. Releasing space to the OS was never the purpose.
Consequences of repairDatabase on a replica set
In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run repairDatabase on a replica set node, there is a chance that the node contains undetected corruption, and repairDatabase will dutifully remove the corrupt documents for you.
Predictably, this makes that node contain a different dataset from the rest of the set. If an update happens to hit one of the removed documents, the whole set could crash.
To make matters worse, it is entirely possible that this situation could stay dormant for a long time, only to strike suddenly for no apparent reason.
Answered by VISHAL KUMAWAT
In case a large chunk of data is deleted from a collection and the collection never uses the deleted space for new documents, this space needs to be returned to the operating system so that it can be used by other databases or collections. You will need to run a compact or repair operation in order to defragment the disk space and regain the usable free space.
The behavior of the compaction process depends on the MongoDB storage engine, as follows:
db.runCommand({compact: 'collection-name'})
MMAPv1
The compaction operation defragments data files and indexes. However, it does not release space to the operating system. The operation is still useful to defragment and create more contiguous space for reuse by MongoDB, but it does not help when free disk space is very low.
Up to 2 GB of additional disk space is required during the compaction operation.
A database-level lock is held during the compaction operation.
WiredTiger
The WiredTiger engine provides compression by default, so it consumes less disk space than MMAPv1.
The compact process releases the free space to the operating system. Minimal disk space is required to run the compact operation. On WiredTiger, compact also blocks all operations on the database, as it needs a database-level lock.
For the MMAPv1 engine, compact does not return the space to the operating system. You need to run a repair operation to release the unused space.
db.runCommand({repairDatabase: 1})
Answered by Hett
MongoDB 3.0 and higher has a new storage engine, WiredTiger. In my case, switching engines reduced disk usage from 100 GB to 25 GB.