bash 删除包含数千个文件的大目录的最佳和最快的方法是什么(在 ubuntu 中)
声明:本页面是 StackOverflow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,同时您必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11339534/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
What is the best and the fastest way to delete large directory containing thousands of files (in ubuntu)
提问 by itereshchenkov
As I know the commands like
据我所知,命令如下
find <dir> -type f -exec rm {} \;
are not the best way to remove a large number of files (total files, including those in subfolders). They work fine if you have a small number of files, but if you have 10+ million files in subfolders, they can hang a server.
并不是删除大量文件(包括子文件夹在内的文件总数)的最佳方式。文件数量不多时它工作得很好,但如果子文件夹中有 1000 多万个文件,它可能会让服务器挂起。
Does anyone know any specific linux commands to solve this problem?
有谁知道任何特定的 linux 命令来解决这个问题?
采纳答案 by Rody Oldenhuis
Here's an example bash script:
这是一个示例 bash 脚本:
#!/bin/bash

# the directory to clean is assumed to be passed as the first argument ($1)
LOCKFILE=/tmp/rmHugeNumberOfFiles.lock   # ('local' is only valid inside a function)

# this process gets ultra-low priority
ionice -c2 -n7 -p $$ > /dev/null
if [ $? -ne 0 ]; then
    echo "Could not set disk IO priority. Exiting..."
    exit
fi
renice +19 -p $$ > /dev/null
if [ $? -ne 0 ]; then
    echo "Could not renice process. Exiting..."
    exit
fi

# check if there's an instance running already. If so -- exit
if [ -e ${LOCKFILE} ] && kill -0 `cat ${LOCKFILE}`; then
    echo "An instance of this script is already running."
    exit
fi

# make sure the lockfile is removed when we exit. Then: claim the lock
trap "command rm -f -- $LOCKFILE; exit" INT TERM EXIT
echo $$ > $LOCKFILE

# also create a tempfile, and make sure that's removed too upon exit
# (this trap replaces the previous one, so it must clean up both files)
tmp=$(tempfile) || exit   # tempfile is Debian/Ubuntu-specific; mktemp is the portable equivalent
trap "command rm -f -- '$tmp' $LOCKFILE; exit" INT TERM EXIT

# ----------------------------------------
# option 1
# ----------------------------------------
# find your specific files and collect their names in the tempfile
find "$1" -type f [INSERT SPECIFIC SEARCH PATTERN HERE] > "$tmp"
# rm does not read names from stdin; feed the collected list through xargs
# (file names containing whitespace would need find -print0 / xargs -0 instead)
xargs rm -- < "$tmp"

# ----------------------------------------
# option 2
# ----------------------------------------
command rm -r "$1"

# remove the lockfile, tempfile
command rm -f -- "$tmp" $LOCKFILE
This script starts by setting its own process priority and disk IO priority to very low values, to ensure other running processes are as unaffected as possible.
该脚本首先将自己的进程优先级和磁盘 IO 优先级设置为非常低的值,以确保其他正在运行的进程尽可能不受影响。
Then it makes sure that it is the ONLY such process running.
然后它确保它是唯一运行的此类进程。
The core of the script is really up to your preference. You can use rm -r if you are sure that the whole dir can be deleted indiscriminately (option 2), or you can use find for more specific file deletion (option 1, possibly using command line options "$2" and onward for convenience).
脚本的核心完全取决于您的偏好。如果您确定整个目录可以不加区分地全部删除,可以使用 rm -r(选项 2);也可以使用 find 进行更有针对性的文件删除(选项 1,为了方便,可以使用 "$2" 及之后的命令行参数)。
In the implementation above, option 1 (find) first outputs everything to a tempfile, so that rm is only called once instead of after each file found by find. When the number of files is indeed huge, this can amount to a significant time saving. On the downside, the size of the tempfile may become an issue, but that is only likely if you're deleting literally billions of files. Also, because the disk IO has such low priority, using a tempfile followed by a single rm may in total be slower than using the find (...) -exec rm {} \; option. As always, you should experiment a bit to see what best fits your needs.
在上面的实现中,选项 1(find)先把所有结果输出到一个临时文件,这样 rm 只被调用一次,而不是 find 每找到一个文件就调用一次。当文件数量确实很大时,这可以节省大量时间。不利的一面是,临时文件的大小可能成为问题,但只有在您删除数十亿个文件时才比较可能发生;另外,由于磁盘 IO 的优先级被调得这么低,使用临时文件再执行一次 rm,总体上可能比使用 find (...) -exec rm {} \; 的方式更慢。与往常一样,您应该自己多做一些试验,看看哪种方式最适合您的需求。
EDIT: As suggested by user946850, you can also skip the whole tempfile and use
find (...) -print0 | xargs -0 rm. This has a larger memory footprint, since the full paths to all matching files will be held in RAM until the find command is completely finished. On the upside: there is no additional file IO due to writes to the tempfile. Which one to choose depends on your use-case.
编辑:根据 user946850 的建议,您也可以跳过整个临时文件并使用
find (...) -print0 | xargs -0 rm。这种方式的内存占用更大,因为在 find 命令完全结束之前,所有匹配文件的完整路径都会保存在内存中。好处是:不会有因写入临时文件而产生的额外文件 IO。选择哪一种取决于您的用例。
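For reference, a minimal sketch of that pipe-based variant combined with the same low-priority idea as the script above (the directory path is a placeholder; the nice and ionice settings are inherited by the child find and rm processes):
ionice -c2 -n7 nice -n 19 bash -c \
    'find /path/to/dir -type f -print0 | xargs -0 rm -f'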
回答 by Igor Chubin
It may seem strange but:
这可能看起来很奇怪,但是:
$ rm -rf <dir>
回答 by krlmlr
The -r (recursive) switch removes everything below a directory, too -- including subdirectories. (Your command does not remove the directories, only the files.)
-r(递归)开关也会删除目录下面的所有内容,包括子目录。(您的命令不会删除目录,只会删除文件。)
You can also speed up the find approach:
您还可以加快 find 方式的速度:
find -type f -print0 | xargs -0 rm
回答 by Nick Woodhams
I tried every one of these commands, but the problem I had was that the deletion process was locking the disk, and since no other processes could access it, there was a big pileup of processes trying to access the disk, making the problem worse. Run "iotop" and see how much disk IO your process is using.
我尝试了这些命令中的每一个,但我遇到的问题是删除过程锁定了磁盘,并且由于没有其他进程可以访问它,因此尝试访问磁盘的进程堆积如山,使问题变得更糟。运行“iotop”并查看您的进程使用了多少磁盘 IO。
Here's the python script that solved my problem. It deletes 500 files at a time, then takes a 2 second break to let the other processes do their business, then continues.
这是解决我的问题的python脚本。它一次删除 500 个文件,然后休息 2 秒让其他进程处理它们的业务,然后继续。
import os, os.path
import time

for root, dirs, files in os.walk('/dir/to/delete/files'):
    i = 0
    file_num = 0
    for f in files:
        fullpath = os.path.join(root, f)
        i = i + 1
        file_num = file_num + 1
        os.remove(fullpath)
        if i % 500 == 1:
            time.sleep(2)
            print "Deleted %i files" % file_num
Hope this helps some people.
希望这可以帮助一些人。
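The same throttle-and-pause idea can be sketched in plain shell, assuming GNU findutils; the batch size of 500 and the 2-second pause simply mirror the values used in the Python script above:
# delete in batches of 500 files, pausing 2 seconds between batches
find /dir/to/delete/files -type f -print0 |
    xargs -0 -n 500 sh -c 'rm -f -- "$@"; sleep 2' _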
回答 by Noam Manos
If you need to deal with a space limit issue on a very large file tree (in my case many Perforce branches), the find-and-delete process itself can sometimes hang while running --
如果您需要在非常大的文件树(在我的情况下是很多 Perforce 分支)上处理空间不足的问题,查找和删除过程本身有时会在运行中挂起——
Here's a script that I schedule daily to find all directories containing a specific file ("ChangesLog.txt"), then sort all directories found that are older than 2 days, and remove the first matched directory (each scheduled run may produce a new match):
这是我每天定时运行的脚本,用于查找包含特定文件("ChangesLog.txt")的所有目录,然后对找到的、超过 2 天的目录进行排序,并删除第一个匹配的目录(每次定时运行都可能产生新的匹配):
bash -c "echo @echo Creating Cleanup_Branch.cmd on %COMPUTERNAME% - %~dp0 > Cleanup_Branch.cmd"
bash -c "echo -n 'bash -c \"find ' >> Cleanup_Branch.cmd"
rm -f dirToDelete.txt
rem cd. > dirToDelete.txt
bash -c "find .. -maxdepth 9 -regex ".+ChangesLog.txt" -exec echo {} >> dirToDelete.txt \; & pid=$!; sleep 100; kill $pid "
sed -e 's/\(.*\)\/.*//' -e 's/^./"&/;s/.$/&" /' dirToDelete.txt | tr '\n' ' ' >> Cleanup_Branch.cmd
bash -c "echo -n '-maxdepth 0 -type d -mtime +2 | xargs -r ls -trd | head -n1 | xargs -t rm -Rf' >> Cleanup_Branch.cmd"
bash -c 'echo -n \" >> Cleanup_Branch.cmd'
call Cleanup_Branch.cmd
Note the requirements:
请注意以下要求:
- Deleting only those directories with "ChangesLog.txt", since other old directories should not be deleted.
- Calling the OS commands through Cygwin directly, since otherwise the Windows default commands would be used.
- Collecting the directories to delete into an external text file, in order to save the find results, since the find process sometimes hangs.
- Setting a timeout for the find process by running it as a background process (&) that is killed after 100 seconds.
- Sorting the directories oldest first, to set the deletion priority.
- 仅删除那些带有“ChangesLog.txt”的目录,因为不应删除其他旧目录。
- 直接在cygwin 中调用操作系统命令,否则它使用 Windows 默认命令。
- 将要删除的目录收集到外部文本文件中,以保存查找结果,因为有时查找过程会挂起。
- 通过使用 100 秒后被杀死的后台进程为查找进程设置超时。
- 首先对最旧的目录进行排序,以获得删除优先级。
回答 by T'Saavik
If you have a reasonably modern version of find (4.2.3 or greater) you can use the -delete flag.
如果您有相当现代的 find 版本(4.2.3 或更高版本),您可以使用 -delete 标志。
find <dir> -type f -delete
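Note that -delete here only removes the matched files; if the whole tree should end up gone, a second pass can drop the directories that are now empty (a sketch, assuming GNU find):
find <dir> -type d -empty -delete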
If you have version 4.2.12 or greater, you can take advantage of xargs-style command line stacking via the + terminator for -exec. This way you don't run a separate copy of /bin/rm for every file.
如果您有 4.2.12 或更高版本,您可以通过 -exec 的 + 结束符来利用 xargs 风格的命令行合并。这样就不需要为每个文件单独运行一个 /bin/rm。
find <dir> -type f -exec rm {} \+
回答 by md rashadul hasan rakib
The previous commands are good.
前面的命令很好。
rm -rf directory/ also works faster for billions of files in one folder. I tried that.
rm -rf directory/ 处理一个文件夹中数十亿个文件时也更快。我试过了。
回答 by VIGNESH
You can create an empty directory and rsync it to the directory you need to empty. This avoids timeout and out-of-memory issues.
您可以创建一个空目录,然后用 rsync 把它同步到需要清空的目录。这样可以避免超时和内存不足的问题。
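A sketch of that rsync trick, with placeholder paths (the trailing slashes matter so rsync compares directory contents rather than the directories themselves):
mkdir -p /tmp/empty_dir
rsync -a --delete /tmp/empty_dir/ /path/to/dir_to_empty/
Here --delete removes everything in the target that is absent from the empty source, so the target is emptied but not itself removed.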
回答 by knaive
If you would like to delete tons of files as soon as possible, try this:
如果您想尽快删除大量文件,请尝试以下操作:
find . -type f -print0 | xargs -P 0 -0 rm -f
Note that the -P 0 option makes xargs use as many processes as possible.
请注意,-P 0 选项会让 xargs 尽可能多地并行运行进程。
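One caveat from the GNU xargs documentation: without -n (or -L), xargs packs as many arguments as possible into each command line, so -P may effectively run only one rm at a time. A sketch with explicit, arbitrary batch and parallelism values:
find . -type f -print0 | xargs -0 -n 1000 -P 4 rm -f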

