bash: Remove all files older than X days, but keep at least the Y youngest
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/20358865/
Asked by Nils Toedtmann
I have a script that removes DB dumps that are older than say X=21 days from a backup dir:
DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60)) # 3 weeks
find ${DB_DUMP_DIR} -type f -mmin +${RETENTION} -delete
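To preview what this deletes before trusting it, the -delete can be swapped for -print (a standard find dry-run, not part of the original script):

find ${DB_DUMP_DIR} -type f -mmin +${RETENTION} -print  # dry run: list instead of delete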
But if for whatever reason the DB dump job fails to complete for a while, all dumps will eventually be thrown away. So as a safeguard I want to keep at least the youngest Y=7 dumps, even if all or some of them are older than 21 days.
I'm looking for something more elegant than this spaghetti:
DB_DUMP_DIR=/var/backups/dbs
RETENTION=$((21*24*60)) # 3 weeks
KEEP=7
find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' |  # list all dumps with epoch timestamps
    sort -n |                                     # sort by epoch, oldest 1st
    head --lines=-${KEEP} |                       # drop the youngest/bottom 7 dumps
    while read date filename ; do                 # loop through the rest
        find $filename -mmin +${RETENTION} -delete  # delete if older than 21 days
    done
(This snippet might have minor bugs - ignore them. It's to illustrate what I can come up with myself, and why I don't like it.)
Edit: The find option "-mtime" is off by one: "-mtime +21" actually means "at least 22 days old". That always confused me, so I use -mmin instead. It is still off by one, but only by a minute.
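To make the rounding rule concrete, here is a small illustration (my own sketch, not from the original post; GNU findutils and coreutils assumed):

# f1 is 21.5 days old, f2 is 23 days old (GNU touch/date syntax)
touch -d '-21 days -12 hours' f1
touch -d '-23 days' f2

find . -maxdepth 1 -type f -mtime +21            # prints only ./f2: age in whole days must exceed 21
find . -maxdepth 1 -type f -mmin +$((21*24*60))  # prints ./f1 and ./f2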
Answered by chepner
Use find to get all files that are old enough to delete, filter out the $KEEP youngest with tail, then pass the rest to xargs.
find ${DB_DUMP_DIR} -type f -mmin +$RETENTION -printf '%T@ %p\n' |
    sort -nr | tail -n +$((KEEP+1)) |
    cut -d' ' -f2- | xargs -r echo
Replace echo with rm if the reported list of files is the list you want to remove.
(I assume none of the dump files have newlines in their names.)
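If the names could contain spaces or newlines after all, a NUL-delimited variant of the same pipeline is possible (an untested sketch; the -z/-0 options assume reasonably recent GNU findutils and coreutils):

find ${DB_DUMP_DIR} -type f -mmin +$RETENTION -printf '%T@ %p\0' |
    sort -znr | tail -zn +$((KEEP+1)) |
    cut -zd' ' -f2- | xargs -0r echo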
Answered by David W.
You can use -mtime instead of -mmin, which means you don't have to calculate the number of minutes in a day:
find $DB_DUMP_DIR -type f -mtime +21
Instead of deleting them, you could use the stat command to sort the files in order:
find $DB_DUMP_DIR -type f -mtime +21 | while read file
do
stat -f "%-10m %40N" $file
done | sort | awk 'NR > 7 {print }'
This will list all files older than 21 days, but not the seven youngest that are older than 21 days.
From there, you could feed this into xargs to do the remove:
find $DB_DUMP_DIR -type f -mtime +21 | while read file
do
stat -f "%-10m %40N" $file
done | sort | awk 'NR > 7 {print $2}' | xargs rm
Of course, this is all assuming that you don't have spaces in your file names. If you do, you'll have to take a slightly different tack.
This will also keep the seven youngest files that are over 21 days old. You might have files younger than that which you don't really want to keep. However, you could simply run the same sequence again, just without the -mtime parameter:
find $DB_DUMP_DIR -type f | while read file
do
stat -f "%-10m %40N" $file
done | sort | awk 'NR > 7 {print $2}' | xargs rm
You need to look at your stat command to see what the options are for the format. This varies from system to system. The one I used is for OS X. Linux is different.
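For reference, a GNU/Linux equivalent of the BSD/macOS stat call used above could look like this (a sketch; GNU coreutils assumed):

stat -c '%Y %n' "$file"   # %Y = modification time in epoch seconds, %n = file name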
Let's take a slightly different approach. I haven't thoroughly tested this, but:
If all of the files are in the same directory, and none of the file names have whitespace in them:
ls -t | awk 'NR > 7 {print $0}'
Will print out all of the files except for the seven youngest files. Maybe we can go with that?
current_seconds=$(date +%s)                  # Seconds since the epoch
((days = 60 * 60 * 24 * 21))                 # Number of seconds in 21 days
((oldest_allowed = current_seconds - days))  # Oldest allowed modification time

ls -t | awk 'NR > 7 {print $0}' | while read file
do
    stat -f "%Dm %N" "$file"
done | while read date file
do
    [ "$date" -lt "$oldest_allowed" ] && rm "$file"
done
The ls ... | awk will shave off the seven youngest. After that, we can use stat to get the name of the file and its date. Since the date is in seconds since the epoch, we had to calculate what 21 days prior to the current time would be, also in seconds since the epoch.
After that, it's pretty simple. We look at the date of the file. If it's older than 21 days (i.e., its timestamp is lower than the cutoff), we can delete it.
As I said, I haven't thoroughly tested this, but this will delete all files over 21 days old, and only files over 21 days old, while always keeping the seven youngest.
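For a GNU/Linux system, the same idea might be written as follows (my adaptation, not David W.'s original, since stat -f is BSD-only; it still assumes no whitespace in file names):

oldest_allowed=$(( $(date +%s) - 21*24*60*60 ))  # cutoff: 21 days ago, in epoch seconds

ls -t | awk 'NR > 7' | while read file
do
    # GNU stat: %Y prints the modification time in epoch seconds
    if [ "$(stat -c %Y "$file")" -lt "$oldest_allowed" ]; then
        rm "$file"
    fi
done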
Answered by rabensky
I'm opening a second answer because I have a different solution - one using awk: just add the retention period (21 days, in seconds) to the file's timestamp, subtract the current time, and remove the negative ones! (after sorting and removing the newest 7 from the list):
DB_DUMP_DIR=/var/backups/dbs
RETENTION=21*24*60*60 # 3 weeks, in seconds (expanded inside the awk program below)
CURR_TIME=`date +%s`

find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' | \
    awk '{ print int($1) -'${CURR_TIME}' + '${RETENTION}' ":" $2 }' | \
    sort -n | head -n -7 | grep '^-' | cut -d ':' -f 2- | xargs rm -rf
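Before wiring the pipeline to rm -rf, it can be dry-run by stopping before the destructive stage; the number in front of each ':' is the file's age relative to the cutoff, and grep '^-' is what selects the expired (negative) entries. A sketch, with purely illustrative output:

find ${DB_DUMP_DIR} -type f -printf '%T@ %p\n' | \
    awk '{ print int($1) -'${CURR_TIME}' + '${RETENTION}' ":" $2 }' | \
    sort -n | head -n -7
# -86400:/var/backups/dbs/dump_old.sql.gz   <- negative: would be removed
# 259200:/var/backups/dbs/dump_new.sql.gz   <- positive: still within retention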
Answered by ireardon
None of these answers quite worked for me, so I adapted chepner's answer and came to this, which simply retains the last $KEEP backups:

find ${DB_DUMP_DIR} -printf '%T@ %p\n' |  # print entries with modification time
    sort -n |                             # sort in date-ascending order
    head -n -$KEEP |                      # drop the $KEEP most recent entries
    awk '{ print $2 }' |                  # select the file paths
    xargs -r rm                           # remove the file paths
I believe chepner's code retains the $KEEP oldest, rather than the youngest.
Answered by DylanYoung
Here is a BASH function that should do the trick. I couldn't avoid two invocations of find easily, but other than that, it was a relative success:
# A "safe" function for removing backups older than REMOVE_AGE + 1 day(s), always keeping at least the ALWAYS_KEEP youngest
remove_old_backups() {
    local file_prefix="${backup_file_prefix:-$1}"
    local temp=$(( REMOVE_AGE+1 ))  # for inverting the mtime argument: it's quirky ;)
    # We consider backups made on the same day to be one (commonly these are temporary backups in manual intervention scenarios)
    local keeping_n=`/usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime -"$temp" -printf '%Td-%Tm-%TY\n' | sort -d | uniq | wc -l`
    local extra_keep=$(( $ALWAYS_KEEP-$keeping_n ))

    /usr/bin/find . -maxdepth 1 \( -name "$file_prefix*.tgz" -or -name "$file_prefix*.gz" \) -type f -mtime +$REMOVE_AGE -printf '%T@ %p\n' | sort -n | head -n -$extra_keep | cut -d ' ' -f2 | xargs -r rm
}
It takes a backup_file_prefix env variable or it can be passed as the first argument, and it expects the environment variables ALWAYS_KEEP (minimum number of files to keep) and REMOVE_AGE (number of days to pass to -mtime). It expects a gz or tgz extension. There are a few other assumptions, as you can see in the comments, mostly in the name of safety.
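A hypothetical invocation, with illustrative values (the function operates on the current directory, hence the cd):

cd /var/backups/dbs || exit 1
backup_file_prefix=mydb_ ALWAYS_KEEP=7 REMOVE_AGE=21
remove_old_backups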
Thanks to ireardon and his answer (which doesn't quite answer the question) for the inspiration!
Happy safe backup management :)
Answered by Orabîg
Starting from the solutions given in the other answers, I've experimented and found many bugs or situations that were not wanted.
Here is the solution I finally came up with:
# Sample variable values
BACKUP_PATH='/data/backup'
DUMP_PATTERN='dump_*.tar.gz'
NB_RETENTION_DAYS=10
NB_KEEP=2 # keep at least the 2 most recent files in all cases

find ${BACKUP_PATH} -name ${DUMP_PATTERN} \
    -mtime +${NB_RETENTION_DAYS} > /tmp/obsolete_files

find ${BACKUP_PATH} -name ${DUMP_PATTERN} \
    -printf '%T@ %p\n' | \
    sort -n | \
    tail -n ${NB_KEEP} | \
    awk '{ print $2 }' > /tmp/files_to_keep

grep -F -f /tmp/files_to_keep -v /tmp/obsolete_files > /tmp/files_to_delete

cat /tmp/files_to_delete | xargs -r rm

The ideas are:
- Most of the time, I just want to keep files that are not aged more than NB_RETENTION_DAYS.
- However, shit happens, and when for some reason there are no recent files anymore (backup scripts are broken), I don't want to remove the NB_KEEP most recent ones, for security (NB_KEEP should be at least 1).
In my case, I have 2 backups a day and set NB_RETENTION_DAYS to 10 (thus, I normally have 20 files in the normal situation). One could think that I would thus set NB_KEEP=20, but in fact I chose NB_KEEP=2, and here's why:
Let's imagine my backup scripts are broken and I haven't had a backup for a month. I really don't care about keeping my 20 latest files if they are all more than 30 days old; having at least one is what I want. However, being able to easily identify that there is a problem is very important (obviously my monitoring system is really blind, but that's another point). And a backup folder with 10 times fewer files than usual is maybe something that could ring a bell...
Answered by glenn jackman
You could do the loop yourself:
t21=$(date -d "21 days ago" +%s)
cd "$DB_DUMP_DIR"
for f in *; do
    if (( $(stat -c %Y "$f") <= $t21 )); then
        echo rm "$f"
    fi
done

I'm assuming you have GNU date.
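On BSD or macOS, where date -d and stat -c are unavailable, a rough equivalent might be (my untested sketch; date -v-21d and stat -f %m are the BSD spellings):

t21=$(date -v-21d +%s)
cd "$DB_DUMP_DIR"
for f in *; do
    if (( $(stat -f %m "$f") <= t21 )); then
        echo rm "$f"
    fi
done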