Bash script to limit a directory size by deleting files accessed last
Original question: http://stackoverflow.com/questions/11618144/
Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by user690750
I had previously used a simple find command to delete tar files not accessed in the last x days (in this example, 3 days):
find /PATH/TO/FILES -type f -name "*.tar" -atime +3 -exec rm {} \;
I now need to improve this script by deleting in order of access date and my bash writing skills are a bit rusty. Here's what I need it to do:
1. check the size of a directory /PATH/TO/FILES
2. if size in 1) is greater than X size, get a list of the files by access date
3. delete files in order until size is less than X
The benefit here is for cache and backup directories: I will only delete what I need to in order to keep the directory within a limit, whereas the simplified method might go over the size limit if one day is particularly large. I'm guessing I need to use stat and a bash for loop?
Accepted answer by user690750
Here's a simple, easy to read and understand method I came up with to do this:
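# LIMITSIZE must be set beforehand, in KiB (the unit GNU du -s reports by default).
DIRSIZE=$(du -s /PATH/TO/FILES | awk '{print $1}')
if [ "$DIRSIZE" -gt "$LIMITSIZE" ]; then
    # ls -rt --time=atime lists the least recently accessed files first.
    # Parsing ls output this way breaks on file names that contain whitespace.
    for f in `ls -rt --time=atime /PATH/TO/FILES/*.tar`; do
        FILESIZE=`stat -c "%s" "$f"`
        FILESIZE=$(($FILESIZE/1024))
        rm -f "$f"   # the listing as posted omits the actual delete
        DIRSIZE=$(($DIRSIZE - $FILESIZE))
        if [ "$DIRSIZE" -lt "$LIMITSIZE" ]; then
            break
        fi
    done
fi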
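The loop above splits the ls output on whitespace, so it breaks on file names that contain spaces. A minimal sketch of the same idea without ls follows. It assumes GNU find (for -printf) and file names without newlines. The name trim_by_atime and its byte-based limit are hypothetical, and it sums the sizes of the *.tar files instead of calling du.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original answer.
# Deletes the least recently accessed *.tar files under DIR until the
# combined size of the remaining *.tar files is at most LIMIT_BYTES.
# Assumes GNU find (-printf) and file names without newlines.
trim_by_atime() {
    local dir="$1" limit_bytes="$2"
    local total record rest size path

    # Total apparent size, in bytes, of all candidate files.
    total=$(find "$dir" -type f -name '*.tar' -printf '%s\n' |
        awk '{ sum += $1 } END { printf "%.0f\n", sum }')

    # One record per file: "<atime> <size> <path>", oldest access first.
    find "$dir" -type f -name '*.tar' -printf '%A@ %s %p\n' |
        sort -n |
        while IFS= read -r record; do
            [ "$total" -le "$limit_bytes" ] && break
            rest=${record#* }      # strip the access-time field
            size=${rest%% *}       # second field: size in bytes
            path=${rest#* }        # remainder: path, spaces preserved
            rm -f -- "$path"
            total=$((total - size))
        done
}
```

A call such as trim_by_atime /PATH/TO/FILES 1073741824 would keep the *.tar files under roughly 1 GiB. Replacing rm -f with echo gives a dry run.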
Answer by Lari Hotari
I improved brunner314's example and fixed the problems in it.
Here is a working script I'm using:
#!/bin/bash
DELETEDIR="$1"
MAXSIZE="$2" # in MB
if [[ -z "$DELETEDIR" || -z "$MAXSIZE" || "$MAXSIZE" -lt 1 ]]; then
    echo "usage: $0 [directory] [maxsize in megabytes]" >&2
    exit 1
fi
# List every file as "time::path::size", newest first. %T@ is the
# modification time; GNU find's %A@ would give the access time instead.
find "$DELETEDIR" -type f -printf "%T@::%p::%s\n" \
| sort -rn \
| awk -v maxbytes="$((1024 * 1024 * $MAXSIZE))" -F "::" '
  BEGIN { curSize=0; }
  { 
  curSize += $3;
  if (curSize > maxbytes) { print $2; }
  }
  ' \
  | tac | awk '{printf "%s\0",$0}' | xargs -0 -r rm
# delete empty directories
find "$DELETEDIR" -mindepth 1 -depth -type d -empty -exec rmdir "{}" \;
Answer by brunner314
I didn't need to use loops, just some careful application of stat and awk. Details and explanation below, first the code:
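# X_SIZE is the size limit in bytes. It is passed to awk with -v because
# shell variables are not expanded inside the single-quoted awk program.
find /PATH/TO/FILES -name '*.tar' -type f \
| sed 's/ /\\ /g' \
| xargs stat -f "%a::%z::%N" \
| sort -r \
| awk -v maxSize="$X_SIZE" '
  BEGIN{curSize=0; FS="::"}
  {curSize += $2}
  curSize > maxSize{print $3}
  ' \
| sed 's/ /\\ /g' \
| xargs rm
Note that this is one logical command line, but for the sake of sanity I split it up.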
It starts with a find command based on the one above, without the parts that limit it to files older than 3 days. It pipes that to sed, to escape any spaces in the file names find returns, then uses xargs to run stat on all the results. The -f "%a::%z::%N" tells stat the format to use, with the time of last access in the first field, the size of the file in the second, and the name of the file in the third. I used '::' to separate the fields because it is easier to deal with spaces in the file names that way. Sort then sorts them on the first field, with -r to reverse the ordering.
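The -f format string above is the BSD form of stat, as found on macOS and FreeBSD. On GNU/Linux the same record could be produced with -c. The sketch below assumes GNU coreutils, and stat_record is a hypothetical helper name, not part of the answer.

```shell
#!/bin/bash
# Hypothetical helper: print "atime::size::name" for each argument using
# GNU stat. %X = last access time (epoch seconds), %s = size in bytes,
# %n = file name.
stat_record() {
    stat -c '%X::%s::%n' -- "$@"
}

# Example: let find pass the names as arguments, so spaces need no escaping
# and the sed step can be dropped.
# find /PATH/TO/FILES -name '*.tar' -type f -exec stat -c '%X::%s::%n' {} +
```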
Now we have a list of all the files we are interested in, in order from latest accessed to earliest accessed. Then the awk script adds up all the sizes as it goes through the list, and begins outputting file names once the running total exceeds $X_SIZE. The files that are not output this way are the ones kept. The other file names go to sed again to escape any spaces and then to xargs, which runs rm on them.
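The selection step can be tried in isolation on made-up records, without touching any files. This is only a sketch: select_over_limit is a hypothetical name, the sample records are invented, and a file name that itself contains '::' would confuse the field splitting.

```shell
#!/bin/bash
# Hypothetical helper: reads "atime::size::name" records on stdin and prints
# the names that no longer fit once the most recently accessed files have
# used up max bytes (first argument).
select_over_limit() {
    sort -rn |
        awk -F '::' -v max="$1" '
            { cur += $2 }           # running total, newest access first
            cur > max { print $3 }  # everything past the limit is listed
        '
}

# Made-up sample records: access time (epoch seconds), size in bytes, name.
printf '%s\n' \
    '1700000300::400::c.tar' \
    '1700000100::500::a.tar' \
    '1700000200::300::my backup.tar' |
    select_over_limit 800
# prints: a.tar
```

With a limit of 800 bytes, the two most recently accessed files (400 + 300 bytes) fit, and only the oldest one is listed for deletion.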

