Note: this page is a translated mirror of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not me): StackOverflow.
Original: http://stackoverflow.com/questions/643584/
How do I sum together file sizes in bash, grouping together the results by date?
Asked by yukondude
On a Linux server that I work with, a process writes randomly-named files at random intervals. Here's a small sample, showing the file size, modification date & time, and file name:
27659 2009-03-09 17:24 APP14452.log
0 2009-03-09 17:24 vim14436.log
20 2009-03-09 17:24 jgU14406.log
15078 2009-03-10 08:06 ySh14450.log
20 2009-03-10 08:06 VhJ14404.log
9044 2009-03-10 15:14 EqQ14296.log
8877 2009-03-10 19:38 Ugp14294.log
8898 2009-03-11 18:21 yzJ14292.log
55629 2009-03-11 18:30 ZjX14448.log
20 2009-03-11 18:31 GwI14402.log
25955 2009-03-12 19:19 lRx14290.log
14989 2009-03-12 19:25 oFw14446.log
20 2009-03-12 19:28 clg14400.log
(Note that sometimes the file size can be zero.)
What I would like is a bash script to sum the size of the files, broken down by date, producing output something like this (assuming my arithmetic is correct):
27679 2009-03-09
33019 2009-03-10
64527 2009-03-11
40964 2009-03-12
The results would show activity trends over time, and highlight the exceptionally busy days.
In SQL, the operation would be a cinch:
SELECT SUM(filesize), filedate
FROM files
GROUP BY filedate;
Now, this is all probably pretty easy in Perl or Python, but I'd really prefer a bash shell or awk solution. It seems especially tricky to me to group the files by date in bash (especially if you can't assume a particular date format). Summing the sizes could be done in a loop I suppose, but is there an easier, more elegant, approach?
Answered by ashawley
I often use this idiom of Awk:
awk '{sum[$2] += $1} END {for (date in sum) {print sum[date], date}}'
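For the sample listing above (size in column 1, date in column 2), the idiom can be written out and exercised like this; the three rows are fed in directly for illustration:

```shell
# Sum sizes per date from lines of "SIZE DATE TIME NAME",
# then sort by the date column for readable output.
printf '%s\n' \
  '27659 2009-03-09 17:24 APP14452.log' \
  '20 2009-03-09 17:24 jgU14406.log' \
  '15078 2009-03-10 08:06 ySh14450.log' |
awk '{ sum[$2] += $1 } END { for (date in sum) print sum[date], date }' |
sort -k2
# → 27679 2009-03-09
# → 15078 2009-03-10
```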
Answered by dobrokot
(find ... | xargs stat "--printf=%s+"; echo 0) | bc
Answered by Kristjan Adojaan
Only files, recursively, sorted by date and summed
find ./ -type f -printf '%TY-%Tm-%Td %s\n' | awk '{sum[$1] += $2} END {for (date in sum) {print date, sum[date]}}' | sort
Only files, from current directory only, sorted by date and summed
find ./ -maxdepth 1 -type f -printf '%TY-%Tm-%Td %s\n' | awk '{sum[$1] += $2} END {for (date in sum) {print date, sum[date]}}' | sort
Answered by yukondude
Following the suggestions from ashawley and vartec, the following "one-liner" does the trick superbly:
ls -l --time-style=long-iso *log |
awk '{sum[$6] += $5} END {for (s in sum) {print sum[s], s}}' |
sort -k2 |
column -t
Answered by Dimitre Radoulov
Consider that on Linux you probably have GNU awk, so you don't need other commands:
ls -l --time-style=long-iso * |
WHINY_USERS=-9 awk 'END {
for (s in sum)
printf "%-15s\t%s\n", sum[s], s
}
{ sum[$6] += $5 }
'

