How do I sum together file sizes in bash, grouping together the results by date?

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/643584/

Time: 2020-09-17 20:45:47  Source: igfitidea

How do I sum together file sizes in bash, grouping together the results by date?

bash file

Asked by yukondude

On a Linux server that I work with, a process writes randomly-named files at random intervals. Here's a small sample, showing the file size, modification date & time, and file name:

27659   2009-03-09  17:24  APP14452.log
0       2009-03-09  17:24  vim14436.log
20      2009-03-09  17:24  jgU14406.log
15078   2009-03-10  08:06  ySh14450.log
20      2009-03-10  08:06  VhJ14404.log
9044    2009-03-10  15:14  EqQ14296.log
8877    2009-03-10  19:38  Ugp14294.log
8898    2009-03-11  18:21  yzJ14292.log
55629   2009-03-11  18:30  ZjX14448.log
20      2009-03-11  18:31  GwI14402.log
25955   2009-03-12  19:19  lRx14290.log
14989   2009-03-12  19:25  oFw14446.log
20      2009-03-12  19:28  clg14400.log

(Note that sometimes the file size can be zero.)

What I would like is a bash script to sum the size of the files, broken down by date, producing output something like this (assuming my arithmetic is correct):

27679 2009-03-09
33019 2009-03-10
64527 2009-03-11
40964 2009-03-12

The results would show activity trends over time, and highlight the exceptionally busy days.

In SQL, the operation would be a cinch:

SELECT SUM(filesize), filedate
FROM files
GROUP BY filedate;

Now, this is all probably pretty easy in Perl or Python, but I'd really prefer a bash shell or awk solution. It seems especially tricky to me to group the files by date in bash (especially if you can't assume a particular date format). Summing the sizes could be done in a loop I suppose, but is there an easier, more elegant, approach?

Answer by ashawley

I often use this Awk idiom:

awk '{sum[$2] += $1} END {for (date in sum) print sum[date], date}'
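With `$1` as the size column and `$2` as the date column, the idiom can be checked against a slice of the sample listing from the question (inline data for illustration, piped through `sort` so the dates come out in order):

```shell
# Sum column 1 (size) grouped by column 2 (date).
printf '%s\n' \
  '27659 2009-03-09' \
  '20    2009-03-09' \
  '15078 2009-03-10' \
  '20    2009-03-10' |
awk '{sum[$2] += $1} END {for (date in sum) print sum[date], date}' |
sort -k2
# 27679 2009-03-09
# 15098 2009-03-10
```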

Answer by dobrokot

(find ... | xargs stat "--printf=%s+"; echo 0) | bc

Answer by Kristjan Adojaan

Only files, recursively, sorted by date and summed

find ./ -type f -printf '%TY-%Tm-%Td %s\n' | awk '{sum[$1] += $2} END {for (date in sum) print date, sum[date]}' | sort

Only files, from current directory only, sorted by date and summed

find ./ -maxdepth 1 -type f -printf '%TY-%Tm-%Td %s\n' | awk '{sum[$1] += $2} END {for (date in sum) print date, sum[date]}' | sort
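To sanity-check the pipeline, one can fabricate files with known sizes and modification times in a scratch directory (file names and timestamps below are made up; `find -printf` is GNU-specific):

```shell
# Build a scratch directory with known sizes and mtimes, then group by date.
tmp=$(mktemp -d)
printf 'aaaa' > "$tmp/a.log"                       # 4 bytes
printf 'bb'   > "$tmp/b.log"                       # 2 bytes
touch -t 200903091724 "$tmp/a.log" "$tmp/b.log"    # both dated 2009-03-09
printf 'ccc'  > "$tmp/c.log"                       # 3 bytes
touch -t 200903100806 "$tmp/c.log"                 # dated 2009-03-10

find "$tmp" -maxdepth 1 -type f -printf '%TY-%Tm-%Td %s\n' |
awk '{sum[$1] += $2} END {for (date in sum) print date, sum[date]}' |
sort
# 2009-03-09 6
# 2009-03-10 3

rm -r "$tmp"
```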

Answer by yukondude

Following the suggestions from ashawley and vartec, the following "one-liner" does the trick superbly:

ls -l --time-style=long-iso *log |
    awk '{sum[$6] += $5} END {for (s in sum) print sum[s], s}' |
    sort -k2 |
    column -t

Answer by Dimitre Radoulov

Consider that on Linux you probably have GNU awk, so you don't need other commands:

ls -l --time-style=long-iso * | 
  WHINY_USERS=-9 awk 'END {
    for (s in sum)
      printf "%-15s\t%s\n", sum[s], s
      }
  { sum[$6] += $5 }
  '

Answer by Harel Ben Attia

There's a tool I've created that allows performing SQL-like queries against text data, including grouping, joins, conditions, and more. You can take a look here for details.
