How do I sum together file sizes in bash, grouping together the results by date?

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/643584/

Time: 2020-09-17 20:45:47  Source: igfitidea

How do I sum together file sizes in bash, grouping together the results by date?

bash file

Asked by yukondude

On a Linux server that I work with, a process writes randomly-named files at random intervals. Here's a small sample, showing the file size, modification date & time, and file name:

27659   2009-03-09  17:24  APP14452.log
0       2009-03-09  17:24  vim14436.log
20      2009-03-09  17:24  jgU14406.log
15078   2009-03-10  08:06  ySh14450.log
20      2009-03-10  08:06  VhJ14404.log
9044    2009-03-10  15:14  EqQ14296.log
8877    2009-03-10  19:38  Ugp14294.log
8898    2009-03-11  18:21  yzJ14292.log
55629   2009-03-11  18:30  ZjX14448.log
20      2009-03-11  18:31  GwI14402.log
25955   2009-03-12  19:19  lRx14290.log
14989   2009-03-12  19:25  oFw14446.log
20      2009-03-12  19:28  clg14400.log

(Note that sometimes the file size can be zero.)

What I would like is a bash script to sum the size of the files, broken down by date, producing output something like this (assuming my arithmetic is correct):

27679 2009-03-09
33019 2009-03-10
64527 2009-03-11
40964 2009-03-12

The results would show activity trends over time, and highlight the exceptionally busy days.

In SQL, the operation would be a cinch:

SELECT SUM(filesize), filedate
FROM files
GROUP BY filedate;

Now, this is all probably pretty easy in Perl or Python, but I'd really prefer a bash shell or awk solution. It seems especially tricky to me to group the files by date in bash (especially if you can't assume a particular date format). Summing the sizes could be done in a loop I suppose, but is there an easier, more elegant, approach?

Answer by ashawley

I often use this Awk idiom:

awk '{sum[$2] += $1} END {for (date in sum) print sum[date], date}'
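With `$1` as the size column and `$2` as the date column, the idiom can be checked against a slice of the sample listing from the question (inline data for illustration, piped through `sort` so the dates come out in order):

```shell
# Sum column 1 (size) grouped by column 2 (date).
printf '%s\n' \
  '27659 2009-03-09' \
  '20    2009-03-09' \
  '15078 2009-03-10' \
  '20    2009-03-10' |
awk '{sum[$2] += $1} END {for (date in sum) print sum[date], date}' |
sort -k2
# 27679 2009-03-09
# 15098 2009-03-10
```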

Answer by dobrokot

(find ... | xargs stat "--printf=%s+"; echo 0) | bc

Answer by Kristjan Adojaan

Only files, recursively, sorted by date and summed

find ./ -type f -printf '%TY-%Tm-%Td %s\n' | awk '{sum[$1] += $2} END {for (date in sum) print date, sum[date]}' | sort

Only files, from current directory only, sorted by date and summed

find ./ -maxdepth 1 -type f -printf '%TY-%Tm-%Td %s\n' | awk '{sum[$1] += $2} END {for (date in sum) print date, sum[date]}' | sort
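To sanity-check the pipeline, one can fabricate files with known sizes and modification times in a scratch directory (file names and timestamps below are made up; `find -printf` is GNU-specific):

```shell
# Build a scratch directory with known sizes and mtimes, then group by date.
tmp=$(mktemp -d)
printf 'aaaa' > "$tmp/a.log"                       # 4 bytes
printf 'bb'   > "$tmp/b.log"                       # 2 bytes
touch -t 200903091724 "$tmp/a.log" "$tmp/b.log"    # both dated 2009-03-09
printf 'ccc'  > "$tmp/c.log"                       # 3 bytes
touch -t 200903100806 "$tmp/c.log"                 # dated 2009-03-10

find "$tmp" -maxdepth 1 -type f -printf '%TY-%Tm-%Td %s\n' |
awk '{sum[$1] += $2} END {for (date in sum) print date, sum[date]}' |
sort
# 2009-03-09 6
# 2009-03-10 3

rm -r "$tmp"
```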

Answer by yukondude

Following the suggestions from ashawley and vartec, the following "one-liner" does the trick superbly:

ls -l --time-style=long-iso *log |
    awk '{sum[$6] += $5} END {for (s in sum) print sum[s], s}' |
    sort -k2 |
    column -t

Answer by Dimitre Radoulov

Consider that on Linux you probably have GNU awk, so you don't need other commands:

ls -l --time-style=long-iso * | 
  WHINY_USERS=-9 awk 'END {
    for (s in sum)
      printf "%-15s\t%s\n", sum[s], s
      }
  { sum[$6] += $5 }
  '

Answer by Harel Ben Attia

There's a tool I've created that allows performing SQL-like queries against text data, including grouping, joins, conditions, and more. You can take a look here for details.
