Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/9485981/

Date: 2020-08-06 04:53:38  Source: igfitidea

Disk usage of files whose names match a regex, in Linux?

Tags: regex, linux, bash, sum, diskspace

Asked by Camilo Martin

So, in many situations I have wanted a way to know how much of my disk space is used by what, so that I know what to get rid of, convert to another format, store elsewhere (such as data DVDs), move to another partition, etc. In this case I'm looking at a Windows partition from SliTaz Linux bootable media.

In most cases, what I want is the size of files and folders, and for that I use NCurses-based ncdu:

                ncdu

But in this case, I want a way to get the size of all files matching a regex. An example regex for .bak files:

.*\.bak$

How do I get that information, considering a standard Linux with core GNU utilities or BusyBox?

Edit: The output is intended to be parseable by a script.

Accepted answer by Michał Kosmulski

I suggest something like:

find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1

Some notes:

  • The -print0 option for find and --files0-from for du are there to avoid issues with whitespace in file names
  • The regular expression is matched against the whole path, e.g. ./dir1/subdir2/file.bak, not just file.bak, so if you modify it, take that into account
  • I used the -h flag for du to produce a "human-readable" format, but if you want to parse the output, you may be better off with -k (always use kilobytes)
  • If you remove the tail command, you will additionally see the sizes of particular files and directories
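Since the question asks for script-parseable output, the notes above can be combined into a small helper. This is only a sketch under assumptions: it requires GNU find and GNU du, the function name total_bytes_matching is made up for illustration, and it uses du -b (apparent size in bytes) rather than the -k flag mentioned above so that the result does not depend on the filesystem's block size.

```shell
# Hypothetical helper (not part of the original answer).
# Prints the combined apparent size, in bytes, of everything under the
# current directory whose whole path matches the regex in $1.
# Assumes GNU find (-regex, -print0) and GNU du (--files0-from, -b).
total_bytes_matching() {
    find . -regex "$1" -print0 |
        du --files0-from=- -cb |
        # The grand total is the last line; keep its first tab-separated field.
        awk -F '\t' '{t = $1} END {print t + 0}'
}
```

Usage would look like `cd /where/to/look && total_bytes_matching '.*\.bak'`. Swapping -cb for -ck should give disk usage in KiB instead of apparent bytes.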

Sidenote: a nice GUI tool for finding out who ate your disk space is FileLight. It doesn't do regexes, but is very handy for finding big directories or files clogging your disk.

Answer by Camilo Martin

Run this in a Bourne Shell to declare a function that calculates the sum of sizes of all the files matching a regex pattern in the current directory:

sizeofregex() { IFS=$'\n'; for x in $(find . -regex "$1" 2> /dev/null); do du -sk "$x" | cut -f1; done | awk '{s+=$1} END {print s}' | sed 's/^$/0/'; unset IFS; }

(Alternatively, you can put it in a script.)

Usage:

cd /where/to/look
sizeofregex 'myregex'

The result will be a number (in KiB), including 0 (if there are no files that match your regex).

If you do not want it to look in other filesystems (say you want to look for all .so files under /, which is a mount of /dev/sda1, but not under /home, which is a mount of /dev/sdb1), add a -xdev parameter to find in the function above.

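As an illustration of the -xdev suggestion, here is a sketch of a variant that stays on one filesystem. It is not the function from the answer: the name sizeofregex_xdev is hypothetical, and it hands the matches to du through -exec ... {} + instead of looping over the output of find, which is an assumption made here so that file names containing spaces need no IFS handling. File names containing newlines would still confuse the awk step.

```shell
# Hypothetical variant (name and structure are illustrative only).
# Sums the disk usage, in KiB, of all paths under the current directory
# that match the regex in $1, without crossing into other filesystems.
sizeofregex_xdev() {
    # -xdev keeps find on the filesystem of the starting directory.
    # du -sk prints one "<KiB><TAB><path>" line per argument.
    find . -xdev -regex "$1" -exec du -sk {} + 2> /dev/null |
        # Add up the first column; "s + 0" prints 0 when nothing matched.
        awk '{s += $1} END {print s + 0}'
}
```

It would be called the same way as the original, e.g. `cd / && sizeofregex_xdev '.*\.so'`.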

Answer by glenn jackman

If you're OK with glob-patterns and you're only interested in the current directory:

stat -c "%s" *.bak | awk '{sum += $1} END {print sum}'

or

sum=0
while read size; do (( sum += size )); done < <(stat -c "%s" *.bak)
echo $sum

The %s directive to stat gives bytes, not kilobytes.

If you want to descend into subdirectories, with bash version 4, you can shopt -s globstar and use the pattern **/*.bak

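Put together, the globstar suggestion might look like the sketch below. The function name sum_bak_bytes is hypothetical, and the code assumes bash 4 or later plus GNU stat (for -c "%s"). The nullglob option is an addition made here so that an empty match does not leave the literal pattern behind.

```shell
# Hypothetical helper; requires bash >= 4 (globstar) and GNU stat.
# Prints the combined size in bytes of all *.bak entries in the current
# directory and its subdirectories.
sum_bak_bytes() (
    # The function body is a subshell, so these shopt changes do not
    # leak into the calling shell.
    shopt -s globstar nullglob
    files=( **/*.bak )
    if [ "${#files[@]}" -eq 0 ]; then
        # Nothing matched: report 0 rather than running stat with no arguments.
        echo 0
    else
        stat -c '%s' -- "${files[@]}" | awk '{sum += $1} END {print sum + 0}'
    fi
)
```

Note that **/*.bak also matches *.bak directly in the current directory, so a separate *.bak pattern is not needed.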

Answer by MaddHacker

du is my favorite answer. If you have a fixed filesystem structure, you can use:

du -hc *.bak

If you need to add subdirs, just add:

du -hc *.bak **/*.bak **/**/*.bak

etc etc

However, this isn't a very useful command, so using your find:

TOTAL=0; for I in $(find . -name \*.bak); do TOTAL=$((TOTAL+$(du "$I" | awk '{print $1}'))); done; echo $TOTAL

That will echo the total size of all of the files you find, in du's default block units (1 KiB blocks with GNU du, not bytes).

Hope that helps.

Answer by ben.snape

The previous solutions didn't work properly for me (I had trouble piping du) but the following worked great:

find path/to/directory -iregex ".*\.bak$" -exec du -csh '{}' + | tail -1

The -iregex option matches a case-insensitive regular expression. Use -regex if you want it to be case sensitive.

If you aren't comfortable with regular expressions, you can use the -iname or -name flags (the former being case insensitive):

find path/to/directory -iname "*.bak" -exec du -csh '{}' + | tail -1

In case you want the size of every match (rather than just the combined total), simply leave out the piped tail command:

find path/to/directory -iname "*.bak" -exec du -csh '{}' +

These approaches avoid the subdirectory problem in @MaddHacker's answer.

Hope this helps others in the same situation (in my case, finding the size of all DLLs in a .NET solution).

Answer by Mecki

The accepted answer suggests using

find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1

but that doesn't work on my system, as du there doesn't know a --files0-from option. Only GNU du knows that option; it is not part of the POSIX standard (so you won't find it in FreeBSD or macOS), nor will you find it on BusyBox-based Linux systems (e.g. most embedded Linux systems) or any other Linux system that does not use the GNU du version.

Then there's a reply suggesting to use:

find path/to/directory -iregex ".*\.bak$" -exec du -csh '{}' + | tail -1

This solution works as long as not too many files are found. The + means that find will try to call du with as many hits as possible in a single call. However, a system may support only a maximum number of arguments (N), and if there are more hits than this value, find will call du multiple times, splitting the hits into groups of at most N items each. In this case the result will be wrong, as it shows only the total of the last du call.

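One possible way around the batching problem described above is to add up every "total" line instead of keeping only the last one. The sketch below is an assumption-laden illustration, not part of the answer: the function name total_kib_all_batches is made up, it needs a find that supports -iregex, and it uses -k instead of -sh so that the per-batch totals are plain numbers that can be summed.

```shell
# Hypothetical helper (illustrative only).
# $1 = directory to search, $2 = case-insensitive regex for the whole path.
# Prints the combined disk usage in KiB, even if find splits the matches
# over several du invocations.
total_kib_all_batches() {
    find "$1" -iregex "$2" -exec du -ck {} + |
        # Each du invocation ends with a "<KiB><TAB>total" line; sum them all.
        # Caveat: a matched path that is literally named "total" would be
        # counted twice.
        awk -F '\t' '$2 == "total" {s += $1} END {print s + 0}'
}
```

For example, `total_kib_all_batches path/to/directory '.*\.bak$'` should print a single number, and 0 when nothing matches.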

Finally there is an answer using stat and awk, which is a nice way to do it, but it relies on shell globbing in a way that only Bash 4.x or later supports. It will not work with older versions, and whether it works with other shells is unpredictable.

A POSIX-conforming solution (works on Linux, macOS, and any BSD variant) that doesn't suffer from any such limitation and that will surely work with every shell would be:

find . -regex '.*\.bak' -exec stat -f "%z" {} \; | awk '{s += $1} END {print s}'
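
One caveat worth keeping in mind: -f "%z" is the BSD/macOS way of asking stat for the size, while GNU stat, as found on most Linux systems, uses -c "%s" for the same thing (as in the earlier stat answer). The following sketch probes which form the local stat accepts and then runs the same pipeline. The function name sum_bytes_regex is hypothetical, the -type f test is an addition made here to skip directories, and {} + is used instead of {} \; on the assumption that batching is harmless when every output line is summed.

```shell
# Hypothetical helper (illustrative only).
# $1 = directory to search, $2 = regex matched against the whole path.
# Prints the combined size in bytes of all matching regular files.
sum_bytes_regex() (
    # Subshell body keeps opt and fmt out of the caller's environment.
    if stat -c '%s' / > /dev/null 2>&1; then
        opt=-c fmt='%s'    # GNU or BusyBox stat
    else
        opt=-f fmt='%z'    # BSD or macOS stat
    fi
    find "$1" -regex "$2" -type f -exec stat "$opt" "$fmt" {} + |
        awk '{s += $1} END {print s + 0}'
)
```

Called as `sum_bytes_regex . '.*\.bak'`, it should print the total in bytes, or 0 if nothing matches.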