Disk usage of files whose names match a regex, in Linux?
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow.
Original: http://stackoverflow.com/questions/9485981/
Asked by Camilo Martin
In many situations I have wanted a way to know how much of my disk space is used by what, so that I know what to get rid of, convert to another format, store elsewhere (such as on data DVDs), move to another partition, etc. In this case I'm looking at a Windows partition from SliTaz Linux bootable media.
In most cases, what I want is the size of files and folders, and for that I use the NCurses-based ncdu.
But in this case, I want a way to get the size of all files matching a regex. An example regex for .bak files:
.*\.bak$
How do I get that information, considering a standard Linux with core GNU utilities or BusyBox?
Edit: The output is intended to be parseable by a script.
Accepted answer by Michał Kosmulski
I suggest something like:
find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1
Some notes:
- The -print0 option for find and --files0-from for du are there to avoid issues with whitespace in file names.
- The regular expression is matched against the whole path, e.g. ./dir1/subdir2/file.bak, not just file.bak, so if you modify it, take that into account.
- I used the h flag for du to produce a "human-readable" format, but if you want to parse the output, you may be better off with k (always use kilobytes).
- If you remove the tail command, you will additionally see the sizes of particular files and directories.
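Since the question asks for script-parseable output, the notes above can be combined into a small function. This is only a sketch: the name regex_total_kib is made up for illustration, and it assumes GNU find and GNU du (--files0-from is a GNU extension). It restricts matches to regular files with -type f, which the original command does not do.

```shell
# Hypothetical helper: print the total disk usage, in KiB, of regular files
# under the current directory whose full path matches the regex in $1.
# Assumes GNU find and GNU du.
regex_total_kib() {
    find . -type f -regex "$1" -print0 |
        du --files0-from=- -ck |   # -k: fixed KiB units, -c: add a grand total line
        tail -n 1 |                # keep only the "total" line
        cut -f1                    # keep only the number
}
```

Usage would be: cd /where/to/look, then regex_total_kib '.*\.bak'.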
Sidenote: a nice GUI tool for finding out who ate your disk space is FileLight. It doesn't do regexes, but is very handy for finding big directories or files clogging your disk.
Answer by Camilo Martin
Run this in a Bourne Shell to declare a function that calculates the sum of sizes of all the files matching a regex pattern in the current directory:
sizeofregex() { IFS=$'\n'; for x in $(find . -regex "$1" 2> /dev/null); do du -sk "$x" | cut -f1; done | awk '{s+=$1} END {print s}' | sed 's/^$/0/'; unset IFS; }
(Alternatively, you can put it in a script.)
Usage:
cd /where/to/look
sizeofregex 'myregex'
The result will be a number (in KiB), including 0 (if there are no files that match your regex).
If you do not want it to look in other filesystems (say you want to look for all .so files under /, which is a mount of /dev/sda1, but not under /home, which is a mount of /dev/sdb1), add a -xdev parameter to find in the function above.
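As a rough illustration of where -xdev goes, here is one possible variant of the function. The name sizeofregex_xdev is hypothetical. Unlike the original, this sketch only counts regular files (-type f) and lets find hand the paths directly to du, which also copes with newlines in file names. It assumes a find that supports -regex, -xdev and -exec ... + (GNU find does; BusyBox support depends on how it was built).

```shell
# Hypothetical variant of sizeofregex that stays on one filesystem.
# Prints the summed size in KiB of regular files matching the regex in $1,
# or 0 when nothing matches.
sizeofregex_xdev() {
    find . -xdev -type f -regex "$1" -exec du -k {} + 2> /dev/null |
        awk '{s += $1} END {print s + 0}'   # s + 0 prints 0 for empty input
}
```

Summing in awk rather than asking du for a total means the result does not depend on how many times find ends up invoking du.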
Answer by glenn jackman
If you're OK with glob-patterns and you're only interested in the current directory:
stat -c "%s" *.bak | awk '{sum += $1} END {print sum}'
or
sum=0
while read size; do (( sum += size )); done < <(stat -c "%s" *.bak)
echo $sum
The %s directive to stat gives bytes, not kilobytes.
If you want to descend into subdirectories, with bash version 4, you can shopt -s globstar and use the pattern **/*.bak
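One thing to watch with the glob-based commands: when no file matches, the shell passes the literal string *.bak to stat, which then fails. A loop can guard against that. The sketch below uses a made-up function name, sum_bak_here, and assumes a stat that understands -c (GNU coreutils or BusyBox).

```shell
# Hypothetical helper: sum the sizes, in bytes, of *.bak files in the
# current directory only. Prints 0 when there are none.
sum_bak_here() {
    sum=0
    for f in ./*.bak; do
        [ -f "$f" ] || continue            # skips the unexpanded pattern and non-files
        size=$(stat -c %s "$f") || return 1
        sum=$((sum + size))
    done
    echo "$sum"
}
```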
Answer by MaddHacker
du is my favorite answer. If you have a fixed filesystem structure, you can use:
du -hc *.bak
If you need to add subdirs, just add:
du -hc *.bak **/*.bak **/**/*.bak
etc etc
However, this isn't a very useful command, so using your find:
TOTAL=0;for I in $(find . -name \*.bak); do TOTAL=$((TOTAL+$(du "$I" | awk '{print $1}'))); done; echo $TOTAL
That will echo the total size of all of the files you find, in du's default block units (1 KiB blocks with GNU du) rather than bytes.
Hope that helps.
Answer by ben.snape
The previous solutions didn't work properly for me (I had trouble piping du), but the following worked great:
find path/to/directory -iregex ".*\.bak$" -exec du -csh '{}' + | tail -1
The iregex option is a case-insensitive regular expression. Use regex if you want it to be case sensitive.
If you aren't comfortable with regular expressions, you can use the iname or name flags (the former being case insensitive):
find path/to/directory -iname "*.bak" -exec du -csh '{}' + | tail -1
In case you want the size of every match (rather than just the combined total), simply leave out the piped tail command:
find path/to/directory -iname "*.bak" -exec du -csh '{}' +
These approaches avoid the subdirectory problem in @MaddHacker's answer.
Hope this helps others in the same situation (in my case, finding the size of all DLLs in a .NET solution).
Answer by Mecki
The accepted answer suggests using
find . -regex '.*\.bak' -print0 | du --files0-from=- -ch | tail -1
but that doesn't work on my system, as du doesn't know a --files0-from option there. Only GNU du knows that option; it is not part of the POSIX standard (so you won't find it in FreeBSD or macOS), nor will you find it on BusyBox-based Linux systems (e.g. most embedded Linux systems) or any other Linux system that does not use the GNU du version.
Then there's a reply suggesting to use:
find path/to/directory -iregex ".*\.bak$" -exec du -csh '{}' + | tail -1
This solution will work as long as there aren't too many files found. The + means that find will try to call du with as many hits as possible in a single call. However, there might be a maximum number of arguments (N) a system supports, and if there are more hits than this value, find will call du multiple times, splitting the hits into groups of at most N items each. In that case the result will be wrong, because tail -1 only shows the total of the last du call.
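One way around the batching problem, sketched here under assumptions rather than taken from any of the answers, is to add up every "total" line that du prints instead of keeping only the last one. The function name bak_total_kib is made up, and the code assumes a find that supports -iname and -exec ... +, and a du whose -c total line has the literal label "total".

```shell
# Hypothetical helper: total KiB of regular *.bak files (any letter case)
# under directory $1 (default: current directory). Stays correct even when
# find has to run du more than once, because every batch total is summed.
bak_total_kib() {
    find "${1:-.}" -type f -iname '*.bak' -exec du -ck {} + |
        awk -F '\t' '$2 == "total" {s += $1} END {print s + 0}'
}
```

Because find always prints paths that carry the starting directory as a prefix, a matched file cannot produce a line whose second field is exactly "total".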
Finally, there is an answer using stat and awk, which is a nice way to do it, but it relies on shell globbing in a way that only Bash 4.x or later supports. It will not work with older versions, and whether it works with other shells is unpredictable.
A more portable solution that doesn't suffer from these limitations and doesn't depend on any particular shell would be the following (written with the BSD/macOS stat syntax; GNU stat uses -c "%s" instead of -f "%z"):
find . -regex '.*\.bak' -exec stat -f "%z" {} \; | awk '{s += $1} END {print s}'
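The stat utility is not specified by POSIX, and its format option differs between implementations: -f "%z" is the BSD/macOS spelling, while GNU coreutils and BusyBox use -c "%s". A sketch that tries the GNU form first and falls back to the BSD form could look like this. The function name bak_bytes is hypothetical, and -name is used in place of -regex purely to keep the example simple.

```shell
# Hypothetical helper: total size in bytes of regular *.bak files under
# directory $1 (default: current directory), using whichever stat syntax
# the system accepts.
bak_bytes() {
    find "${1:-.}" -type f -name '*.bak' -exec sh -c '
        for f do
            # GNU/BusyBox form first; if it fails, try the BSD/macOS form
            stat -c %s "$f" 2>/dev/null || stat -f %z "$f"
        done' sh {} + |
        awk '{s += $1} END {print s + 0}'
}
```

The per-file sizes are summed in awk, so it makes no difference how many batches find splits the matches into.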