How can I calculate an md5 checksum of a directory? (Linux)
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow, citing the original address.
Original question: http://stackoverflow.com/questions/1657232/
Asked by victorz
I need to calculate a summary md5 checksum for all files of a particular type (*.py for example) placed under a directory and all sub-directories.
What is the best way to do that?
Edit: The proposed solutions are very nice, but this is not exactly what I need. I'm looking for a solution to get a single summary checksum which will uniquely identify the directory as a whole, including the content of all its sub-directories.
Accepted answer by unutbu
find /path/to/dir/ -type f -name "*.py" -exec md5sum {} + | awk '{print $1}' | sort | md5sum
The find command lists all the files that end in .py. The md5sum is computed for each .py file. awk is used to pick off the md5sums (ignoring the filenames, which may not be unique). The md5sums are sorted. The md5sum of this sorted list is then returned.
I've tested this by copying a test directory:
rsync -a ~/pybin/ ~/pybin2/
I renamed some of the files in ~/pybin2.
The find...md5sum command returns the same output for both directories.
2bcf49a4d19ef9abd284311108d626f1 -
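The rename test described above can be reproduced with a small wrapper. This is only a sketch: content_md5 is a hypothetical helper name, and LC_ALL=C on sort is an added assumption to keep the ordering identical across systems. Because only the per-file checksums are hashed, renaming or moving a file does not change the result, while editing a file does.

```shell
# content_md5 DIR PATTERN (hypothetical helper):
# one checksum over the sorted per-file checksums of all matching files.
# File names are dropped, so renames do not affect the result.
content_md5() {
    find "$1" -type f -name "$2" -exec md5sum {} + |
        awk '{print $1}' |      # keep only the checksum column
        LC_ALL=C sort |         # fixed collation, independent of the locale
        md5sum |
        awk '{print $1}'        # drop the trailing " -"
}
```

Usage: content_md5 ~/pybin '*.py' and content_md5 ~/pybin2 '*.py' should print the same value as long as the two trees hold the same file contents.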
Answer by Ramon
If you want one md5sum spanning the whole directory, I would do something like:
cat *.py | md5sum
Answer by ghostdog74
GNU find:
find /path -type f -name "*.py" -exec md5sum "{}" +;
Answer by ire_and_curses
Create a tar archive file on the fly and pipe that to md5sum:
tar c dir | md5sum
This produces a single md5sum that should be unique to your file and sub-directory setup. No files are created on disk.
Answer by jmucchiello
Technically you only need to run ls -lR *.py | md5sum. Unless you are worried about someone modifying the files and touching them back to their original dates and never changing the files' sizes, the output from ls should tell you if the file has changed. My unix-foo is weak so you might need some more command line parameters to get the create time and modification time to print. ls will also tell you if permissions on the files have changed (and I'm sure there are switches to turn that off if you don't care about that).
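The same metadata-only idea can be sketched with find instead of ls, which makes the hashed fields explicit. This assumes GNU find (for -printf); meta_md5 is a hypothetical helper name. It hashes each file's relative path, size, modification time and permission bits, and never reads file contents, so it shares the limitation described above: a change that keeps size and mtime intact goes unnoticed.

```shell
# meta_md5 DIR PATTERN (hypothetical helper, assumes GNU find):
# checksum over relative path, size, mtime and mode of matching files.
# File contents are not read.
meta_md5() {
    ( cd "$1" && find . -type f -name "$2" -printf '%P\t%s\t%T@\t%m\n' ) |
        LC_ALL=C sort |
        md5sum |
        awk '{print $1}'
}
```

Dropping %m from the format string would ignore permission changes, and dropping %T@ would ignore timestamps.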
Answer by Dieter_be
ire_and_curses's suggestion of using tar c <dir> has some issues:
- tar processes directory entries in the order in which they are stored in the filesystem, and there is no way to change this order. This can yield completely different results if you have the "same" directory in different places, and I know of no way to fix this (tar cannot "sort" its input files in a particular order).
- I usually care about whether groupid and ownerid numbers are the same, not necessarily whether the string representation of group/owner are the same. This is in line with what, for example, rsync -a --delete does: it synchronizes virtually everything (minus xattrs and acls), but it will sync owner and group based on their ID, not on string representation. So if you synced to a different system that doesn't necessarily have the same users/groups, you should add the --numeric-owner flag to tar.
- tar will include the filename of the directory you're checking itself, just something to be aware of.
As long as there is no fix for the first problem (or unless you're sure it does not affect you), I would not use this approach.
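A note on the ordering problem: GNU tar 1.28 and later accept --sort=name, which makes tar emit directory entries in name order instead of filesystem order. The sketch below assumes such a GNU tar (BSD tar and busybox tar will likely reject the option); tar_md5 is a hypothetical helper name. It archives "." from inside the directory so that the directory's own name stays out of the stream, and it uses --numeric-owner as suggested above. Modification times, modes and owner IDs remain part of the checksum.

```shell
# tar_md5 DIR (hypothetical helper, assumes GNU tar >= 1.28):
# checksum of a tar stream with entries sorted by name.
tar_md5() {
    tar -C "$1" --sort=name --numeric-owner -cf - . |
        md5sum |
        awk '{print $1}'
}
```

Two trees that were synchronised with rsync -a or cp -a should then give the same value. Trees whose timestamps differ will not, unless the times are normalised as well.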
The find-based solutions proposed above are also no good because they only include files, not directories, which becomes an issue if the checksum should take empty directories into account.
Finally, most suggested solutions don't sort consistently, because the collation might be different across systems.
This is the solution I came up with:
dir=<mydir>; (find "$dir" -type f -exec md5sum {} +; find "$dir" -type d) | LC_ALL=C sort | md5sum
Notes about this solution:
- The LC_ALL=C is to ensure reliable sorting order across systems.
- This doesn't differentiate between a directory "named\nwithanewline" and two directories "named" and "withanewline", but that seems very unlikely to occur. One usually fixes this with a -print0 flag for find, but since there's other stuff going on here, I can only see solutions that would make the command more complicated than it's worth.
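One property of the command above worth knowing: find prints every path prefixed with "$dir", so the directory's own location is part of the hashed text, and two identical trees stored under different paths get different checksums. A minimal sketch of a location-independent variant, assuming that is what you want, runs find from inside the directory; tree_md5 is a hypothetical helper name.

```shell
# tree_md5 DIR (hypothetical helper):
# checksum over file contents, relative file names and directory names,
# so empty directories count but the location of DIR does not.
tree_md5() {
    (
        cd "$1" || exit 1
        find . -type f -exec md5sum {} +   # "checksum  ./relative/path" lines
        find . -type d                     # directory names, including empty ones
    ) | LC_ALL=C sort | md5sum | awk '{print $1}'
}
```

Unlike a checksum over contents only, this changes when a file is renamed or when an empty directory is added.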
PS: one of my systems uses a limited busybox find which supports neither the -exec nor the -print0 flag, and it also appends '/' to denote directories, while findutils find doesn't seem to, so for this machine I need to run:
dir=<mydir>; (find "$dir" -type f | while read f; do md5sum "$f"; done; find "$dir" -type d | sed 's#/$##') | LC_ALL=C sort | md5sum
Luckily, I have no files/directories with newlines in their names, so this is not an issue on that system.
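Newlines aside, a plain "read f" also strips leading and trailing whitespace and interprets backslashes, so file names containing those would be passed to md5sum incorrectly. A hedged sketch of the same loop with IFS= and read -r, which are standard shell features and avoid both problems; tree_md5_noexec is a hypothetical helper name.

```shell
# tree_md5_noexec DIR (hypothetical helper):
# same idea as the busybox command above, without -exec or -print0.
# IFS= keeps leading/trailing blanks, -r keeps backslashes literal.
tree_md5_noexec() {
    dir=$1
    (
        find "$dir" -type f | while IFS= read -r f; do
            md5sum "$f"
        done
        find "$dir" -type d | sed 's#/$##'   # strip the trailing '/' some finds add
    ) | LC_ALL=C sort | md5sum
}
```

File names containing newlines are still not handled.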
Answer by Michael Shigorin
For the sake of completeness, there's md5deep(1); it's not directly applicable due to *.py filter requirement but should do fine together with find(1).
Answer by alan
I had the same problem, so I came up with this script, which lists the md5sums of the files in the directory and, if it finds a subdirectory, runs itself again from there. For this to work, the script has to handle both the current directory and a subdirectory passed as the argument $1.
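#!/bin/bash
# Print the md5sum of every file below the current directory, or below
# the directory given as $1 (a path relative to the current directory).
if [ -z "$1" ] ; then
    # loop in current dir
    ls | while read -r line; do
        ecriv="$(pwd)/$line"
        if [ -f "$ecriv" ] ; then
            md5sum "$ecriv"
        elif [ -d "$ecriv" ] ; then
            bash "$0" "$line" # call this script again
        fi
    done
else # if a directory is specified in argument $1
    ls "$1" | while read -r line; do
        ecriv="$(pwd)/$1/$line"
        if [ -f "$ecriv" ] ; then
            md5sum "$ecriv"
        elif [ -d "$ecriv" ] ; then
            bash "$0" "$1/$line" # recurse, keeping the path relative to the current dir
        fi
    done
fi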
Answer by tesujimath
If you only care about files and not empty directories, this works nicely:
find /path -type f | sort -u | xargs cat | md5sum
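The pipeline above splits file names on whitespace when xargs reads them, and the sort order depends on the locale. A minimal sketch of a variant that passes names NUL-delimited and sorts with a fixed collation, assuming find -print0, sort -z and xargs -0 are available (GNU and most BSD tools); files_md5 is a hypothetical helper name.

```shell
# files_md5 DIR (hypothetical helper, assumes -print0 / sort -z / xargs -0):
# checksum of all file contents, concatenated in sorted relative-name order.
files_md5() {
    ( cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 cat ) |
        md5sum |
        awk '{print $1}'
}
```

Since the contents are simply concatenated, file boundaries and names are not part of the checksum; only the byte stream in name order is.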
Answer by peterh - Reinstate Monica
If you want real independence from the filesystem attributes and from the bit-level differences of some tar versions, you could use cpio:
cpio -i -e theDirname | md5sum