Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors (not me). Original question on StackOverflow: http://stackoverflow.com/questions/36920307/

Time: 2020-09-08 22:15:41  Source: igfitidea

md5 all files in a directory tree

Tags: bash, for-loop, find, md5, directory-structure

Asked by Bleakley

I have a directory with a structure like so:

.
├── Test.txt
├── Test1
│   ├── Test1.txt
│   ├── Test1_copy.txt
│   └── Test1a
│       ├── Test1a.txt
│       └── Test1a_copy.txt
└── Test2
    ├── Test2.txt
    ├── Test2_copy.txt
    └── Test2a
        ├── Test2a.txt
        └── Test2a_copy.txt

I would like to create a bash script that makes an MD5 checksum of every file in this directory. I want to be able to type the script name in the CLI, followed by the path to the directory I want to hash, and have it work. I'm sure there are many ways to accomplish this. Currently I have:

#!/bin/bash

for file in "$1" ; do
    md5 >> "${1}__checksums.md5"
done

This just hangs and does not work. Perhaps I should use find?

One caveat - the directories I want to hash will have files with different extensions and may not always have this exact same tree structure. I want something that will work in these different situations, as well.

Answered by TeWu

Using md5deep

md5deep -r path/to/dir > sums.md5

Using find and md5sum

find relative/path/to/dir -type f -exec md5sum {} + > sums.md5

Be aware that when you check your MD5 sums with md5sum -c sums.md5, you need to run it from the same directory from which you generated the sums.md5 file. This is because find outputs paths that are relative to your current location, and those paths are then put into the sums.md5 file.

If this is a problem, you can make relative/path/to/dir absolute (e.g. by putting $PWD/ in front of your path). This way you can run the check on sums.md5 from any location. The disadvantage is that sums.md5 now contains absolute paths, which makes it bigger.
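
A minimal sketch of the relative-path approach, assuming GNU coreutils md5sum (for the --quiet option). The function names make_sums and check_sums are hypothetical and not part of this answer. Both run in a subshell, so the caller's working directory is left untouched and the paths stored in sums.md5 are always relative to the hashed directory.

```shell
#!/bin/bash
# Hypothetical helpers (illustrative names, not from the answer).

# Write sums.md5 inside the target directory, with paths relative to it.
make_sums() {
  ( cd "$1" && find . -type f ! -name sums.md5 -exec md5sum {} + > sums.md5 )
}

# Verify from that same directory, so the relative paths resolve correctly.
# --quiet (GNU md5sum) suppresses the "OK" line for each matching file.
check_sums() {
  ( cd "$1" && md5sum -c --quiet sums.md5 )
}
```

Calling make_sums path/to/dir and later check_sums path/to/dir should work from any location, because each helper changes into the directory before touching the paths.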

Fully featured function using find and md5sum

You can put this function in your .bashrc file (located in your $HOME directory):

function md5sums {
  if [ "$#" -lt 1 ]; then
    echo -e "At least one parameter is expected\n" \
            "Usage: md5sums [OPTIONS] dir"
  else
    local OUTPUT="checksums.md5"
    local CHECK=false
    local MD5SUM_OPTIONS=""

    # Parse options; the last argument is left alone because it is the directory
    while [[ $# -gt 1 ]]; do
      local key="$1"
      case $key in
        -c|--check)
          CHECK=true
          ;;
        -o|--output)
          OUTPUT=$2
          shift
          ;;
        *)
          MD5SUM_OPTIONS="$MD5SUM_OPTIONS $1"
          ;;
      esac
      shift
    done
    local DIR=$1

    if [ -d "$DIR" ]; then  # if $DIR directory exists
      cd "$DIR"  # change to $DIR directory
      if [ "$CHECK" = true ]; then  # if -c or --check option specified
        # Check MD5 sums in $OUTPUT file
        # ($MD5SUM_OPTIONS is deliberately unquoted so it splits into separate options)
        md5sum --check $MD5SUM_OPTIONS "$OUTPUT"
      else                          # else
        # Calculate MD5 sums for files in current directory and subdirectories,
        # excluding $OUTPUT file, and save result in $OUTPUT file
        find . -type f ! -name "$OUTPUT" -exec md5sum $MD5SUM_OPTIONS {} + > "$OUTPUT"
      fi
      cd - > /dev/null  # change to previous directory
    else
      cd "$DIR"  # if $DIR doesn't exist, change to it to generate localized error message
    fi
  fi
}

After you run source ~/.bashrc, you can use md5sums like a normal command:

md5sums path/to/dir

will generate a checksums.md5 file in the path/to/dir directory, containing the MD5 sums of all files in this directory and its subdirectories. Use:

md5sums -c path/to/dir

to check the sums from the path/to/dir/checksums.md5 file.

Note that path/to/dir can be relative or absolute; md5sums will work fine either way. The resulting checksums.md5 file always contains paths relative to path/to/dir. You can use a file name other than the default checksums.md5 by supplying the -o or --output option. All options other than -c, --check, -o and --output are passed to md5sum.

The first half of the md5sums function definition is responsible for parsing options. See this answer for more information about it. The second half contains explanatory comments.
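
The option-parsing half follows a common while/case/shift pattern. Here is a minimal standalone sketch of that pattern on its own. The function name parse_demo and its one-line report format are invented for illustration and are not part of md5sums.

```shell
#!/bin/bash
# parse_demo: hypothetical helper that only reports what it parsed.
parse_demo() {
  local output="checksums.md5" check=false extra=""
  # Stop when one argument is left: that last argument is the directory.
  while [ "$#" -gt 1 ]; do
    case "$1" in
      -c|--check)  check=true ;;
      -o|--output) output=$2; shift ;;             # also consume the option's value
      *)           extra="${extra:+$extra }$1" ;;  # collect anything else for pass-through
    esac
    shift
  done
  echo "check=$check output=$output extra=$extra dir=$1"
}
```

For example, parse_demo -c -o out.md5 --tag some/dir reports check=true, output=out.md5, one pass-through option (--tag) and some/dir as the directory.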

Answered by taskalman

How about:

find /path/you/need -type f -exec md5sum {} \; > checksums.md5

Update #1: Improved the command based on @twalberg's recommendation to handle white space in file names.

Update #2: Improved based on @jil's suggestion to remove the unnecessary xargs call and use the -exec option of find instead.
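
For comparison, an xargs-based form can also be made safe for unusual file names by separating them with NUL bytes. This is a sketch of that alternative, not the command from this answer. The function name sums_via_xargs is hypothetical, and the -print0 and -0 options are assumed to be available (they are in GNU and BSD find/xargs, but are not required by POSIX).

```shell
#!/bin/bash
# Hypothetical helper: checksum every regular file under "$1".
sums_via_xargs() {
  # -print0 and -0 separate names with NUL bytes, so spaces and
  # newlines inside file names cannot split a name in two.
  find "$1" -type f -print0 | xargs -0 md5sum
}
```

The -exec form in the answer does the same job with one process fewer in the pipeline.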

Update #3: @Blake, a naive implementation of your script would look something like this:

#!/bin/bash
# Usage: checksumchecker.sh <path>
find "$1" -type f -exec md5sum {} \; > "$1"__checksums.md5

Answered by jil

#!/bin/bash
shopt -s globstar  # enable the ** recursive glob (Bash 4+)
md5sum "$1"/** > "${1}__checksums.md5"

Explanation: shopt -s globstar (see the Bash manual) enables the ** recursive glob wildcard. This means that "$1"/** expands to the list of all files, recursively, under the directory given as parameter $1. The script then simply calls md5sum with this file list as its arguments, and > "${1}__checksums.md5" redirects the output to the file.
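
One thing to be aware of: ** matches directories as well as files, so md5sum reports an "Is a directory" error for each directory it is handed. Below is a minimal sketch that filters the expansion down to regular files first. The function name sum_tree is hypothetical, and Bash 4 or later is assumed for globstar.

```shell
#!/bin/bash
# globstar: ** recurses into subdirectories.
# nullglob: a pattern with no matches expands to nothing.
shopt -s globstar nullglob

# Hypothetical helper: print MD5 sums for regular files under "$1".
sum_tree() {
  local path files=()
  for path in "$1"/**; do
    if [[ -f $path ]]; then   # skip directories and other non-regular files
      files+=("$path")
    fi
  done
  # Only call md5sum when there is something to hash.
  if (( ${#files[@]} > 0 )); then
    md5sum "${files[@]}"
  fi
}
```

Hidden files are still skipped unless the dotglob option is also set.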

Answered by Mark Setchell

Updated Answer

If you like the answer below, or any of the others, you can make a function that does the command for you. So, to test it, type the following into Terminal to declare a function:

function sumthem(){ find "$1" -type f -print0 | parallel -0 -X md5 > checksums.md5; }

Then you can just use:

sumthem /Users/somebody/somewhere

If that works how you like, you can add that line to the end of your "bash profile", and the function will be declared and available whenever you are logged in. Your "bash profile" is probably in $HOME/.profile.

Original Answer

Why not get all your CPU cores working in parallel for you?

find . -type f -print0 | parallel -0 -X md5sum

This finds all the files (-type f) in the current directory (.) and prints them with a null byte at the end. These are then passed into GNU Parallel, which is told that the filenames end with a null byte (-0), that it should do as many files as possible at a time (-X) to save creating a new process for each file, and that it should md5sum the files.
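
If GNU Parallel is not installed, a similar effect can be approximated with the -P option of xargs. This swaps in a different tool and is only a sketch: -P is not POSIX (GNU and BSD xargs support it), the function name parallel_sums is hypothetical, and the batch size of 16 and the default of 4 jobs are arbitrary assumptions. Unlike GNU Parallel, xargs does not group each job's output, so lines from different processes can interleave when batches produce a lot of output.

```shell
#!/bin/bash
# Hypothetical helper: hash files under "$1" using up to "$2" processes (default 4).
parallel_sums() {
  local dir=$1 jobs=${2:-4}
  # -n 16: hand each md5sum at most 16 files
  # -P:    run up to $jobs md5sum processes at the same time
  # sort:  order by file name, since completion order is not deterministic
  find "$dir" -type f -print0 | xargs -0 -n 16 -P "$jobs" md5sum | sort -k 2
}
```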

This approach will pay the largest bonus, in terms of speed, with big images like Photoshop files.

Answered by Alex Jurado - Bitendian

md5deep -r "$your_directory" | awk '{print $1}' | sort | md5sum | awk '{print $1}'