
Disclaimer: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/11087244/

Date: 2020-08-06 06:56:10  Source: igfitidea

Compare 2 Folders and Find Files with Differing Byte Counts

Tags: linux, filesize, compare, directory

Asked by user1464189

Using Gnome in Linux Mint 12, I copied a folder of about 9.7 GB (containing a complex tree of subfolders) from one NTFS flash drive to another NTFS flash drive. According to Gnome, the file counts match, but according to du (and other programs), the byte counts don't. (I've had the same problem copying folders in other Linux distros and in Windows XP.)


I only want to know which files don't have matching byte counts. (I don't want to compare the contents of each file, because that would take way too long.) What's the best, easiest and fastest way to find the byte-count-mismatched files?


Answer by amaksr

Assuming you need to compare dir1 and dir2, here are the console commands:


cd dir1
find . -type f | sort | xargs ls -l | awk '{print $5, $9}' > ~/dir1.txt
cd ../dir2    # assumes dir1 and dir2 are in the same parent directory
find . -type f | sort | xargs ls -l | awk '{print $5, $9}' > ~/dir2.txt
diff ~/dir1.txt ~/dir2.txt

You may need to edit the awk field numbers so that it prints the file length and path properly (in the usual ls -l layout, the size is field 5 and the name is field 9).


Answer by gpoo

Did you check if both partitions have the same attributes? (block size, size, reserved space for deletions or bad blocks, etc.)
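The block-size question matters because du, by default, reports the disk space allocated to files (rounded up to whole blocks, and including directory entries) rather than the sum of their lengths, so two faithful copies on differently formatted drives can show different du totals. A minimal sketch, assuming bash and GNU find (for -printf); the function name total_file_bytes is hypothetical. It totals only the byte lengths of regular files, which should match between two correct copies:

```shell
#!/usr/bin/env bash
# Sum the lengths (in bytes) of all regular files below a directory.
# Unlike plain du, this ignores block size, allocation and directory entries.
# Assumes GNU find (for -printf). total_file_bytes is a hypothetical name.
total_file_bytes() {
  # %.0f avoids scientific notation for totals above 2^31 in some awks.
  find "$1" -type f -printf '%s\n' | awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Usage (placeholder paths):
#   total_file_bytes /media/usb1/folder
#   total_file_bytes /media/usb2/folder
```

If the two totals agree, the mismatch reported by du comes from the filesystems, not from the copied data.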


For your specific case, I would recommend rsync with the -n (--dry-run) option, which reports which files differ without copying anything. That is:


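$ rsync -rvn --size-only /source/ /target/

The option -r recurses into subfolders, -v prints the name of each file that would be transferred, and --size-only makes rsync compare files by size alone, ignoring modification times. (The -I / --ignore-times option does the opposite: it disables rsync's size-and-time check, so every file would be listed.) Files missing from the target are listed too. Without -n, the same command actually copies the differing files; use -a instead of -r if you also want timestamps, permissions, etc. carried over.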


Check the manual of rsync or try the option --help to get more options and examples on how to use it. It is very powerful.


Answer by Ludovic Kuty

I would adapt the answer by @user1464130 as it has trouble handling spaces in file names.


cd dir1
find . -type f -printf "%p %s\n" | sort > ~/dir1.txt
cd ../dir2    # assumes dir1 and dir2 are in the same parent directory
find . -type f -printf "%p %s\n" | sort > ~/dir2.txt
diff ~/dir1.txt ~/dir2.txt
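diff reports every differing line, including files that exist in only one tree. If you only want the files present in both trees whose byte counts differ, which is what the question asks for, a small awk join over the two listings can do it. A minimal sketch, assuming bash (process substitution), GNU find (-printf, where %P is the path relative to the starting directory) and file names without tabs or newlines; size_mismatches is a hypothetical name:

```shell
#!/usr/bin/env bash
# Report files present in both trees whose byte counts differ.
# Assumes bash, GNU find and names without tabs or newlines.
# size_mismatches is a hypothetical helper name.
size_mismatches() {
  local dir1=$1 dir2=$2
  awk -F '\t' '
    # First listing: remember the size of each relative path.
    FNR == NR { size[$2] = $1; next }
    # Second listing: print paths whose size differs from the first copy.
    ($2 in size) && size[$2] != $1 { print $2 ": " size[$2] " vs " $1 }
  ' <(find "$dir1" -type f -printf '%s\t%P\n') \
    <(find "$dir2" -type f -printf '%s\t%P\n') | sort
}

# Usage (placeholder paths):
#   size_mismatches /media/usb1/folder /media/usb2/folder
```

Because find only stats each file, this stays fast even for large trees; no file contents are read.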

If you want to run a command on each file and use its result in the report, you can use the Bash while construct. This example uses md5sum to compute a checksum for each file.


find . -maxdepth 1 -type f -printf "%p %s\n" | while read -r path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size" ; done

Each $() is executed separately and lets us compute the checksum of each file. tr -s squeezes each run of consecutive spaces into a single space, and cut extracts the word in the n-th position, here the first one. If we don't do that, we get the name of the file twice, because md5sum prints it on stdout after the checksum.
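The tr | cut pipeline only keeps the text before the first space of md5sum's "<hash>  <name>" output. Bash parameter expansion can do the same without two extra processes: ${line%% *} removes the longest suffix starting at a space, leaving the hash. A minimal sketch, assuming bash and GNU coreutils (md5sum, stat -c) and file names without backslashes or newlines (md5sum escapes such names); describe_file is a hypothetical name:

```shell
#!/usr/bin/env bash
# Print "<path> - <md5> - <size>" for a single file.
# ${line%% *} keeps everything before the first space, i.e. the hash.
# Assumes GNU coreutils; describe_file is a hypothetical helper name.
describe_file() {
  local path=$1 line
  line=$(md5sum -- "$path")
  echo "$path - ${line%% *} - $(stat -c %s -- "$path")"
}

# Usage: describe_file ./thread.c
```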


Here is an example without the comparison step (no diff). Note that I've used a dash - to separate the three pieces of data we output about each file, but this could be a problem if you want to feed the output to another program.


$ find . -maxdepth 1 -name "*.c" -type f -printf "%p %s\n" | while read -r path size; do echo "$path - $(md5sum "$path" | tr -s " " | cut -f 1 -d " ") - $size" ; done
./thread.c - 5f2b7b12c7cd12fcb9e9796078e5d15b - 584
./utils.c - d61bc1dbc72768e622a04f03e3b8f7a2 - 3413

EDIT: To handle spaces in file names and still get both the checksum and the size, you can use the following code.


$ find . -maxdepth 1 -name "*.c" -type f -print0 | xargs -0 -n 1 md5sum | while read -r checksum path; do echo "$path" $(stat --printf="%s" "$path") $checksum ; done
./ini tia li za tion.c 84 31626123e9056bac2e96b472bd62f309
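The same idea can be taken one step further by reading the NUL-separated names directly in the loop and hashing each file through stdin, so md5sum never echoes or escapes the name. Printing tab-separated fields also addresses the earlier concern about dashes when feeding the output to another program. A minimal sketch, assuming bash and GNU findutils/coreutils (find -print0, sort -z, stat -c); list_files is a hypothetical name, and a name containing a tab or newline would still break the line-oriented output:

```shell
#!/usr/bin/env bash
# List "<path>\t<size>\t<md5>" for every regular file below a directory,
# sorted by path. read -d '' consumes the NUL-terminated names from find,
# so spaces in names survive intact. list_files is a hypothetical name.
list_files() {
  local path line
  while IFS= read -r -d '' path; do
    line=$(md5sum < "$path")   # prints "<hash>  -", so no file name to parse
    printf '%s\t%s\t%s\n' "$path" "$(stat -c %s -- "$path")" "${line%% *}"
  done < <(find "$1" -type f -print0 | sort -z)
}

# Usage: (cd dir1 && list_files .) > ~/dir1.txt
```

Running it in both directories and diffing the two outputs flags size or content differences, at the cost of reading every file.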