Original question: http://stackoverflow.com/questions/6710878/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
diff a directory recursively, ignoring all binary files
Asked by Zéychin
Working on a Fedora Constantine box. I am looking to diff two directories recursively to check for source changes. Due to the setup of the project (prior to my own engagement with said project! sigh), the directories contain both source and binaries, as well as large binary datasets. While diffing eventually works on these directories, it would take perhaps twenty seconds if I could ignore the binary files.
As far as I understand, diff does not have an 'ignore binary file' mode, but it does have an ignore argument which will ignore lines matching a regular expression within a file. I don't know what to write there to ignore binary files, regardless of extension.
I'm using the following command, but it does not ignore binary files. Does anyone know how to modify this command to do this?
diff -rq dir1 dir2
Accepted answer by jon
Maybe use grep -I (which is equivalent to grep --binary-files=without-match) as a filter to sort out binary files.
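dir1='folder-1'
dir2='folder-2'
IFS=$'\n'   # split grep's output on newlines only, so names with spaces survive
# -I skips binary files, -l prints only file names, -s hides errors, -r recurses
for file in $(grep -Ilsr -m 1 '.' "$dir1"); do
    # map the path under dir1 to the matching path under dir2
    diff -q "$file" "${file/${dir1}/${dir2}}"
done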
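The loop above still breaks on file names that contain newlines, and it treats the value of dir1 as a pattern in the substitution. A minimal sketch of a variation that avoids both, assuming bash and GNU grep (for the -Z option); the function name diff_text_files is made up for illustration and is not part of the original answer:

```bash
#!/bin/bash
# diff_text_files DIR1 DIR2 (hypothetical helper name):
# run `diff -q` on every non-binary file that grep finds under DIR1.
diff_text_files() {
    local dir1=${1%/} dir2=${2%/} file
    # -I skips binary files, -l lists matching file names, -r recurses,
    # -s silences errors, -Z ends each name with a NUL byte (GNU grep).
    while IFS= read -r -d '' file; do
        # Strip the literal DIR1 prefix and re-root the path under DIR2.
        diff -q "$file" "$dir2/${file#"$dir1"/}"
    done < <(grep -IlrsZ -m 1 '.' "$dir1")
}
```

Called as diff_text_files folder-1 folder-2, it prints one "Files ... differ" line per text file whose contents differ. As in the original loop, files that exist only under the second directory are not visited.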
Answered by Shannon VanWagner
Kind of cheating but here's what I used:
diff -r dir1/ dir2/ | sed '/Binary\ files\ /d' >outputfile
This recursively compares dir1 to dir2; sed removes the lines for binary files (those beginning with "Binary files "), and the result is redirected to outputfile.
Answered by RecursivelyIronic
I came to this (old) question looking for something similar (config files on a legacy production server compared to the default Apache installation). Following @fearlesstost's suggestion in the comments, git is sufficiently lightweight and fast that it's probably more straightforward than any of the above suggestions. Copy version 1 to a new directory. Then do:
git init
git add .
git commit -m 'Version 1'
Now delete all the files from version 1 in this directory and copy version 2 into the directory. Now do:
git add .
git commit -m 'Version 2'
git show
This will show you Git's version of all the differences between the first commit and the second. For binary files it will just say that they differ. Alternatively, you could create a branch for each version and try to merge them using git's merge tools.
Answered by Mohan S Nayaka
If the names of the binary files in your project follow a specific pattern (*.o, *.so, ...) as they usually do, you can put those patterns in a file and specify it using -X (hyphen X).
Contents of my exclude_file:
*.o
*.so
*.git
Command:
diff -X exclude_file -r . other_tree > my_diff_file
UPDATE:
-x can be used instead of -X to specify exclusion patterns on the command line rather than in a file:
diff -r -x '*.o' -x '*.so' -x '*.git' dir1 dir2
Answered by Fredrik Pihl
Use a combination of find and the file command. This requires you to do some research on the output of the file command in your directory; below I'm assuming that the files you want to diff are reported as ASCII. Or, use grep -v to filter out the binary files.
#!/bin/bash
dir1=/path/to/first/folder
dir2=/path/to/second/folder
cd "$dir1" || exit 1
# keep only the files that file(1) reports as ASCII text
find . -type f -exec file {} + | grep ASCII | cut -d: -f1 |
while IFS= read -r i; do
    echo "diffing $i ---- $dir2/$i"
    diff -q "$i" "$dir2/$i"
done
Since you probably know the names of the huge binaries, place them in a hash array and only do the diff when a file is not in the hash, something like this:
#!/bin/bash
dir1=/path/to/first/directory
dir2=/path/to/second/directory
content_dir1=$(mktemp)
content_dir2=$(mktemp)
# sorted file lists, so the listing diff below does not depend on find's order
(cd "$dir1" && find . -type f -print | sort > "$content_dir1")
(cd "$dir2" && find . -type f -print | sort > "$content_dir2")
echo Files that only exist in one of the paths
echo -----------------------------------------
diff "$content_dir1" "$content_dir2"
# Files 2 Ignore
declare -A F2I
F2I=( [sqlite3]=1 [binfile2]=1 )
while IFS= read -r f;
do
    b=$(basename "$f")
    # only diff files whose base name is not in the ignore hash
    if ! [[ ${F2I[$b]} ]]; then
        diff "$dir1/$f" "$dir2/$f"
    fi
done < "$content_dir1"
rm -f "$content_dir1" "$content_dir2"
Answered by Troy
Well, as a crude sort of check, you could ignore files that match /\0/.
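One way to read "files that match /\0/" is "files that contain a NUL byte". A rough sketch of that check in bash, assuming tr, cmp, find and diff are available; the function names has_nul and diff_text_only are made up for illustration, and the heuristic will misclassify text encodings that legitimately contain NUL bytes (UTF-16, for example):

```bash
#!/bin/bash
# has_nul FILE: succeeds when FILE contains at least one NUL byte.
# It deletes NUL bytes with tr and checks whether the result still
# matches the original file; any mismatch means a NUL was present.
has_nul() {
    ! tr -d '\000' < "$1" | cmp -s - "$1"
}

# diff_text_only DIR1 DIR2 (hypothetical helper): run `diff -q` on every
# file under DIR1 that has no NUL byte, against its counterpart in DIR2.
diff_text_only() {
    local dir1=$1 dir2=$2 rel
    while IFS= read -r -d '' rel; do
        # Treat anything containing a NUL byte as binary and skip it.
        has_nul "$dir1/$rel" && continue
        diff -q "$dir1/$rel" "$dir2/$rel"
    done < <(cd "$dir1" && find . -type f -print0)
}
```

Note that has_nul reads each candidate file in full, so on very large datasets a pattern-based exclusion may still be faster.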