bash: How to list all binary file extensions within a directory tree?
Original question: http://stackoverflow.com/questions/9813141/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by dukeofgaming
I need to build a list of all the file extensions of binary files located within a directory tree.
The main problem is how to distinguish a text file from a binary one; the rest should be a piece of cake.
EDIT: This is the closest I got, any better ideas?
find . -type f|xargs file|grep -v text|sed -r 's:.*\.(.*)\:.*:\1:g'
Answered by Eran Ben-Natan
Here's a trick to find the binary files:
grep -r -m 1 "^" <Your Root> | grep "^Binary file"
The -m 1 option makes grep stop after the first match in each file, so it does not read the whole file.
Answered by Bijou Trouvaille
This Perl one-liner worked for me, and it was also quite fast:
find . -type f -exec perl -MFile::Basename -e 'print (-T $_ ? "" : (fileparse ($_, qr/\.[^.]*/))[2] . "\n" ) for @ARGV' {} + | sort | uniq
And this is how you can find all binary files under the current folder:
find . -type f -exec perl -e 'print (-B $_ ? "$_\n" : "" ) for @ARGV' {} +
-T is a test for text files and -B is a test for binary files. They are opposites of each other, except that both return true for an empty file.
Answered by Kaz
There is no inherent difference between a binary file and a text file on Linux. The file utility looks at the contents and guesses. Unfortunately, it's not of much help because file doesn't produce a simple "binary or text" answer; it has a complex output with a large number of cases that you would have to parse.
One approach is to read a fixed-size prefix of the file, say 256 bytes, and then apply some heuristics. For instance, are all the byte values in the range 0x0 to 0x7F, with no control codes other than common whitespace? That suggests ASCII. If there are bytes in the range 0x80 through 0xFF, does the entire buffer (except for one character at the end, which may be chopped) decode as valid UTF-8? And so on.
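The prefix heuristic described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names, the set of "common whitespace" control codes, and the 256-byte default are choices made for this sketch, not something the answer specifies. An empty prefix is treated as text here.

```python
import codecs

# Assumption: tab, line feed, form feed and carriage return count as
# "common whitespace"; every other control code suggests binary data.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0C, 0x0D}


def prefix_looks_like_text(prefix: bytes) -> bool:
    """Guess whether a byte prefix looks like ASCII or UTF-8 text."""
    for b in prefix:
        if (b < 0x20 and b not in ALLOWED_CONTROLS) or b == 0x7F:
            return False
    # Only 7-bit bytes and no stray control codes: treat as ASCII text.
    if all(b < 0x80 for b in prefix):
        return True
    # High bytes are present: require valid UTF-8. With final=False the
    # incremental decoder tolerates a multi-byte character that was
    # chopped at the end of the buffer, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(prefix, final=False)
    except UnicodeDecodeError:
        return False
    return True


def looks_like_text(path: str, prefix_size: int = 256) -> bool:
    """Apply the heuristic to the first prefix_size bytes of a file."""
    with open(path, "rb") as f:
        return prefix_looks_like_text(f.read(prefix_size))
```

Calling looks_like_text("/etc/hosts") would then give a yes/no guess for a single file; like any such heuristic, it can be wrong for other encodings such as UTF-16.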
One idea might be to sneakily exploit utilities which detect binary files, like GNU diff.
$ diff -r /bin/ls <(echo foo)
Binary files /bin/ls and /dev/fd/63 differ
It still works without process substitution:
$ diff -r /bin/ls /dev/null
Binary files /bin/ls and /dev/null differ
Now just grep the output of that and look for the word Binary.
The question is whether diff's heuristic for binary files works for your purposes.
Answered by pizza
There is no sure way to differentiate a "text" file from a "binary" file; it is guesswork.
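#!/bin/bash
# Guess whether the file given as the first argument is text or binary.
# Size of the printable runs that strings extracts from the first 4096 bytes...
printable=$(head -c 4096 "$1" | strings -a -n 1 | wc -c)
# ...and the number of bytes actually read.
total=$(head -c 4096 "$1" | wc -c)
# bc truncates the division to an integer, so guess is 1 only when
# nearly all of the bytes are printable.
guess=$(echo "$printable * 1.05 / $total" | bc)
if [ "$guess" -eq 1 ] ; then
    echo "is text file"
    exit 0
else
    echo "is binary file"
    exit 1
fi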
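For comparison, the same kind of guess can be written as an explicit ratio in Python. This is a rough approximation of the script's idea, not a port of it: the names printable_ratio and guess_is_text and the 0.95 threshold are assumptions made for this sketch, and "printable" here means the ASCII characters in string.printable.

```python
import string

# Bytes counted as printable: ASCII letters, digits, punctuation, whitespace.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes in data that are printable ASCII."""
    if not data:
        return 1.0  # assumption: an empty sample is treated as text
    return sum(b in PRINTABLE for b in data) / len(data)


def guess_is_text(data: bytes, threshold: float = 0.95) -> bool:
    """Guess "text" when at least threshold of the sample is printable."""
    return printable_ratio(data) >= threshold
```

To mirror the script, one would pass in the first 4096 bytes of a file, for example open(path, "rb").read(4096). Files in non-ASCII encodings will tend to be classified as binary by this check.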
Answered by kenorb
Here is a one-liner in Python to check whether a file is binary, that is, whether it contains a NUL byte:
b"\x00" in open("/etc/hosts", "rb").read()
To use it recursively from the shell, see the example below:
IS_BINARY='import sys; sys.exit(not b"\x00" in open(sys.argv[1], "rb").read())'
find . -type f -exec bash -c "python -c '$IS_BINARY' {} && echo {}" \;
To find all non-binary files, change && to ||.
Answered by kenorb
Here is a simple command to list all binary files (those that contain a NUL character) using GNU grep:
grep -Palr '\x00' .
To print only the file extensions shorter than 5 characters, we can use awk and then filter out the duplicates with either uniq or sort.
Altogether, it should look something like this:
grep -Palr '\x00' . | awk -F. '{if (length($NF) < 5) print $NF}' | sort -u
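The same idea can be sketched in Python for cases where GNU grep is not available. The function name binary_extensions and the chunk_size parameter are hypothetical, and the sketch differs from the pipeline above in a few ways that are assumptions of this example: it only inspects the first 4096 bytes of each file for a NUL byte, it uses os.path.splitext to find the extension, it skips files without an extension, and it does not apply the length filter.

```python
import os


def binary_extensions(root, chunk_size=4096):
    """Return sorted extensions of files under root that contain a NUL byte.

    Only the first chunk_size bytes of each file are inspected.
    """
    extensions = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Mirror find's "-type f": regular files only, no symlinks.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    chunk = f.read(chunk_size)
            except OSError:
                continue  # unreadable file: skip it
            ext = os.path.splitext(name)[1]
            if ext and b"\x00" in chunk:
                extensions.add(ext[1:])  # drop the leading dot
    return sorted(extensions)
```

For example, print("\n".join(binary_extensions("."))) would print one extension per line for the current directory tree.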

