Binary grep on Linux?

Question

Asked by sdaau

Say I have generated the following binary file:

# generate file:
python -c 'import sys;[sys.stdout.write(chr(i)) for i in (0,0,0,0,2,4,6,8,0,1,3,0,5,20)]' > mydata.bin

# get file size in bytes
stat -c '%s' mydata.bin

# 14

And say I want to find the locations of all zeroes (0x00), using a grep-like syntax.

The best I can do so far is:

$ hexdump -v -e "1/1 \" %02x\n\"" mydata.bin | grep -n '00'

1: 00
2: 00
3: 00
4: 00
9: 00
12: 00

However, this implicitly converts each byte in the original binary file into a multi-byte ASCII representation, on which grep operates; not exactly the prime example of optimization :)
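
For comparison, a minimal Python 3 sketch that searches the raw bytes directly, with no per-byte ASCII conversion. The helper name `find_byte_offsets` is made up for illustration. Note that `grep -n` above reports 1-based line numbers, so the 0-based byte offsets of the zeroes are 0, 1, 2, 3, 8 and 11.

```python
import sys


def find_byte_offsets(data, needle):
    """Return the 0-based offset of every occurrence of `needle` in `data`.

    bytes.find() works on the raw octets, so nothing is converted to an
    ASCII hex representation first. Overlapping occurrences are reported too.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)  # advance one byte to keep overlaps
    return offsets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 find_bytes.py mydata.bin
    with open(sys.argv[1], "rb") as f:  # "rb": no decoding, no newline changes
        for offset in find_byte_offsets(f.read(), b"\x00"):
            print(offset)
```

Without a file argument the script does nothing, so the function can also be imported and reused.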

Is there something like a binary grep for Linux? Possibly also something that would support a regular-expression-like syntax, but for byte "characters" - that is, I could write something like 'a(\x00*)b' and match 'zero or more' occurrences of byte 0 between bytes 'a' (97) and 'b' (98)?
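
Python's `re` module accepts bytes patterns, which gives roughly the syntax asked for here: in a bytes pattern every "character" is one octet, so `a(\x00*)b` works as written. A minimal sketch (the helper `binary_grep` is hypothetical, not an existing tool); it holds the whole input in memory, so it fits kilobyte-sized snippets better than huge captures:

```python
import re


def binary_grep(pattern, data):
    """Return (offset, matched_bytes) for each non-overlapping match.

    With a bytes pattern each "character" is one octet, so \\x00 is byte 0.
    re.DOTALL lets "." match 0x0A as well, which you usually want on binary data.
    """
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, data, re.DOTALL)]


if __name__ == "__main__":
    # Made-up sample: 'a', three zero bytes, 'b', filler, then 'ab' with no zeros.
    sample = b"xa\x00\x00\x00b--ab"
    for offset, match in binary_grep(rb"a(\x00*)b", sample):
        print(offset, match.hex(" "))
    # prints:
    # 1 61 00 00 00 62
    # 8 61 62
```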

EDIT: The context is that I'm working on a driver, where I capture 8-bit data; something goes wrong in the data, which can be kilobytes up to megabytes, and I'd like to check for particular signatures and where they occur. (So far, I'm working with kilobyte snippets, so optimization is not that important - but if I start getting errors in megabyte-long captures and need to analyze those, my guess is I would like something more optimized :) . And especially, I'd like something where I can "grep" for a byte as a character - hexdump forces me to search strings per byte.)

EDIT2: same question, different forum :) "grepping through a binary file for a sequence of bytes"

EDIT3: Thanks to the answer by @tchrist, here is also an example with 'grepping' and matching, and displaying results (although not quite the same question as OP):

$ perl -ln0777e 'print unpack("H*",$1), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin

ca000000cb000000cc000000cd000000ce     # Matched data (hex)
66357                                  # Offset (dec)

To have the matched data grouped as one byte (two hex characters) each, "H2 H2 H2 ..." needs to be specified for as many bytes as there are in the matched string; as my match '.....\0\0\0\xCC\0\0\0.....' covers 17 bytes, I can write '"H2 "x17' in Perl. Each of these "H2" returns a separate value (as in a list), so join also needs to be used to add spaces between them - eventually:

$ perl -ln0777e 'print join(" ", unpack("H2 "x17,$1)), "\n", pos() while /(.....\0\0\0\xCC\0\0\0.....)/g' /path/to/myfile.bin

ca 00 00 00 cb 00 00 00 cc 00 00 00 cd 00 00 00 ce
66357

Well.. indeed Perl is a very nice 'binary grepping' facility, I must admit :) As long as one learns the syntax properly :)
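
A rough Python counterpart of the two Perl one-liners above, assuming the same 17-byte window around `00 00 00 CC 00 00 00` (the function name is hypothetical). It returns the spaced hex form and the end offset that `pos()` reports; like `-0777`, it reads the whole file into memory:

```python
import re
import sys

# Same shape as the Perl regex above: 5 arbitrary bytes, the 7-byte signature
# 00 00 00 CC 00 00 00, then 5 more bytes (17 bytes in total).
# re.DOTALL is an addition: without it (as with Perl without /s), "." would not
# match 0x0A, so windows containing that byte would be silently skipped.
WINDOW = re.compile(rb"(.....\x00\x00\x00\xcc\x00\x00\x00.....)", re.DOTALL)


def find_windows(data):
    """Return (spaced_hex, end_offset) per match.

    spaced_hex mirrors join(" ", unpack("H2 "x17, $1)); end_offset mirrors
    Perl's pos(), i.e. the offset just past the match.
    """
    return [(m.group(1).hex(" "), m.end()) for m in WINDOW.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 find_windows.py /path/to/myfile.bin
    with open(sys.argv[1], "rb") as f:
        for spaced_hex, end in find_windows(f.read()):
            print(spaced_hex)
            print(end)
```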

Answer 1

Accepted answer by tchrist

One-Liner Input

Here's the shorter one-liner version:

% perl -ln0e 'print tell' < inputfile

And here's a slightly longer one-liner:

% perl -e '($/,$\) = ("\0","\n"); print tell while <STDIN>' < inputfile

The way to connect those two one-liners is by uncompiling the first one's program:

% perl -MO=Deparse,-p -ln0e 'print tell'
BEGIN { $/ = "\000"; $\ = "\n"; }
LINE: while (defined(($_ = <ARGV>))) {
    chomp($_);
    print(tell);
}
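
Note that `perl -ln0e 'print tell'` prints the offset just *past* each NUL, not the NUL's own offset, and it prints one extra value when the file does not end in a NUL. A small Python sketch that emulates the record loop (it does not run Perl; the function name is made up) makes this visible on the 14-byte test file: it yields 1, 2, 3, 4, 9, 12, 14. Subtracting 1 from all but the last gives the NUL positions 0, 1, 2, 3, 8, 11; the trailing 14 is simply the end of the file, reached by the final record (bytes 5 and 20), which has no terminating NUL.

```python
def nul_record_tells(data):
    """Emulate `perl -ln0e 'print tell'` on in-memory data (no Perl involved).

    Each record ends at a NUL byte (like $/ = "\\0"); after reading a record,
    tell() is the offset just past that NUL. If the data does not end in a NUL,
    the final partial record is still read and its value is len(data).
    """
    tells = []
    pos = 0
    while pos < len(data):
        nul = data.find(b"\x00", pos)
        pos = len(data) if nul == -1 else nul + 1
        tells.append(pos)
    return tells


if __name__ == "__main__":
    test_file = bytes((0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20))
    print(nul_record_tells(test_file))  # [1, 2, 3, 4, 9, 12, 14]
```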

Programmed Input

If you want to put that in a file instead of calling it from the command line, here's a somewhat more explicit version:

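#!/usr/bin/env perl

use English qw[ -no_match_vars ];

$RS  = "\0";    # input  separator for readline, chomp
$ORS = "\n";    # output separator for print

while (<STDIN>) {
    print tell();
}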

And here's the really long version:

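#!/usr/bin/env perl

use strict;
use autodie;  # for perl5.10 or better
use warnings qw[ FATAL all  ];

use IO::Handle;

IO::Handle->input_record_separator("\0");
IO::Handle->output_record_separator("\n");

binmode(STDIN);   # just in case

while (my $null_terminated = readline(STDIN)) {
    # this is just *past* the null we just read:
    my $seek_offset = tell(STDIN);
    print STDOUT $seek_offset;

}

close(STDIN);
close(STDOUT);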

One-Liner Output

BTW, to create the test input file, I didn't use your big, long Python script; I just used this simple Perl one-liner:

% perl -e 'print 0.0.0.0.2.4.6.8.0.1.3.0.5.20' > inputfile

You'll find that Perl often winds up being 2-3 times shorter than Python to do the same job. And you don't have to compromise on clarity; what could be simpler than the one-liner above?

Programmed Output

I know, I know. If you don't already know the language, this might be clearer:

#!/usr/bin/env perl
@values = (
    0,  0,  0,  0,  2,
    4,  6,  8,  0,  1,
    3,  0,  5, 20,
);
print pack("C*", @values);
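
For readers coming from the question's Python generator, a sketch of the Python 3 counterpart of `pack("C*", @values)`: `bytes()` turns a list of integers into raw octets, and opening the output in `"wb"` mode keeps Python from applying any text encoding. The `to_octets` name and the output-file argument are illustrative assumptions.

```python
import sys

# The same 14 octets as the Perl @values list.
VALUES = [0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20]


def to_octets(values):
    """Python analogue of pack("C*", @values): one unsigned byte per value.

    Unlike pack, which wraps out-of-range values (with a warning),
    bytes() raises ValueError for anything outside 0..255.
    """
    return bytes(values)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 make_input.py inputfile
    with open(sys.argv[1], "wb") as f:  # binary mode: octets written verbatim
        f.write(to_octets(VALUES))
```

Run with an output path, this should produce the same 14-byte file as the Perl one-liner.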

although this works, too:

print chr for @values;

as does

print map { chr } @values;

Although for those who like everything all rigorous and careful and all, this might be more what you would see:

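#!/usr/bin/env perl

use strict;
use warnings qw[ FATAL all ];
use autodie;

binmode(STDOUT);

my @octet_list = (
    0,  0,  0,  0,  2,
    4,  6,  8,  0,  1,
    3,  0,  5, 20,
);

my $binary = pack("C*", @octet_list);
print STDOUT $binary;

close(STDOUT);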

TMTOWTDI

Perl supports more than one way to do things so that you can pick the one that you're most comfortable with. If this were something I planned to check in as a school or work project, I would certainly select the longer, more careful versions, or at least put a comment in the shell script if I were using the one-liners.

You can find documentation for Perl on your own system. Just type

% man perl
% man perlrun
% man perlvar
% man perlfunc

etc. at your shell prompt. If you want pretty-ish versions on the web instead, get the manpages for perl, perlrun, perlvar, and perlfunc from http://perldoc.perl.org.

Answer 2

Answer by David Dean

Someone else appears to have been similarly frustrated and wrote their own tool to do it (or at least something similar): bgrep.

Answer 3

Answer by Chance

What about grep -a? Not sure how it works on truly binary files, but it works well on text files that the OS thinks are binary.

Answer 4

Answer by Omniwombat

One way to solve your immediate problem using only grep is to create a file containing a single null byte. After that, grep -abo -f null_byte_file target_file will produce the following output.

0:
1:
2:
3:
8:
11:

That is, of course, each byte offset as requested by "-b", followed by a null byte as requested by "-o".
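
The pattern-file step can be done from a shell with `printf '\0' > null_byte_file`; below is a Python sketch of the same step (the helper name is hypothetical). It writes the bytes with no trailing newline and refuses a 0x0A byte, because `grep -f` treats each line of the file as a separate pattern. Without `-F`, grep also treats bytes such as '.' or '*' in the file as regex syntax, so `-F` is worth adding when a multi-byte signature is meant literally.

```python
import os
import tempfile


def write_pattern_file(path, pattern=b"\x00"):
    """Write `pattern` as the single raw pattern of a `grep -f` file.

    grep -f reads one pattern per line, so a 0x0A byte would split the
    pattern, and an empty file contains no patterns (it matches nothing).
    """
    if not pattern or b"\n" in pattern:
        raise ValueError("pattern must be non-empty and contain no 0x0A byte")
    with open(path, "wb") as f:  # binary mode: bytes are written verbatim
        f.write(pattern)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "null_byte_file")
    write_pattern_file(path)
    print(path)  # then, in a shell: grep -abo -f <that path> target_file
```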

I'd be the first to advocate perl, but in this case there's no need to bring in the extended family.

Answer 5

Answer by hdorio

The bbe program is a sed-like editor for binary files. See documentation.

Example with bbe:

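bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin

11:x00 x00 xcc x00 x00 x00 xcd x00 x00 x00 xce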

Explanation

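-b search pattern between //. each 2 byte begin with \x (hexa notation).
   -b works like this /pattern/:length (in byte) after matched pattern
-s similar to 'grep -o' suppress unmatched output
-e similar to 'sed -e' give commands
-e 'F d' display offsets before each result here: '11:'
-e 'p h' print results in hexadecimal notation
-e 'A \n' append end-of-line to each result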

You can also pipe it to sed to have a cleaner output:

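bbe -b "/\x00\x00\xCC\x00\x00\x00/:17" -s -e "F d" -e "p h" -e "A \n" mydata.bin | sed -e 's/x//g'

11:00 00 cc 00 00 00 cd 00 00 00 ce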

Your solution with Perl from your EDIT3 gives me an 'Out of memory' error with large files.
The same problem goes with bgrep.
The only downside to bbe is that I don't know how to print context that precedes a matched pattern.
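
The `-0777` switch in the EDIT3 one-liner reads the whole file into a single string, which is one plausible source of the 'Out of memory' error. A Python sketch of one way around that, assuming a 64-bit system: memory-map the capture read-only and run a bytes regex over the mapping, so the kernel pages the file in on demand instead of the program copying it into one buffer (`mmap_grep` is a hypothetical name):

```python
import mmap
import os
import re
import sys
import tempfile


def mmap_grep(path, pattern):
    """Return (offset, matched_bytes) for each match of a bytes regex in a file.

    The file is memory-mapped read-only, so the OS pages it in on demand
    instead of the whole capture being copied into one Python bytes object.
    """
    regex = re.compile(pattern, re.DOTALL)
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return []  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return [(m.start(), m.group(0)) for m in regex.finditer(mm)]


if __name__ == "__main__":
    if len(sys.argv) > 2:
        # Usage (hypothetical script name): python3 mmap_grep.py FILE 'PATTERN'
        # latin-1 maps each character to one byte, so a typed \x00 reaches re intact.
        for offset, match in mmap_grep(sys.argv[1], sys.argv[2].encode("latin-1")):
            print(offset, match.hex(" "))
    else:
        # Demo on a temporary copy of the 14-byte file from the question.
        with tempfile.NamedTemporaryFile(delete=False) as tmp:
            tmp.write(bytes((0, 0, 0, 0, 2, 4, 6, 8, 0, 1, 3, 0, 5, 20)))
        try:
            print([offset for offset, _ in mmap_grep(tmp.name, rb"\x00")])
        finally:
            os.remove(tmp.name)
```

Because the whole file is mapped at once, there are no chunk boundaries for a match to straddle. The result list is still kept in memory, though, so a pattern that matches millions of times (a lone `\x00`, say) needs room for every hit.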

Answer 6

Answer by Fr0sT

This seems to work for me:

grep --only-matching --byte-offset --binary --text --perl-regexp "<\x-hex pattern>" <file>

Short form:

grep -obUaP "<\x-hex pattern>" <file>

Example:

grep -obUaP "\x01\x02" /bin/grep

Output (Cygwin binary):

153: <\x01\x02>
33210: <\x01\x02>
53453: <\x01\x02>

So you can grep this again to extract offsets. But don't forget to use binary mode again.
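
On the binary-mode point: if this output is post-processed from Python, it should stay as bytes. A minimal sketch (hypothetical helper name) that pulls the offsets out of `grep -ob` output without decoding it; the sample input assumes the `<\x01\x02>` shown in the output above stands for the two raw bytes 0x01 0x02:

```python
def parse_grep_offsets(output):
    """Extract byte offsets from `grep -ob` output captured as raw bytes.

    Each output line is b"<offset>:<matched bytes>". The matched bytes can be
    anything, so the output is never decoded; only the digits before the
    first colon are parsed. Caveat: a match that itself contains a 0x0A byte
    breaks this line-based parsing.
    """
    offsets = []
    for line in output.split(b"\n"):
        head, sep, _ = line.partition(b":")
        if sep and head.isdigit():
            offsets.append(int(head))
    return offsets


if __name__ == "__main__":
    # Made-up sample in the shape of the output above.
    sample = b"153:\x01\x02\n33210:\x01\x02\n53453:\x01\x02\n"
    print(parse_grep_offsets(sample))  # [153, 33210, 53453]
```

To feed it real output, `subprocess.run(["grep", "-obUaP", r"\x01\x02", "/bin/grep"], capture_output=True).stdout` returns bytes as long as `text=True` is not passed, which is Python's way of staying in binary mode.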
