
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/5475790/

Date: 2020-08-05 03:26:53  Source: igfitidea

How to disassemble the main function of a stripped application?

c, linux, gdb, strip, disassembly

Question by karlphillip

Let's say I compiled the application below and stripped its symbols.


#include <stdio.h>

int main()
{
    printf("Hello\n");
}

Build procedure:


gcc -o hello hello.c
strip --strip-unneeded hello

If the application wasn't stripped, disassembling the main function would be easy. However, I have no idea how to disassemble the main function of a stripped application.


(gdb) disas main
No symbol table is loaded.  Use the "file" command.

(gdb) info line main
Function "main" not defined.

How could I do it? Is it even possible?


Notes: this must be done with GDB only. Forget objdump. Assume that I don't have access to the code.


A step-by-step example would be greatly appreciated.


Accepted answer by Dr Beco

OK, here is a major edit of my previous answer. I think I have found a way now.


You (still :) have this specific problem:


(gdb) disas main
No symbol table is loaded.  Use the "file" command.

Now, if you compile the code (I added a return 0 at the end), gcc -S gives you:

    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    leave
    ret

Now, you can see that your binary gives you some information:

Stripped:

(gdb) info files
Symbols from "/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip".
Local exec file:
    `/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip', file type elf64-x86-64.
    Entry point: 0x400440
    0x0000000000400238 - 0x0000000000400254 is .interp
    ...
    0x00000000004003a8 - 0x00000000004003c0 is .rela.dyn
    0x00000000004003c0 - 0x00000000004003f0 is .rela.plt
    0x00000000004003f0 - 0x0000000000400408 is .init
    0x0000000000400408 - 0x0000000000400438 is .plt
    0x0000000000400440 - 0x0000000000400618 is .text
    ...
    0x0000000000601010 - 0x0000000000601020 is .data
    0x0000000000601020 - 0x0000000000601030 is .bss

The most important entry here is .text. It is the usual name of the section that holds the program's code, and from its address range you can see that it includes main (see the discussion of main below). If you disassemble it, you will see a call to __libc_start_main. Most importantly, you start disassembling at a good entry point that is real code (you do not risk misinterpreting DATA as CODE).

disas 0x0000000000400440,0x0000000000400618
Dump of assembler code from 0x400440 to 0x400618:
   0x0000000000400440:  xor    %ebp,%ebp
   0x0000000000400442:  mov    %rdx,%r9
   0x0000000000400445:  pop    %rsi
   0x0000000000400446:  mov    %rsp,%rdx
   0x0000000000400449:  and    $0xfffffffffffffff0,%rsp
   0x000000000040044d:  push   %rax
   0x000000000040044e:  push   %rsp
   0x000000000040044f:  mov    $0x400540,%r8
   0x0000000000400456:  mov    $0x400550,%rcx
   0x000000000040045d:  mov    $0x400524,%rdi
   0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>
   0x0000000000400469:  hlt
   ...
   0x000000000040046c:  sub    $0x8,%rsp
   ...
   0x0000000000400482:  retq
   0x0000000000400483:  nop
   ...
   0x0000000000400490:  push   %rbp
   ..
   0x00000000004004f2:  leaveq
   0x00000000004004f3:  retq
   0x00000000004004f4:  data32 data32 nopw %cs:0x0(%rax,%rax,1)
   ...
   0x000000000040051d:  leaveq
   0x000000000040051e:  jmpq   *%rax
   ...
   0x0000000000400520:  leaveq
   0x0000000000400521:  retq
   0x0000000000400522:  nop
   0x0000000000400523:  nop
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  mov    $0x40062c,%edi
   0x000000000040052d:  callq  0x400418 <puts@plt>
   0x0000000000400532:  mov    $0x0,%eax
   0x0000000000400537:  leaveq
   0x0000000000400538:  retq

The call to __libc_start_main gets a pointer to main() as its first argument. On x86-64 Linux the first integer argument is passed in the %rdi register, not on the stack, so the value moved into %rdi just before the call is your main() address:

   0x000000000040045d:  mov    $0x400524,%rdi
   0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>

Here it is 0x400524 (as we already know). Now set a breakpoint there and try this:

(gdb) break *0x400524
Breakpoint 1 at 0x400524
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2 

Breakpoint 1, 0x0000000000400524 in main ()
(gdb) n
Single stepping until exit from function main, 
which has no line number information.
hello 1
__libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>, 
    init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, 
    stack_end=0x7fffffffdc38) at libc-start.c:258
258 libc-start.c: No such file or directory.
    in libc-start.c
(gdb) n

Program exited normally.
(gdb) 

Now you can disassemble it using:

(gdb) disas 0x0000000000400524,0x0000000000400600
Dump of assembler code from 0x400524 to 0x400600:
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  sub    $0x10,%rsp
   0x000000000040052c:  movl   $0x1,-0x4(%rbp)
   0x0000000000400533:  mov    $0x40064c,%eax
   0x0000000000400538:  mov    -0x4(%rbp),%edx
   0x000000000040053b:  mov    %edx,%esi
   0x000000000040053d:  mov    %rax,%rdi
   0x0000000000400540:  mov    $0x0,%eax
   0x0000000000400545:  callq  0x400418 <printf@plt>
   0x000000000040054a:  mov    $0x0,%eax
   0x000000000040054f:  leaveq
   0x0000000000400550:  retq
   0x0000000000400551:  nop
   0x0000000000400552:  nop
   0x0000000000400553:  nop
   0x0000000000400554:  nop
   0x0000000000400555:  nop
   ...

This is essentially the solution.

BTW, this is a different program, used to check that the method works; that is why the assembly above is a bit different. It comes from this C file:

#include <stdio.h>

int main(void)
{
    int i=1;
    printf("hello %d\n", i);
    return 0;
}

But!

If this does not work, you still have some hints:

From now on, you should be looking for the beginning of every function and setting breakpoints there. A function usually starts just after the previous function's ret (sometimes after a few nop padding bytes). The first entry point is the start of .text itself: that is where execution starts, but it is not main.

The problem is that not every breakpoint lets you step on through the program, like this one at the very start of .text:

(gdb) break *0x0000000000400440
Breakpoint 2 at 0x400440
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2 

Breakpoint 2, 0x0000000000400440 in _start ()
(gdb) n
Single stepping until exit from function _start, 
which has no line number information.
0x0000000000400428 in __libc_start_main@plt ()
(gdb) n
Single stepping until exit from function __libc_start_main@plt, 
which has no line number information.
0x0000000000400408 in ?? ()
(gdb) n
Cannot find bounds of current function

So you need to keep trying until you find your way, setting breakpoints at:

0x400440
0x40046c
0x400490
0x4004f4
0x40051e
0x400524

From the other answer, we should keep this information:

In the non-stripped version of the file, we see:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000400524 <+0>: push   %rbp
   0x0000000000400525 <+1>: mov    %rsp,%rbp
   0x0000000000400528 <+4>: mov    $0x40062c,%edi
   0x000000000040052d <+9>: callq  0x400418 <puts@plt>
   0x0000000000400532 <+14>:    mov    $0x0,%eax
   0x0000000000400537 <+19>:    leaveq
   0x0000000000400538 <+20>:    retq
End of assembler dump.

Now we know that main spans 0x0000000000400524 to 0x0000000000400539. If we disassemble the same address range in the stripped binary, we get the same results:

(gdb) disas 0x0000000000400524,0x0000000000400539
Dump of assembler code from 0x400524 to 0x400539:
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  mov    $0x40062c,%edi
   0x000000000040052d:  callq  0x400418 <puts@plt>
   0x0000000000400532:  mov    $0x0,%eax
   0x0000000000400537:  leaveq
   0x0000000000400538:  retq
End of assembler dump.

So, unless you can get some hint about where main starts (for example from another build of the code with symbols), another way is to get some information about its first assembly instructions, so that you can disassemble at specific places and check whether they match. If you have no access at all to the code, you can still read the ELF definition to understand how many sections should appear in the binary and try a calculated address. Still, you need information about the sections in the binary!

That is hard work, my friend! Good luck!

Beco

Answer by Laurent G

IIRC, x/i <location> is your friend. Of course, you have to figure out for yourself which location you want to disassemble.

Answer by Mat

How about doing info files to get the section list (with addresses), and going from there?

Example:

(gdb) info files

Symbols from "/home/bob/tmp/t".
Local exec file:
`/home/bob/tmp/t', file type elf64-x86-64.
Entry point: 0x400490
0x0000000000400270 - 0x000000000040028c is .interp
0x000000000040028c - 0x00000000004002ac is .note.ABI-tag
    ....

0x0000000000400448 - 0x0000000000400460 is .init
    ....

Then disassemble .init:

(gdb) disas 0x0000000000400448,0x0000000000400460
Dump of assembler code from 0x400448 to 0x400460:
   0x0000000000400448:  sub    $0x8,%rsp
   0x000000000040044c:  callq  0x4004bc
   0x0000000000400451:  callq  0x400550
   0x0000000000400456:  callq  0x400650
   0x000000000040045b:  add    $0x8,%rsp
   0x000000000040045f:  retq

Then go ahead and disassemble the rest.


If I were you, and I had the same GCC version your executable was built with, I'd examine the sequence of functions called in a dummy non-stripped executable. The sequence of calls is probably similar in most usual cases, so comparing the two might help you grind through the startup sequence up to your main. Optimizations will probably get in the way, though.


If your binary is stripped and optimized, main might not exist as an "entity" in the binary; chances are you can't do much better than this type of procedure.


Answer by Kevin

There's a great new free tool called unstrip from the Paradyn project (full disclosure: I work on this project) that rewrites your program binary, adding symbol information to it, and recovers all (or nearly all) of the functions in stripped ELF binaries with great accuracy. It won't label the main function as "main", but it will find it, and you can then apply the heuristics mentioned above to figure out which function is main.


http://www.paradyn.org/html/tools/unstrip.html


I'm sorry this isn't a gdb-only solution.
