How to disassemble the main function of a stripped application?
Original question: http://stackoverflow.com/questions/5475790/
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on StackOverflow.
Asked by karlphillip
Let's say I compiled the application below and stripped its symbols.
#include <stdio.h>
int main()
{
printf("Hello\n");
}
Build procedure:
gcc -o hello hello.c
strip --strip-unneeded hello
If the application wasn't stripped, disassembling the main function would be easy. However, I have no idea how to disassemble the main function of a stripped application.
(gdb) disas main
No symbol table is loaded. Use the "file" command.
(gdb) info line main
Function "main" not defined.
How could I do it? Is it even possible?
Notes: this must be done with GDB only. Forget objdump. Assume that I don't have access to the code.
A step-by-step example would be greatly appreciated.
Accepted answer by Dr Beco
OK, here is a big edit of my previous answer. I think I have found a way now.
You (still :) have this specific problem:
(gdb) disas main
No symbol table is loaded. Use the "file" command.
Now, if you compile the code (I added a return 0 at the end) with gcc -S, you will get:
pushq %rbp
movq %rsp, %rbp
movl $.LC0, %edi
call puts
movl $0, %eax
leave
ret
Now, you can see that your binary gives you some info:
Stripped:
(gdb) info files
Symbols from "/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip".
Local exec file:
`/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip', file type elf64-x86-64.
Entry point: 0x400440
0x0000000000400238 - 0x0000000000400254 is .interp
...
0x00000000004003a8 - 0x00000000004003c0 is .rela.dyn
0x00000000004003c0 - 0x00000000004003f0 is .rela.plt
0x00000000004003f0 - 0x0000000000400408 is .init
0x0000000000400408 - 0x0000000000400438 is .plt
0x0000000000400440 - 0x0000000000400618 is .text
...
0x0000000000601010 - 0x0000000000601020 is .data
0x0000000000601020 - 0x0000000000601030 is .bss
The most important entry here is .text. It is the common name for the section that holds the program's code, and from its size (and from the explanation of main below) you can see that it includes main. If you disassemble it, you will see a call to __libc_start_main. Most importantly, you are disassembling from a good entry point that is real code (you are not being misled into treating DATA as CODE).
(gdb) disas 0x0000000000400440,0x0000000000400618
Dump of assembler code from 0x400440 to 0x400618:
0x0000000000400440: xor %ebp,%ebp
0x0000000000400442: mov %rdx,%r9
0x0000000000400445: pop %rsi
0x0000000000400446: mov %rsp,%rdx
0x0000000000400449: and $0xfffffffffffffff0,%rsp
0x000000000040044d: push %rax
0x000000000040044e: push %rsp
0x000000000040044f: mov $0x400540,%r8
0x0000000000400456: mov $0x400550,%rcx
0x000000000040045d: mov $0x400524,%rdi
0x0000000000400464: callq 0x400428 <__libc_start_main@plt>
0x0000000000400469: hlt
...
0x000000000040046c: sub $0x8,%rsp
...
0x0000000000400482: retq
0x0000000000400483: nop
...
0x0000000000400490: push %rbp
..
0x00000000004004f2: leaveq
0x00000000004004f3: retq
0x00000000004004f4: data32 data32 nopw %cs:0x0(%rax,%rax,1)
...
0x000000000040051d: leaveq
0x000000000040051e: jmpq *%rax
...
0x0000000000400520: leaveq
0x0000000000400521: retq
0x0000000000400522: nop
0x0000000000400523: nop
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: mov $0x40062c,%edi
0x000000000040052d: callq 0x400418 <puts@plt>
0x0000000000400532: mov $0x0,%eax
0x0000000000400537: leaveq
0x0000000000400538: retq
The call to __libc_start_main gets a pointer to main() as its first argument. So the last argument loaded (into %rdi on x86-64) immediately before the call is the address of your main().
0x000000000040045d: mov $0x400524,%rdi
0x0000000000400464: callq 0x400428 <__libc_start_main@plt>
Here it is 0x400524 (as we already know). Now set a breakpoint and try this:
(gdb) break *0x400524
Breakpoint 1 at 0x400524
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2
Breakpoint 1, 0x0000000000400524 in main ()
(gdb) n
Single stepping until exit from function main,
which has no line number information.
hello 1
__libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>,
init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>,
stack_end=0x7fffffffdc38) at libc-start.c:258
258 libc-start.c: No such file or directory.
in libc-start.c
(gdb) n
Program exited normally.
(gdb)
Now you can disassemble it using:
(gdb) disas 0x0000000000400524,0x0000000000400600
Dump of assembler code from 0x400524 to 0x400600:
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: sub $0x10,%rsp
0x000000000040052c: movl $0x1,-0x4(%rbp)
0x0000000000400533: mov $0x40064c,%eax
0x0000000000400538: mov -0x4(%rbp),%edx
0x000000000040053b: mov %edx,%esi
0x000000000040053d: mov %rax,%rdi
0x0000000000400540: mov $0x0,%eax
0x0000000000400545: callq 0x400418 <printf@plt>
0x000000000040054a: mov $0x0,%eax
0x000000000040054f: leaveq
0x0000000000400550: retq
0x0000000000400551: nop
0x0000000000400552: nop
0x0000000000400553: nop
0x0000000000400554: nop
0x0000000000400555: nop
...
This is essentially the solution.
BTW, this is a different program, used to check that the method works. That is why the assembly above is a bit different. The code above comes from this C file:
#include <stdio.h>
int main(void)
{
int i=1;
printf("hello %d\n", i);
return 0;
}
But!
If this does not work, then you still have some hints:
From now on, you should be looking to set breakpoints at the beginning of every function. Look for them right after the ret (usually preceded by leave) that ends the previous function. The first candidate is the start of .text itself. That is where execution begins, but it is not main.
The problem is that a breakpoint will not always let you step through the program. Take this one at the very start of .text:
(gdb) break *0x0000000000400440
Breakpoint 2 at 0x400440
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2
Breakpoint 2, 0x0000000000400440 in _start ()
(gdb) n
Single stepping until exit from function _start,
which has no line number information.
0x0000000000400428 in __libc_start_main@plt ()
(gdb) n
Single stepping until exit from function __libc_start_main@plt,
which has no line number information.
0x0000000000400408 in ?? ()
(gdb) n
Cannot find bounds of current function
So you need to keep trying until you find your way, setting breakpoints at:
0x400440
0x40046c
0x400490
0x4004f4
0x40051e
0x400524
From the other answer, we should keep this info:
In the non-stripped version of the file, we see:
(gdb) disas main
Dump of assembler code for function main:
0x0000000000400524 <+0>: push %rbp
0x0000000000400525 <+1>: mov %rsp,%rbp
0x0000000000400528 <+4>: mov $0x40062c,%edi
0x000000000040052d <+9>: callq 0x400418 <puts@plt>
0x0000000000400532 <+14>: mov $0x0,%eax
0x0000000000400537 <+19>: leaveq
0x0000000000400538 <+20>: retq
End of assembler dump.
Now we know that main is at 0x0000000000400524,0x0000000000400539. If we use the same addresses to look at the stripped binary, we get the same results:
(gdb) disas 0x0000000000400524,0x0000000000400539
Dump of assembler code from 0x400524 to 0x400539:
0x0000000000400524: push %rbp
0x0000000000400525: mov %rsp,%rbp
0x0000000000400528: mov $0x40062c,%edi
0x000000000040052d: callq 0x400418 <puts@plt>
0x0000000000400532: mov $0x0,%eax
0x0000000000400537: leaveq
0x0000000000400538: retq
End of assembler dump.
So, unless you can get some tip about where main starts (for example from another build with symbols), another way is to use whatever you know about its first assembly instructions: disassemble at specific places and check whether the code matches. If you have no access to the code at all, you can still read the ELF definition to understand how many sections should appear in the binary and try a calculated address. Still, you need information about the sections in the code!
That is hard work, my friend! Good luck!
Beco
Answer by Laurent G
IIRC, x/i <location> is your friend. Of course, you have to figure out for yourself which location you want to disassemble.
Answer by Mat
How about doing info files to get the section list (with addresses), and going from there?
Example:
(gdb) info files
Symbols from "/home/bob/tmp/t".
Local exec file:
`/home/bob/tmp/t', file type elf64-x86-64.
Entry point: 0x400490
0x0000000000400270 - 0x000000000040028c is .interp
0x000000000040028c - 0x00000000004002ac is .note.ABI-tag
....
0x0000000000400448 - 0x0000000000400460 is .init
....
Then disassemble .init:
(gdb) disas 0x0000000000400448,0x0000000000400460
Dump of assembler code from 0x400448 to 0x400460:
0x0000000000400448: sub $0x8,%rsp
0x000000000040044c: callq 0x4004bc
0x0000000000400451: callq 0x400550
0x0000000000400456: callq 0x400650
0x000000000040045b: add $0x8,%rsp
0x000000000040045f: retq
Then go ahead and disassemble the rest.
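For background, the "Entry point" address that info files prints is the e_entry field of the ELF header. The following is only a rough sketch in Python, so it falls outside the GDB-only constraint of the question; it assumes a little-endian ELF64 file such as the elf64-x86-64 binaries shown here. The names read_entry_point and build_sample_header are hypothetical, and the sample header is fabricated purely for demonstration.

```python
import struct

ELF_MAGIC = b"\x7fELF"


def read_entry_point(header: bytes) -> int:
    """Return e_entry from the leading bytes of a little-endian ELF64 file."""
    if len(header) < 32 or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    # e_ident[4] is EI_CLASS (2 = 64-bit), e_ident[5] is EI_DATA (1 = little-endian).
    if header[4] != 2 or header[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    # Layout: e_ident (16 bytes), e_type (2), e_machine (2), e_version (4),
    # then e_entry as an 8-byte address at offset 24.
    (entry,) = struct.unpack_from("<Q", header, 24)
    return entry


def build_sample_header(entry: int) -> bytes:
    """Hypothetical helper: fabricate the first 32 header bytes for a demo."""
    ident = ELF_MAGIC + bytes([2, 1, 1, 0]) + bytes(8)  # 16 bytes of e_ident
    # e_type = 2 (ET_EXEC), e_machine = 62 (EM_X86_64), e_version = 1
    return ident + struct.pack("<HHIQ", 2, 62, 1, entry)


# Demonstration on a fabricated header, not on a real binary.
print(hex(read_entry_point(build_sample_header(0x400440))))  # prints 0x400440
```

On a real file, the same function could be given the first 64 bytes read in binary mode. The value it returns is where _start lives, not main, so it is only the starting point for the disassembly.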
If I were you, and I had the same GCC version that your executable was built with, I'd examine the sequence of functions called in a dummy non-stripped executable. The sequence of calls is probably similar in most usual cases, so that might help you grind through the startup sequence up to your main by comparison. Optimizations will probably get in the way, though.
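The step from the startup code to main can also be automated on saved disassembly text. The sketch below, in Python, looks for the immediate value moved into %rdi just before the call to __libc_start_main, which is the pattern visible in the accepted answer's dump. It assumes AT&T syntax and a non-position-independent x86-64 build; position-independent executables typically load the address differently (for example with lea), so the pattern would need adapting. The function name find_main_address is hypothetical.

```python
import re

# Matches lines such as "0x000000000040045d: mov $0x400524,%rdi"
MOV_TO_RDI = re.compile(r"mov\s+\$(0x[0-9a-fA-F]+),%rdi")


def find_main_address(disassembly: str):
    """Return the immediate last loaded into %rdi before the call to
    __libc_start_main, or None if the pattern is not found."""
    candidate = None
    for line in disassembly.splitlines():
        match = MOV_TO_RDI.search(line)
        if match:
            # Remember the most recent immediate moved into %rdi.
            candidate = int(match.group(1), 16)
        elif "call" in line and "__libc_start_main" in line:
            return candidate
    return None


# Lines copied from the startup code in the accepted answer.
SAMPLE = """\
0x000000000040044f: mov $0x400540,%r8
0x0000000000400456: mov $0x400550,%rcx
0x000000000040045d: mov $0x400524,%rdi
0x0000000000400464: callq 0x400428 <__libc_start_main@plt>
0x0000000000400469: hlt
"""

print(hex(find_main_address(SAMPLE)))  # prints 0x400524
```

The returned address is what you would then pass to break * or disas in GDB.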
If your binary is stripped and optimized, main might not exist as an "entity" in the binary; chances are you can't do much better than this type of procedure.
Answer by Kevin
There's a great new free tool called unstrip from the paradyn project (full disclosure: I work on this project) that will rewrite your program binary, adding symbol information to it, and recover all (or nearly all) of the functions in stripped ELF binaries for you, with great accuracy. It won't identify the main function as "main", but it will find it, and you can apply the heuristic you already mentioned above to figure out which function is main.
http://www.paradyn.org/html/tools/unstrip.html
I'm sorry this isn't a gdb-only solution.