How to disassemble, modify and then reassemble a Linux executable?

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4309771/

Time: 2020-08-05 00:11:05  Source: igfitidea

Tags: linux, x86, disassembly, objdump

Asked by FlagCapper

Is there any way this can be done? I've used objdump but that doesn't produce assembly output that will be accepted by any assembler that I know of. I'd like to be able to change instructions within an executable and then test it afterwards.

Accepted answer by mgiuca

I don't think there is any reliable way to do this. Machine code formats are very complicated, more complicated than assembly files. It isn't really possible to take a compiled binary (say, in ELF format) and produce a source assembly program which will compile to the same (or similar-enough) binary. To gain an understanding of the differences, compare the output of GCC compiling direct to assembler (gcc -S) versus the output of objdump on the executable (objdump -D).

There are two major complications I can think of. Firstly, the machine code itself is not a 1-to-1 correspondence with assembly code, because of things like pointer offsets.

For example, consider the C code for Hello world:

#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}

This compiles to the x86 assembly code:

.LC0:
    .string "hello"
    .text
<snip>
    movl    $.LC0, %eax
    movl    %eax, (%esp)
    call    printf

Where .LC0 is a named constant, and printf is a symbol in a shared library symbol table. Compare to the output of objdump:

80483cd:       b8 b0 84 04 08          mov    $0x80484b0,%eax
80483d2:       89 04 24                mov    %eax,(%esp)
80483d5:       e8 1a ff ff ff          call   80482f4 <printf@plt>

Firstly, the constant .LC0 is now just some random offset in memory somewhere -- it would be difficult to create an assembly source file which contains this constant in the correct place, since the assembler and linker are free to choose locations for these constants.

Secondly, I'm not entirely sure about this (and it depends on things like position independent code), but I believe the reference to printf is not actually encoded at the pointer address in that code there at all, but the ELF headers contain a lookup table which dynamically replaces its address at runtime. Therefore, the disassembled code doesn't quite correspond to the source assembly code.

In summary, source assembly has symbols, while compiled machine code has addresses, which are difficult to reverse.
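
To make the "symbols vs. addresses" point concrete, here is a minimal C sketch that decodes the bytes b8 b0 84 04 08 and e8 1a ff ff ff from the objdump output by hand. It assumes 32-bit x86 and handles only those two opcodes (0xB8 = mov $imm32,%eax and 0xE8 = call rel32); the function names are made up for illustration, and a real disassembler handles far more.

```c
#include <stdint.h>

/* x86 stores immediates and displacements little-endian. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* B8 imm32 = mov $imm32,%eax. The immediate is whatever absolute address the
 * linker chose for .LC0; the name itself is not in the instruction.
 * Caller has already checked that insn[0] == 0xB8. */
uint32_t decode_mov_eax_imm32(const uint8_t insn[5])
{
    return read_le32(insn + 1);
}

/* E8 rel32 = call. The signed displacement is relative to the address of the
 * next instruction (insn_addr + 5); unsigned wrap-around handles the sign.
 * Caller has already checked that insn[0] == 0xE8. */
uint32_t decode_call_rel32_target(uint32_t insn_addr, const uint8_t insn[5])
{
    return insn_addr + 5u + read_le32(insn + 1);
}
```

For the mov this gives 0x80484b0 (where .LC0 ended up), and for the call at 0x80483d5 it gives 0x80482f4 (printf@plt), matching objdump. The call is encoded relative to its own location, which is one reason moving code around breaks it.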

The second major complication is that an assembly source file can't contain all of the information that was present in the original ELF file headers, like which libraries to dynamically link against, and other metadata placed there by the original compiler. It would be difficult to reconstruct this.

Like I said, it's possible that a special tool can manipulate all of this information, but it is unlikely that one can simply produce assembly code which can be reassembled back to the executable.

If you are interested in modifying just a small section of the executable, I recommend a much more subtle approach than recompiling the whole application. Use objdump to get the assembly code for the function(s) you are interested in. Convert it to "source assembly syntax" by hand (and here, I wish there was a tool that actually produced disassembly in the same syntax as the input), and modify it as you wish. When you are done, recompile just those function(s) and use objdump to figure out the machine code for your modified program. Then, use a hex editor to manually paste the new machine code over the top of the corresponding part of the original program, taking care that your new code is precisely the same number of bytes as the old code (or all the offsets would be wrong). If the new code is shorter, you can pad it out using NOP instructions. If it is longer, you may be in trouble, and might have to create new functions and call them instead.
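
The "same number of bytes, pad with NOPs" rule can be sketched in C on an in-memory copy of the file (read it, patch it, write it back). This assumes x86, where 0x90 is the one-byte NOP; patch_nop_padded is a hypothetical helper, not an existing tool.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define X86_NOP 0x90 /* one-byte x86 NOP */

/* Overwrite old_len bytes at offset off in buf with new_code (non-NULL),
 * padding the remainder with NOPs so nothing after the patch moves.
 * Returns 0 on success, -1 if the region is out of bounds or the new code
 * is longer than the old (which would shift every later offset). */
int patch_nop_padded(uint8_t *buf, size_t buf_len, size_t off, size_t old_len,
                     const uint8_t *new_code, size_t new_len)
{
    if (off > buf_len || old_len > buf_len - off)
        return -1;
    if (new_len > old_len)
        return -1;
    memcpy(buf + off, new_code, new_len);
    memset(buf + off + new_len, X86_NOP, old_len - new_len);
    return 0;
}
```

For example, replacing a 5-byte call with the 2-byte xor %eax,%eax (31 c0) leaves three 0x90 bytes behind it.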

Answer by Cine

For changing code inside a binary assembly, there are generally 3 ways to do it.

  • If it is just some trivial thing like a constant, then you just change the location with a hex editor, assuming you can find it to begin with.
  • If you need to alter code, then use LD_PRELOAD to override some function in your program. That doesn't work if the function is not in the function tables, though.
  • Hack the code at the function you want to fix to be a direct jump to a function you load via LD_PRELOAD, and then jump back to the same location (this is a combination of the above two).
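
A minimal sketch of the LD_PRELOAD option: a tiny shared library that replaces libc's rand(). It only affects calls that the dynamic linker resolves at load time (imported functions, which is what the "function tables" caveat is about); a function compiled into the executable and called directly is not affected. fake_rand.c and ./target_program are placeholder names.

```c
/* fake_rand.c: hypothetical LD_PRELOAD shim.
 *   build: gcc -shared -fPIC -o fake_rand.so fake_rand.c
 *   run:   LD_PRELOAD=./fake_rand.so ./target_program
 * Objects named in LD_PRELOAD are searched before libc when the dynamic
 * linker resolves symbols, so this definition wins for dynamically
 * resolved calls to rand(). */
#include <stdlib.h>

int rand(void)
{
    return 4; /* always the same "random" number */
}
```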

Of course, only the 2nd one will work if the assembly does any kind of self-integrity check.

Edit: In case it isn't obvious, playing around with binary assemblies is VERY high-level developer stuff, and you will have a hard time asking about it here, unless you ask about really specific things.

Answer by Grzegorz Wierzowiecki

Another thing you might be interested to do:

  • binary instrumentation - changing existing code

If interested, check out: Pin, Valgrind (or projects doing this: NaCl - Google's Native Client, maybe QEMU).

Answer by user502515

You can run the executable under the supervision of ptrace (in other words, a debugger like gdb) and, in that way, control execution as you go, without modifying the actual file. Of course, this requires the usual editing skills, like finding where the particular instructions you want to influence are in the executable.
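
One detail that bites when using ptrace(2) directly instead of through gdb: PTRACE_POKETEXT writes a whole machine word, so patching a few bytes is a read-modify-write. A minimal sketch, assuming Linux and a tracee that is already stopped under your control (for example after PTRACE_ATTACH and waitpid); the helper names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Copy n patch bytes into a word at byte_off (memory order, so this is
 * endian-neutral). Caller guarantees byte_off + n <= sizeof(long). */
long splice_into_word(long word, size_t byte_off, const uint8_t *patch, size_t n)
{
    unsigned char bytes[sizeof(long)];
    memcpy(bytes, &word, sizeof word);
    memcpy(bytes + byte_off, patch, n);
    memcpy(&word, bytes, sizeof word);
    return word;
}

/* Overwrite n bytes at addr in an already-stopped tracee.
 * Returns 0 on success, -1 on failure. */
int poke_bytes(pid_t pid, uintptr_t addr, const uint8_t *patch, size_t n)
{
    uintptr_t base = addr & ~(uintptr_t)(sizeof(long) - 1);
    size_t byte_off = (size_t)(addr - base);
    if (byte_off + n > sizeof(long))
        return -1;                     /* this sketch patches within one word */

    errno = 0;                         /* PEEKTEXT can legitimately return -1 */
    long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)base, NULL);
    if (word == -1 && errno != 0)
        return -1;

    word = splice_into_word(word, byte_off, patch, n);
    if (ptrace(PTRACE_POKETEXT, pid, (void *)base, (void *)word) == -1)
        return -1;
    return 0;
}
```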

Answer by ilpelle

@mgiuca has correctly addressed this question from a technical point of view. In fact, disassembling an executable program into an easy-to-recompile assembly source is not an easy task.

To add some bits to the discussion, there are a couple of techniques/tools which could be interesting to explore, although they are technically complex.

  1. Static/Dynamic instrumentation. This technique entails analyzing the executable format, inserting/deleting/replacing specific assembly instructions for a given purpose, fixing all references to variables/functions in the executable, and then emitting a new modified executable. Some tools which I know of are: PIN, Hijacker, PEBIL, DynamoRIO. Consider that configuring such tools for a purpose different from what they were designed for could be tricky, and requires understanding of both executable formats and instruction sets.
  2. Full executable decompilation. This technique tries to reconstruct a full assembly source from an executable. You might want to take a glance at the Online Disassembler, which tries to do the job. Either way, you lose information about different source modules and possibly function/variable names.
  3. Retargetable decompilation. This technique tries to extract more information from the executable, looking at compiler fingerprints (i.e., patterns of code generated by known compilers) and other deterministic stuff. The main goal is to reconstruct higher-level source code, like C source, from an executable. This is sometimes able to regain information about function/variable names. Consider that compiling sources with -g often offers better results. You might want to give the Retargetable Decompiler a try.

Most of this comes from the vulnerability assessment and execution analysis research fields. They are complex techniques, and often the tools cannot be used immediately out of the box. Nevertheless, they provide invaluable help when trying to reverse engineer some software.

Answer by mtraceur

I do this with hexdump and a text editor. You have to be really comfortable with the machine code and the file format storing it, and flexible with what counts as "disassemble, modify, and then reassemble".

If you can get away with making just "spot changes" (rewriting bytes, but not adding nor removing bytes), it'll be easy (relatively speaking).

You really don't want to displace any existing instructions, because then you'd have to manually adjust any affected relative offset within the machine code, for jumps/branches/loads/stores relative to the program counter, both in hardcoded immediate values and in ones computed through registers.

You should always be able to get away with not removing bytes. Adding bytes might be necessary for more complex modifications, and gets a lot harder.

Step 0 (preparation)

After you've actually disassembled the file properly with objdump -D or whatever you normally use, first to actually understand it and find the spots you need to change, you'll need to take note of the following things to help you locate the correct bytes to modify:

  1. The "address" (offset from the start of the file) of the bytes you need to change.
  2. The raw value of those bytes as they currently are (the --show-raw-insn option to objdump is really helpful here).
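
One caveat on item 1: objdump -d prints virtual addresses, which equal file offsets only when the code's segment is mapped at the same value as its position in the file (often true for the first segment of a position-independent binary, and it appears to be the case in the ARMv8 example further down, where 0xf44 is used as both). Otherwise you convert using the program headers that readelf -l shows. A C sketch, assuming a 64-bit ELF whose program headers you have already read into memory; the function name is hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Map a virtual address (as printed by objdump -d) to a file offset using
 * the ELF program headers. Returns -1 if no PT_LOAD segment contains it in
 * its file-backed part (e.g. it lives in .bss). */
int64_t vaddr_to_file_offset(const Elf64_Phdr *phdrs, size_t count,
                             uint64_t vaddr)
{
    for (size_t i = 0; i < count; i++) {
        const Elf64_Phdr *ph = &phdrs[i];
        if (ph->p_type != PT_LOAD)
            continue;
        if (vaddr >= ph->p_vaddr && vaddr - ph->p_vaddr < ph->p_filesz)
            return (int64_t)(ph->p_offset + (vaddr - ph->p_vaddr));
    }
    return -1;
}
```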

You'll also need to check if hexdump -R works on your system. If not, then use the xxd command or similar instead of hexdump in all of the steps below (consult the documentation for whatever tool you use; I only explain hexdump for now in this answer because that is the one I am familiar with).

Step 1

Dump the raw hexadecimal representation of the binary file with hexdump -Cv.

Step 2

Open the hexdumped file and find the bytes at the address you're looking to change.

Quick crash course in hexdump -Cv output:

  1. The left-most column is the addresses of the bytes (relative to the start of the binary file itself, just like objdump provides).
  2. The right-most column (surrounded by | characters) is just a "human readable" representation of the bytes - the ASCII character matching each byte is written there, with a . standing in for all bytes which don't map to an ASCII printable character.
  3. The important stuff is in between - each byte as two hex digits separated by spaces, 16 bytes per line.
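
To make the layout concrete, here is a C sketch that formats one line the way hexdump -C does: offset, two groups of eight hex bytes, then the ASCII column. It only illustrates the format and is not a replacement for hexdump (in particular, -v matters because without it hexdump collapses repeated lines into a single *). The function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format up to 16 bytes like one "hexdump -C" line.
 * out must have room for at least 80 characters. */
void format_hexdump_line(char *out, size_t off, const uint8_t *bytes, size_t n)
{
    int pos = sprintf(out, "%08zx ", off);
    for (size_t i = 0; i < 16; i++) {
        if (i == 8)
            out[pos++] = ' ';                  /* gap between the two groups */
        if (i < n)
            pos += sprintf(out + pos, " %02x", bytes[i]);
        else
            pos += sprintf(out + pos, "   ");  /* pad a short final line */
    }
    pos += sprintf(out + pos, "  |");
    for (size_t i = 0; i < n; i++)
        out[pos++] = isprint(bytes[i]) ? (char)bytes[i] : '.';
    sprintf(out + pos, "|");
}
```

To find a given file offset in the dump, look at the line labelled offset & ~0xf; the byte sits in column offset & 0xf (for 0xf44: line 00000f40, column 4).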

Beware: Unlike objdump -D, which gives you the address of each instruction and shows the raw hex of the instruction based on how it's documented as being encoded, hexdump -Cv dumps each byte exactly in the order it appears in the file. This can be a little confusing at first on machines where the instruction bytes are in opposite order due to endianness differences, which can also be disorienting when you're expecting a specific byte at a specific address.
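
For a little-endian target such as the ARMv8 example below, converting between the 32-bit word objdump prints and the bytes hexdump shows is just this (a C sketch with made-up names; x86 objdump instead prints the bytes individually in file order, so no flipping is needed there):

```c
#include <stdint.h>

/* objdump prints an AArch64 instruction as one 32-bit word (e.g. 97fffeeb);
 * a little-endian file stores its least significant byte first. */
void word_to_le_bytes(uint32_t word, uint8_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(word >> (8 * i));
}

uint32_t le_bytes_to_word(const uint8_t in[4])
{
    return (uint32_t)in[0] | (uint32_t)in[1] << 8 |
           (uint32_t)in[2] << 16 | (uint32_t)in[3] << 24;
}
```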

Step 3

Modify the bytes that need to change - you obviously need to figure out the raw machine instruction encoding (not the assembly mnemonics) and manually write in the correct bytes.

Note: You don't need to change the human-readable representation in the right-most column. hexdump will ignore it when you "un-dump" it.
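
If you'd rather not round-trip through hex text at all, the same spot change can be made directly on the file. This hypothetical C helper also checks that the bytes currently at the offset are the ones you noted in step 0, which catches a wrong offset before anything is written. Work on a copy of the binary.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Replace n bytes (n <= 64) at file offset off, but only if the bytes there
 * match `expected`. Returns 0 on success, 1 if the old bytes don't match,
 * -1 on I/O error. f must be opened "r+b". */
int patch_file_checked(FILE *f, long off, const uint8_t *expected,
                       const uint8_t *replacement, size_t n)
{
    uint8_t current[64];
    if (n > sizeof current)
        return -1;
    if (fseek(f, off, SEEK_SET) != 0 || fread(current, 1, n, f) != n)
        return -1;
    if (memcmp(current, expected, n) != 0)
        return 1;
    /* C requires a seek between a read and a write on the same stream. */
    if (fseek(f, off, SEEK_SET) != 0 || fwrite(replacement, 1, n, f) != n)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

For the ARMv8 example below, that would be a call with offset 0xf44, expected bytes eb fe ff 97 and replacement 1f 20 03 d5.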

Step 4

"Un-dump" the modified hexdump file using hexdump -R.

Step 5 (sanity check)

objdump your newly un-hexdumped file and verify that the disassembly that you changed looks correct. diff it against the objdump of the original.
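
As a complement to diffing the objdump output (not a replacement for it), a byte-level comparison confirms that nothing changed outside the spot you meant to touch. A C sketch over two in-memory copies, with a made-up function name; from the shell, cmp -l old new gives a similar listing, with 1-based decimal offsets and octal byte values.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print every offset where two equally sized images differ and return how
 * many bytes differ. */
size_t report_byte_diffs(const uint8_t *before, const uint8_t *after, size_t n)
{
    size_t diffs = 0;
    for (size_t i = 0; i < n; i++) {
        if (before[i] != after[i]) {
            printf("%08zx: %02x -> %02x\n", i, before[i], after[i]);
            diffs++;
        }
    }
    return diffs;
}
```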

Seriously, don't skip this step. I make a mistake more often than not when manually editing the machine code and this is how I catch most of them.

Example

Here's a real-life worked example from when I modified an ARMv8 (little endian) binary recently. (I know, the question is tagged x86, but I don't have an x86 example handy, and the fundamental principles are the same, just the instructions are different.)

In my situation I needed to disable a specific "you shouldn't be doing this" hand-holding check: in my example binary, in objdump --show-raw-insn -d output the line I cared about looked like this (one instruction before and after given for context):

     f40:   aa1503e3    mov x3, x21
     f44:   97fffeeb    bl  af0 <error@plt>
     f48:   f94013f7    ldr x23, [sp, #32]

As you can see, our program is "helpfully" exiting by jumping into an error function (which terminates the program). Unacceptable. So we're going to turn that instruction into a no-op. So we're looking for the bytes 0x97fffeeb at the address/file-offset 0xf44.
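
Searching for the instruction word is a quick cross-check that you have the right offset. A C sketch with a hypothetical name, assuming a little-endian AArch64 image read into memory from file offset 0 (so 4-byte alignment in the buffer matches alignment in the file). Treat a hit as a hint only: the same word can appear elsewhere, and what a BL word means depends on where it sits, since its offset is relative.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first 4-byte-aligned occurrence of `word`
 * (stored little-endian) in buf, or -1. AArch64 instructions are always
 * 4-byte aligned, so we step by 4. */
long find_le32_insn(const uint8_t *buf, size_t n, uint32_t word)
{
    for (size_t i = 0; i + 4 <= n; i += 4) {
        uint32_t w = (uint32_t)buf[i] | (uint32_t)buf[i + 1] << 8 |
                     (uint32_t)buf[i + 2] << 16 | (uint32_t)buf[i + 3] << 24;
        if (w == word)
            return (long)i;
    }
    return -1;
}
```

On the 16 bytes of the hexdump line at 0xf40 it returns 4, i.e. file offset 0xf44.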

Here is the hexdump -Cv line containing that offset.

00000f40  e3 03 15 aa eb fe ff 97  f7 13 40 f9 e8 02 40 39  |..........@...@9|

Notice how the relevant bytes are actually flipped (little-endian encoding in the architecture applies to machine instructions just as it does to anything else) and how this slightly unintuitively relates to what byte is at what byte offset:

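00000f40  -- -- -- -- eb fe ff 97  -- -- -- -- -- -- -- --  |..........@...@9|
                      ^
                      This is offset f44, holding the least significant byte
                      So the *instruction as a whole* is at the expected offset,
                      just the bytes are flipped around. Of course, whether the
                      order matches or not will vary with the architecture.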

Anyway, I know from looking at other disassembly that 0xd503201f disassembles to nop, so that seems like a good candidate for my no-op instruction. I modified the line in the hexdumped file accordingly:

00000f40  e3 03 15 aa 1f 20 03 d5  f7 13 40 f9 e8 02 40 39  |..........@...@9|

Converted back into binary with hexdump -R, disassembled the new binary with objdump --show-raw-insn -d and verified that the change was correct:

     f40:   aa1503e3    mov x3, x21
     f44:   d503201f    nop
     f48:   f94013f7    ldr x23, [sp, #32]

Then I ran the binary and got the behavior I wanted - the relevant check no longer caused the program to abort.

Machine code modification successful.

!!! Warning !!!

Or was I successful? Did you spot what I missed in this example?

I am sure you did - since you're asking about how to manually modify the machine code of a program, you presumably know what you're doing. But for the benefit of any readers who might be reading to learn, I'll elaborate:

I only changed the last instruction in the error-case branch! The jump into the function that exits the program. But as you can see, register x3 was being modified by the mov just above! In fact, a total of four (4) registers were modified as part of the preamble to call error, and one register was. Here's the full machine code for that branch, starting from the conditional jump over the if block and ending where the jump goes to if the conditional if isn't taken:

     f2c:   350000e8    cbnz    w8, f48
     f30:   b0000002    adrp    x2, 1000
     f34:   91128442    add x2, x2, #0x4a1
     f38:   320003e0    orr w0, wzr, #0x1
     f3c:   2a1f03e1    mov w1, wzr
     f40:   aa1503e3    mov x3, x21
     f44:   97fffeeb    bl  af0 <error@plt>
     f48:   f94013f7    ldr x23, [sp, #32]

All of the code after the branch was generated by the compiler on the assumption that the program state was as it was before the conditional jump! But by just making the final jump to the error function code a no-op, I created a code path where we reach that code with inconsistent/incorrect program state!

In my case, this actually seemed to not cause any problems. So I got lucky. Very lucky: only after I had already run my modified binary (which, incidentally, was a security-critical binary: it had the capability to setuid, setgid, and change SELinux context!) did I realize that I forgot to actually trace whether those register changes affected the code paths that came later!

That could've been catastrophic - any one of those registers might've been used in later code with the assumption that it contained a previous value that now got overwritten! And I'm the kind of person people know for meticulous, careful thought about code, and as a pedant and stickler for always being conscientious about computer security.

What if I was calling a function where the arguments spilled from the registers onto the stack (as is very common on, for example, x86)? What if there was actually multiple conditional instructions in the instruction set that preceded the conditional jump (as is common on, for example, older ARM versions)? I would've been in an even more recklessly inconsistent state after having done that simplest-seeming change!

So this is my cautionary reminder: Manually twiddling with binaries is literally stripping away every safety between you and what the machine and operating system will permit. Literally all the advances that we have made in our tools to automatically catch mistakes in our programs, gone.

So how do we fix this more properly? Read on.

Removing Code

To effectively/logically "remove" more than one instruction, you can replace the first instruction you want to "delete" with an unconditional jump to the first instruction after the end of the "deleted" instructions. For this ARMv8 binary, that looked like this:

     f2c:   14000007    b   f48
     f30:   b0000002    adrp    x2, 1000
     f34:   91128442    add x2, x2, #0x4a1
     f38:   320003e0    orr w0, wzr, #0x1
     f3c:   2a1f03e1    mov w1, wzr
     f40:   aa1503e3    mov x3, x21
     f44:   97fffeeb    bl  af0 <error@plt>
     f48:   f94013f7    ldr x23, [sp, #32]

Basically, you "kill" the code (turn it into "dead code"). Sidenote: You can do something similar with literal strings embedded in the binary: so long as you want to replace it with a smaller string, you can almost always get away with overwriting the string (including the terminating null byte if it's a "C-string") and if necessary overwriting the hard-coded size of the string in the machine code that uses it.
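
The string trick from the sidenote, as a hypothetical C helper working on an in-memory copy: write the shorter string and zero-fill the rest, so the file size and every later offset stay the same. Code that uses a hard-coded length for the old string still has to be patched separately, as noted above.

```c
#include <stddef.h>
#include <string.h>

/* Replace the NUL-terminated string at buf[off] with one that is no longer,
 * zero-filling the leftover bytes. Returns 0, or -1 if it won't fit. */
int replace_cstring_in_place(char *buf, size_t buf_len, size_t off,
                             const char *replacement)
{
    if (off >= buf_len)
        return -1;
    const char *nul = memchr(buf + off, '\0', buf_len - off);
    if (nul == NULL)
        return -1;                        /* not terminated inside buf */
    size_t old_len = (size_t)(nul - (buf + off));
    size_t new_len = strlen(replacement);
    if (new_len > old_len)
        return -1;                        /* longer strings need more space */
    memcpy(buf + off, replacement, new_len);
    memset(buf + off + new_len, '\0', old_len - new_len + 1);
    return 0;
}
```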

You can also replace all unwanted instructions with no-ops. In other words, we can turn the unwanted code into what's called a "no-op sled":

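     f2c:   d503201f    nop
     f30:   d503201f    nop
     f34:   d503201f    nop
     f38:   d503201f    nop
     f3c:   d503201f    nop
     f40:   d503201f    nop
     f44:   d503201f    nop
     f48:   f94013f7    ldr x23, [sp, #32]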

I would expect that that's just wasting CPU cycles relative to jumping over them, but it is simpler and thus safer against mistakes, because you don't have to manually figure out how to encode the jump instruction, including figuring out the offset/address to use in it - you don't have to think as much for a no-op sled.
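
Writing a NOP sled is mechanical enough to script. A C sketch for the ARMv8 case, assuming a little-endian AArch64 image in memory and 4-byte aligned offsets (every AArch64 instruction is 4 bytes); a64_nop_sled is a made-up name.

```c
#include <stddef.h>
#include <stdint.h>

#define A64_NOP 0xd503201fu

/* Turn the instructions in [start, end) into NOPs.
 * Returns the number of NOPs written, or -1 on bad arguments. */
long a64_nop_sled(uint8_t *img, size_t img_len, size_t start, size_t end)
{
    if (start % 4 != 0 || end % 4 != 0 || start > end || end > img_len)
        return -1;
    for (size_t off = start; off < end; off += 4) {
        img[off + 0] = (uint8_t)(A64_NOP);        /* little-endian: LSB first */
        img[off + 1] = (uint8_t)(A64_NOP >> 8);
        img[off + 2] = (uint8_t)(A64_NOP >> 16);
        img[off + 3] = (uint8_t)(A64_NOP >> 24);
    }
    return (long)((end - start) / 4);
}
```

Applied to [0xf2c, 0xf48) it writes seven NOPs, covering f2c through f44 and leaving the ldr at f48 alone.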

To be clear, error is easy: I messed up two (2) times when manually encoding that unconditional branch instruction. And it's not always our fault: the first time was because the documentation I had was outdated/wrong and said one bit was ignored in the encoding, when it actually wasn't, so I set it to zero on my first try.
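
Letting the computer do the encoding, and then checking the result against an instruction whose encoding you already know, guards against exactly this kind of mistake. A C sketch for AArch64 B and BL, whose 26-bit signed immediate counts 4-byte words relative to the branch itself (so the reach is +/-128 MiB); the function names are hypothetical.

```c
#include <stdint.h>

/* Encode B (link = 0) or BL (link = 1) at address `from` targeting `to`.
 * Returns 0 (which is not a branch) if `to` is misaligned or out of range. */
static uint32_t a64_branch(uint64_t from, uint64_t to, int link)
{
    int64_t delta = (int64_t)(to - from);
    if (delta % 4 != 0)
        return 0;
    int64_t imm = delta / 4;
    if (imm < -(INT64_C(1) << 25) || imm >= (INT64_C(1) << 25))
        return 0;
    uint32_t opcode = link ? 0x94000000u : 0x14000000u;
    return opcode | ((uint32_t)imm & 0x03ffffffu);
}

uint32_t a64_b(uint64_t from, uint64_t to)  { return a64_branch(from, to, 0); }
uint32_t a64_bl(uint64_t from, uint64_t to) { return a64_branch(from, to, 1); }
```

As a self-check, a64_bl(0xf44, 0xaf0) reproduces the 97fffeeb already in the binary, and a64_b(0xf2c, 0xf48) gives 14000007 for the jump over the error call.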

Adding Code

You could theoretically use this technique to add machine instructions too, but it's more complex, and I've never had to do it, so I don't have a worked example at this time.

From a machine code perspective it's sort of easy: pick one instruction at the spot you want to add code, and convert it into a jump instruction to the new code that you need to add (don't forget to add the instruction(s) you thus replaced into the new code, unless you didn't need that for your added logic, and to jump back to the instruction you want to come back to at the end of the addition). Basically, you're "splicing" the new code in.
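
A sketch of that splice for AArch64, in C, on an image already loaded into memory as 32-bit instruction words. It assumes the code cave at `cave` is inside an executable mapping (finding one is the hard part discussed next) and that the displaced instruction is not PC-relative (ADR/ADRP, branches and literal loads would all need re-encoding). Here the displaced instruction runs first and the new code after it; the other order is equally valid depending on what you need. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "B to" at address from; both must be 4-byte aligned and in range. */
static uint32_t a64_b(uint64_t from, uint64_t to)
{
    int64_t imm = ((int64_t)(to - from)) / 4;
    return 0x14000000u | ((uint32_t)imm & 0x03ffffffu);
}

/* code[] holds the instruction words of an image whose first word is at
 * address base. Returns the number of words written to the cave. */
size_t a64_splice(uint32_t *code, uint64_t base, uint64_t site, uint64_t cave,
                  const uint32_t *added, size_t n_added)
{
    uint32_t *site_word = &code[(site - base) / 4];
    uint32_t *cave_word = &code[(cave - base) / 4];
    uint32_t displaced = *site_word;
    size_t k = 0;

    cave_word[k++] = displaced;                    /* re-run what we overwrote */
    for (size_t i = 0; i < n_added; i++)
        cave_word[k++] = added[i];                 /* the new logic */
    cave_word[k] = a64_b(cave + 4 * k, site + 4);  /* resume after the site */
    k++;
    *site_word = a64_b(site, cave);                /* divert into the cave */
    return k;
}
```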

But you have to find a spot to actually put that new code, and this is the hard part.

If you're really lucky, you can just append the new machine code at the end of the file, and it'll "just work": the new code will get loaded along with the rest of the machine instructions into your address space, in a memory page that is properly marked executable.

In my experience hexdump -R ignores not just the right-most column but the left-most column too - so you could literally just put zero addresses for all manually added lines and it'll work out.

If you're less lucky, after adding the code you'll have to actually adjust some header values within the same file: if the loader for your operating system expects the binary to contain metadata describing the size of the executable section (for historical reasons often called the "text" section) you'll have to find and adjust that. In the old days binaries were just raw machine code - nowadays the machine code is wrapped in a bunch of metadata (for example ELF on Linux and some others).

If you're still a little lucky, you might have some "dead" spot in the file which does get properly loaded up as part of the binary at the same relative offsets as the rest of the code that's already in the file (and that dead spot can fit your code and is properly aligned if your CPU requires word-alignment for CPU instructions). Then you can overwrite it.
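
Scanning for such a dead spot can be scripted too. A C sketch, with a hypothetical name, that looks for a run of zero bytes at a suitably aligned offset within a range you choose (for instance, the file-backed part of the executable PT_LOAD segment). Zero bytes are only a candidate: padding may be zeros, NOPs or something else depending on the toolchain, and zeros can also be real data, so confirm with objdump before using the spot.

```c
#include <stddef.h>
#include <stdint.h>

/* Find `need` consecutive zero bytes starting at an offset that is a
 * multiple of `align` (nonzero) inside [start, end). Returns offset or -1. */
long find_zero_cave(const uint8_t *img, size_t start, size_t end,
                    size_t need, size_t align)
{
    size_t first = (start + align - 1) / align * align;  /* round start up */
    for (size_t off = first; off + need <= end; off += align) {
        size_t i = 0;
        while (i < need && img[off + i] == 0)
            i++;
        if (i == need)
            return (long)off;
    }
    return -1;
}
```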

If you're really unlucky, you can't just append code and there is no dead space you can fill with your machine code. At that point, you basically have to be intimately familiar with the executable format and hope that you can figure out something within those constraints that is humanly feasible to pull off manually within a reasonable amount of time and with a reasonable chance of not messing it up.

Answer by Albert van der Horst

My "ci assembler disassembler" is the only system I know of that is designed around the principle that whatever the disassembly is, it must reassemble to the byte-for-byte same binary.

https://github.com/albertvanderhorst/ciasdis

There are two examples given of ELF executables with their disassembly and reassembly. It was originally designed to be able to modify a booting system, consisting of code, interpreted code, data and graphic characters, with such niceties as a transition from real to protected mode. (It succeeded.) The examples also demonstrate the extraction of text from the executables, which is subsequently used for labels. The Debian package is intended for Intel Pentium, but plug-ins are available for Dec Alpha, 6809, 8086 etc.

The quality of the disassembly depends on how much effort you put into it. E.g., if you do not even supply the information that it is an ELF file, the disassembly consists of single bytes, and the reassembly is trivial. In the examples I use a script that extracts labels, and makes for a truly usable reverse-engineered program that is modifiable. You can insert or delete something and the automatically generated symbolic labels will get recalculated. With the tools provided, labels are generated for all places where jumps end, and then the labels are used for those jumps. That means that in most cases you can insert an instruction and reassemble the modified source.
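
The core idea, a label at every place a jump lands, can be sketched in a few lines. This is not ciasdis's implementation; it is a C illustration for AArch64 that only understands B, BL, CBZ and CBNZ, checked against instruction words from the ARMv8 example above. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of value (value must fit in `bits` bits). */
static int64_t sign_extend(uint32_t value, int bits)
{
    int64_t m = (int64_t)1 << (bits - 1);
    return ((int64_t)value ^ m) - m;
}

/* If insn (located at address pc) is B, BL, CBZ or CBNZ, store its target in
 * *target and return 1; otherwise return 0. */
int a64_branch_target(uint32_t insn, uint64_t pc, uint64_t *target)
{
    int64_t words;
    if ((insn & 0x7c000000u) == 0x14000000u)            /* B, BL: imm26 */
        words = sign_extend(insn & 0x03ffffffu, 26);
    else if ((insn & 0x7e000000u) == 0x34000000u)       /* CBZ, CBNZ: imm19 */
        words = sign_extend((insn >> 5) & 0x7ffffu, 19);
    else
        return 0;
    *target = pc + (uint64_t)(words * 4);
    return 1;
}
```

Collecting the targets over a whole text section and printing a label such as L_f48: before each one gives the kind of listing where a jump refers to a name rather than a number, so inserting an instruction only requires recomputing the offsets.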

No assumption at all is made about the binary blob, but of course an Intel disassembly is of little use for a Dec Alpha binary.
