Windows: how to manually read / write .exe machine code?
Notice: this page reproduces a popular StackOverflow question and its answers, originally published as a Chinese-English parallel translation, under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/756367/
How to read / write .exe machine code manually?
Asked by Peter Perháč
I am not well acquainted with the compiler magic. The act of transforming human-readable code (or the not really readable Assembly instructions) into machine code is, for me, rocket science combined with sorcery.
I will narrow down the subject of this question to Win32 executables (.exe). When I open these files up in a specialized viewer, I can find strings (usually 16 bits per character) scattered at various places, but the rest is just garbage. I suppose the unreadable part (the majority) is the machine code (or maybe resources, such as images etc.).
Is there any straightforward way of reading the machine code? Opening the exe as a file stream and reading it byte by byte, how could one turn these individual bytes into Assembly? Is there a straightforward mapping between these instruction bytes and the Assembly instructions?
How is the .exe written? Four bytes per instruction? More? Less? I have noticed some applications can create executable files just like that: for example, in ACDSee you can export a series of images into a slideshow. But this does not necessarily have to be a SWF slideshow; ACDSee is also capable of producing executable presentations. How is that done?
How can I understand what goes on inside an EXE file?
Answer by dreamlax
OllyDbg is an awesome tool that disassembles an EXE into readable instructions and allows you to execute the instructions one by one. It also tells you which API functions the program uses and, if possible, the arguments that it provides (as long as the arguments are found on the stack).
Generally speaking, CPU instructions are of variable length: some are one byte, others are two, some three, some four, etc. It mostly depends on the kind of data that the instruction expects. Some instructions are generalised, like "mov", which tells the CPU to move data from a CPU register to a place in memory, or vice versa. In reality, there are many different "mov" instructions: ones for handling 8-bit, 16-bit and 32-bit data, ones for moving data from different registers, and so on.
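To make the variable-length point concrete, here is a minimal sketch in Python of a toy decoder. The opcode table is an assumption made for illustration: it covers only the six 16-bit opcodes that appear in the DOS `debug` listing further down this page, and the names `OPCODES`, `decode` and `STUB` are hypothetical. A real x86 decoder also has to handle prefixes, ModRM/SIB bytes and hundreds of other opcodes.

```python
# Toy table: opcode -> (text template, number of immediate bytes that follow).
# Illustration only; this is nowhere near a complete x86 decoder.
OPCODES = {
    0x0E: ("PUSH CS", 0),          # opcode byte only
    0x1F: ("POP DS", 0),
    0xB4: ("MOV AH,{:02X}", 1),    # opcode + 8-bit immediate
    0xCD: ("INT {:02X}", 1),
    0xB8: ("MOV AX,{:04X}", 2),    # opcode + 16-bit little-endian immediate
    0xBA: ("MOV DX,{:04X}", 2),
}


def decode(code: bytes):
    """Yield (offset, raw_bytes, text) for each instruction in `code`."""
    pos = 0
    while pos < len(code):
        opcode = code[pos]
        if opcode not in OPCODES:
            raise ValueError(f"opcode {opcode:02X} is not in the toy table")
        template, imm_size = OPCODES[opcode]
        raw = code[pos:pos + 1 + imm_size]
        if len(raw) != 1 + imm_size:
            raise ValueError("instruction is cut off at the end of the input")
        immediate = int.from_bytes(raw[1:], "little")
        yield pos, raw, template.format(immediate)
        pos += len(raw)  # the next instruction starts right after this one


# The first 14 bytes shown in the debug listing on this page.
STUB = bytes.fromhex("0E1FBA0E00B409CD21B8014CCD21")

for offset, raw, text in decode(STUB):
    print(f"{offset:04X} {raw.hex().upper():<8} {text}")
```

The instruction lengths come out as 1, 1, 3, 2, 2, 3 and 2 bytes, which is why a decoder has to start at a known instruction boundary: starting one byte off would misread operands as opcodes.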
You could pick up Dr. Paul Carter's PC Assembly Language Tutorial, which is a free entry-level book that talks about assembly and how the Intel 386 CPU operates. Most of it is applicable even to modern-day consumer Intel CPUs.
The EXE format is specific to Windows. The entry-point (i.e. the first executable instruction) is usually found at the same place within the EXE file. It's all rather difficult to explain at once, but the resources I've provided should help cure at least some of your curiosity! :)
Answer by Peter Perháč
You need a disassembler, which will turn the machine code into assembly language. This Wikipedia link describes the process and provides links to free disassemblers. Of course, as you say you don't understand assembly language, this may not be very informative - what exactly are you trying to do here?
Answer by grover
The executable file you see is in Microsoft's PE (Portable Executable) format. It is essentially a container, which holds some operating-system-specific data about a program and the program data itself, split into several sections. For example, code, resources and static data are stored in separate sections.
The format of a section depends on what is in it. The code section holds the machine code for the executable's target architecture. In the most common cases this is Intel x86 or AMD64 (same as EM64T) for Microsoft PE binaries. The format of the machine code is CISC and dates back to the 8086 and earlier. The important aspect of CISC is that its instruction size is not constant: you have to start reading at the right place to get something valuable out of it. Intel publishes good manuals on the x86/x64 instruction set.
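As a rough illustration of the "container with sections" idea, the following Python sketch reads a few well-known PE header fields: the `MZ` signature, the offset of the `PE\0\0` signature stored at 0x3C, the COFF `Machine` and `NumberOfSections` fields, `AddressOfEntryPoint`, and the section table. The function names and the tiny hand-built demo image are assumptions made so the example runs without a real file; the demo image uses a truncated 24-byte optional header and is not a loadable executable.

```python
import struct

# Two common values of the COFF Machine field.
MACHINES = {0x014C: "x86 (i386)", 0x8664: "x64 (AMD64)"}


def parse_pe_headers(data: bytes) -> dict:
    """Return machine type, entry-point RVA and section list of a PE image."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    machine, n_sections, _, _, _, opt_size, _ = struct.unpack_from(
        "<HHIIIHH", data, coff)
    optional = coff + 20                       # COFF header is 20 bytes
    (entry_rva,) = struct.unpack_from("<I", data, optional + 16)
    table = optional + opt_size                # section table follows
    sections = []
    for i in range(n_sections):                # each entry is 40 bytes
        name, _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i)
        sections.append((name.rstrip(b"\0").decode("ascii"), raw_ptr, raw_size))
    return {"machine": MACHINES.get(machine, hex(machine)),
            "entry_rva": entry_rva, "sections": sections}


def build_demo_image() -> bytes:
    """Hand-built header-only image (hypothetical values, for the demo)."""
    dos = bytearray(64)
    dos[0:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)      # PE header right after DOS header
    coff = struct.pack("<HHIIIHH", 0x014C, 2, 0, 0, 0, 24, 0x0102)
    optional = bytearray(24)                   # truncated optional header
    struct.pack_into("<H", optional, 0, 0x10B)     # PE32 magic
    struct.pack_into("<I", optional, 16, 0x1000)   # AddressOfEntryPoint

    def section(name: bytes, raw_ptr: int, raw_size: int) -> bytes:
        return struct.pack("<8sIIII", name, raw_size, 0, raw_size, raw_ptr) + bytes(16)

    return (bytes(dos) + b"PE\0\0" + coff + bytes(optional)
            + section(b".text", 0x200, 0x400) + section(b".rsrc", 0x600, 0x200))


print(parse_pe_headers(build_demo_image()))
```

To try it on a real executable, pass the file contents instead, for example `parse_pe_headers(open(path, "rb").read())`. The entry point is a relative virtual address, so it has to be mapped through the section table before it can be used as a file offset.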
You can use a disassembler to view the machine code directly. In combination with the manuals you can guess the source code most of the time.
And then there's the MSIL EXE: .NET executables holding Microsoft's Intermediate Language. These do not contain machine-specific code, but .NET CIL code. The specifications for that are available online from ECMA.
These can be viewed with a tool such as Reflector.
Answer by MaxVT
The contents of an EXE file are described by the Portable Executable format. The file contains code, data, and instructions to the OS on how to load it.
There is a 1:1 mapping between machine code and assembly. A disassembler program performs the reverse operation of an assembler.
There isn't a fixed number of bytes per instruction on i386. Some are a single byte, some are much longer.
Answer by Dead account
You can use debug from the command line, but that's hard.
C:\WINDOWS>debug taskman.exe
-u
0D69:0000 0E PUSH CS
0D69:0001 1F POP DS
0D69:0002 BA0E00 MOV DX,000E
0D69:0005 B409 MOV AH,09
0D69:0007 CD21 INT 21
0D69:0009 B8014C MOV AX,4C01
0D69:000C CD21 INT 21
0D69:000E 54 PUSH SP
0D69:000F 68 DB 68
0D69:0010 69 DB 69
0D69:0011 7320 JNB 0033
0D69:0013 7072 JO 0087
0D69:0015 6F DB 6F
0D69:0016 67 DB 67
0D69:0017 7261 JB 007A
0D69:0019 6D DB 6D
0D69:001A 206361 AND [BP+DI+61],AH
0D69:001D 6E DB 6E
0D69:001E 6E DB 6E
0D69:001F 6F DB 6F
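The listing above turns odd after the second INT 21 because the disassembler keeps decoding bytes that are text rather than code. A small Python sketch, using only the byte values shown at offsets 000E to 001F in the listing (the variable names are hypothetical), makes that visible:

```python
# Byte values copied from offsets 000E..001F of the debug listing above.
listing_bytes = bytes.fromhex(
    "54 68 69 73 20 70 72 6F 67 72 61 6D 20 63 61 6E 6E 6F")

# Interpreted as ASCII, the "instructions" are really the start of a message.
text = listing_bytes.decode("ascii")
print(repr(text))
```

The decoded text begins at offset 000E, which is the value loaded into DX by `MOV DX,000E` just before the first `INT 21`. In other words, the few real instructions at the top print a message and exit, and everything after them in this listing is data that the disassembler misreads as PUSH, JNB, JO and so on.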
Answer by U62
If it's as foreign to you as it seems, I don't think a debugger or disassembler is going to help - you need to learn assembler programming first; study the architecture of the processor (plenty of documentation is downloadable from Intel). And then, since most machine code is generated by compilers, you'll need to understand how compilers generate code - the simplest way is to write lots of small programs and then disassemble them to see what your C/C++ is turned into.
A couple of books that'll help you understand:-
Answer by Marco van de Voort
To get an idea, set a breakpoint on some interesting code, and then go to the CPU window.
If you are interested in more, it is easier to compile short fragments with Free Pascal using the -al parameter.
FPC can output the generated assembler in a multitude of assembler formats (TASM, MASM, GAS) using the -A parameter, and you can have the original Pascal code interleaved in comments (and more) for easy cross-reference.
Because it is compiler-generated assembler, as opposed to assembler from a disassembled .exe, it is more symbolic and easier to follow.
Answer by Coding With Style
Familiarity with low-level assembly (and I mean low-level assembly, not "macros" and that bull) is probably a must. If you really want to read the raw machine code itself directly, usually you would use a hex editor for that. In order to understand what the instructions do, however, most people would use a disassembler to convert that into the appropriate assembly instructions. If you're one of the minority who wants to understand the machine language itself, I think you'd want the Intel® 64 and IA-32 Architectures Software Developer's Manuals. Volume 2 specifically covers the instruction set, which relates to your query about how to read machine code itself and how assembly relates to it.
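For a feel of what a hex editor shows, here is a minimal hex-dump sketch in Python. The `hexdump` function and the layout (offset, hex bytes, printable ASCII) are an assumption modelled on what such tools typically display, and the sample bytes are the ones from the debug listing on this page rather than a real file.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format `data` as offset, hex bytes and printable ASCII, one row per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes are shown as dots, as most hex editors do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


# 14 code bytes followed by 18 text bytes, as in the debug listing.
sample = bytes.fromhex("0E1FBA0E00B409CD21B8014CCD21") + b"This program canno"
print(hexdump(sample))
```

To dump a real file, read it in binary mode first, for example `hexdump(open(path, "rb").read()[:256])`. The dump shows the raw bytes but says nothing about where instructions begin or end, which is the part a disassembler adds.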
Answer by zeroin23
Just relating to this question: does anyone still read things like CD 21?
I remember Sandra Bullock in one show actually reading a screenful of hex numbers and figuring out what the program does. Sort of like the current version of reading Matrix code.
If you do read stuff like CD 21, how do you remember the various combinations?
Answer by Dinah
Both your curiosity and your level of understanding are exactly where I was at one point. I highly recommend Code: The Hidden Language of Computer Hardware and Software. This will not answer all of the questions you ask here, but it will shed light on some of the utterly black-magic aspects of computers. It's a thick book but highly readable.