Original source: http://stackoverflow.com/questions/3434202/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
What is the difference between native code, machine code and assembly code?
Asked by samaladeepak
I'm confused about machine code and native code in the context of .NET languages.
What is the difference between them? Are they the same?
Answered by Timwi
The terms are indeed a bit confusing, because they are sometimes used inconsistently.
Machine code: This is the best-defined term. It is code that uses the byte-code instructions which your processor (the physical piece of metal that does the actual work) understands and executes directly. All other code must be translated or transformed into machine code before your machine can execute it.
Native code: This term is sometimes used where machine code (see above) is meant. However, it is also sometimes used to mean unmanaged code (see below).
Unmanaged code and managed code: Unmanaged code refers to code written in a programming language such as C or C++, which is compiled directly into machine code. It contrasts with managed code, which is written in C#, VB.NET, Java, or similar, and executed in a virtual environment (such as .NET or the Java VM) which kind of "simulates" a processor in software. The main difference is that managed code "manages" the resources (mostly the memory allocation) for you by employing garbage collection and by keeping references to objects opaque. Unmanaged code is the kind of code that requires you to manually allocate and de-allocate memory, sometimes causing memory leaks (when you forget to de-allocate) and sometimes segmentation faults (when you de-allocate too soon). Unmanaged also usually implies there are no run-time checks for common errors such as null-pointer dereferencing or array bounds overflow.
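To make the "manually allocate and de-allocate" point concrete, here is a minimal sketch in C. The helper names copy_text and release_text are hypothetical, invented for this illustration only; the standard library calls (malloc, free, memcpy, strlen) are real.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of `text`.
   The caller owns the result and is responsible for releasing it. */
char *copy_text(const char *text)
{
    assert(text != NULL);
    size_t size = strlen(text) + 1;   /* +1 for the terminating '\0' */
    char *copy = malloc(size);        /* manual allocation */
    if (copy == NULL) {
        return NULL;                  /* allocation can fail; the caller must check */
    }
    memcpy(copy, text, size);
    return copy;
}

/* Hypothetical helper: frees *slot and clears it, so the stale pointer
   cannot be dereferenced or freed a second time by accident. */
void release_text(char **slot)
{
    assert(slot != NULL);
    free(*slot);    /* skipping this call would leak the memory */
    *slot = NULL;   /* reading through the old pointer after free() is undefined behavior */
}
```

A caller would pair the two, for example `char *s = copy_text("Hello world"); ... release_text(&s);`. In a managed language the second half of that pair is what the garbage collector does for you.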
Strictly speaking, most dynamically-typed languages (such as Perl, Python, PHP and Ruby) are also managed code. However, they are not commonly described as such, which shows that managed code is actually somewhat of a marketing term for the really big, serious, commercial programming environments (.NET and Java).
Assembly code: This term generally refers to the kind of source code people write when they really want to write byte-code. An assembler is a program that turns this source code into real byte-code. It is not a compiler because the transformation is 1-to-1. However, the term is ambiguous as to what kind of byte-code is used: it could be managed or unmanaged. If it is unmanaged, the resulting byte-code is machine code. If it is managed, it results in the byte-code used behind the scenes by a virtual environment such as .NET. Managed code (e.g. C#, Java) is compiled into this special byte-code language, which in the case of .NET is called Common Intermediate Language (CIL) and in Java is called Java byte-code. There is usually little need for the common programmer to access this code or to write in this language directly, but when people do, they often refer to it as assembly code because they use an assembler to turn it into byte-code.
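The "1-to-1" nature of an assembler can be sketched as a plain table lookup. The toy below is only an illustration: the function name assemble_line and the table layout are hypothetical, and the handful of encodings are the 32-bit x86 ones that appear in the disassembly listings elsewhere in this thread. A real assembler also handles operands, labels and many more instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table row: an assembly instruction and the machine code bytes it stands for. */
struct encoding {
    const char *text;
    unsigned char bytes[2];
    size_t length;
};

/* Encodings taken from the 32-bit x86 listings in this thread; nothing else is supported. */
static const struct encoding table[] = {
    { "push ebp",    { 0x55, 0x00 }, 1 },
    { "mov ebp,esp", { 0x8B, 0xEC }, 2 },
    { "xor eax,eax", { 0x33, 0xC0 }, 2 },
    { "pop ebp",     { 0x5D, 0x00 }, 1 },
    { "ret",         { 0xC3, 0x00 }, 1 },
};

/* Hypothetical toy assembler: writes the bytes for one instruction into `out`
   (which must have room for 2 bytes) and returns the byte count, or 0 if the
   instruction is not in the table. */
size_t assemble_line(const char *line, unsigned char *out)
{
    assert(line != NULL && out != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(line, table[i].text) == 0) {
            memcpy(out, table[i].bytes, table[i].length);
            return table[i].length;
        }
    }
    return 0;
}
```

Each source line maps to exactly one fixed byte sequence, with no optimization or reordering, which is the sense in which the transformation is direct.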
Answered by Hans Passant
What you see when you use Debug + Windows + Disassembly when debugging a C# program is a good guide for these terms. Here's an annotated version of it when I compile a 'hello world' program written in C# in the Release configuration with JIT optimization enabled:
static void Main(string[] args) {
    Console.WriteLine("Hello world");
00000000  55                 push  ebp                           ; save stack frame pointer
00000001  8B EC              mov   ebp,esp                       ; setup current frame
00000003  E8 30 BE 03 6F     call  6F03BE38                      ; Console.Out property getter
00000008  8B C8              mov   ecx,eax                       ; setup "this"
0000000a  8B 15 88 20 BD 02  mov   edx,dword ptr ds:[02BD2088h]  ; arg = "Hello world"
00000010  8B 01              mov   eax,dword ptr [ecx]           ; TextWriter reference
00000012  FF 90 D8 00 00 00  call  dword ptr [eax+000000D8h]     ; TextWriter.WriteLine()
00000018  5D                 pop   ebp                           ; restore stack frame pointer
}
00000019  C3                 ret                                 ; done, return
Right-click the window and tick "Show Code Bytes" to get a similar display.
The column on the left is the machine code address. Its value is faked by the debugger; the code is actually located somewhere else. That could be anywhere, depending on the location selected by the JIT compiler, so the debugger just starts numbering addresses from 0 at the start of the method.
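The method-relative numbering can be reproduced from nothing but the instruction lengths. The following is a small sketch in C; number_offsets is a hypothetical helper written for this illustration, not something the debugger exposes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the byte length of each instruction in a method,
   fills `offsets` with the method-relative address of each instruction and
   returns the total size of the method in bytes. */
size_t number_offsets(const size_t *lengths, size_t count, size_t *offsets)
{
    assert(lengths != NULL && offsets != NULL);
    size_t next = 0;             /* the first instruction is shown at offset 0 */
    for (size_t i = 0; i < count; i++) {
        offsets[i] = next;
        next += lengths[i];      /* the next instruction starts right after this one */
    }
    return next;
}
```

Feeding it the lengths of the nine instructions in the listing (1, 2, 5, 2, 6, 2, 6, 1, 1 bytes) yields the offsets 0, 1, 3, 8, 0xa, 0x10, 0x12, 0x18 and 0x19 shown in the left column.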
The second column is the machine code: the actual 1s and 0s that the CPU executes. Machine code, like here, is commonly displayed in hex. As an illustration, 0x8B selects the MOV instruction; the additional bytes are there to tell the CPU exactly what needs to be moved. Also note the two flavors of the CALL instruction: 0xE8 is the direct call, 0xFF is the indirect call instruction.
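How the first byte selects the instruction can be sketched as a switch statement. This is a deliberately tiny illustration, not a real decoder: classify_instruction and the enum names are hypothetical, and only the opcode bytes that occur in the listing above are recognized. One detail the prose glosses over is handled explicitly: 0xFF is shared by several instructions, and it is the following byte that marks it as an indirect call.

```c
#include <assert.h>
#include <stddef.h>

enum instruction_kind {
    KIND_UNKNOWN,
    KIND_PUSH_EBP,
    KIND_POP_EBP,
    KIND_MOV,
    KIND_CALL_DIRECT,
    KIND_CALL_INDIRECT,
    KIND_RET
};

/* Hypothetical classifier for the opcode bytes seen in the listing.
   `code` points at the first byte of one instruction, `size` is the number
   of bytes available. A real x86 decoder must also handle prefixes,
   operands and hundreds of other opcodes. */
enum instruction_kind classify_instruction(const unsigned char *code, size_t size)
{
    assert(code != NULL && size >= 1);
    switch (code[0]) {
    case 0x55: return KIND_PUSH_EBP;
    case 0x5D: return KIND_POP_EBP;
    case 0x8B: return KIND_MOV;          /* following bytes say what is moved */
    case 0xE8: return KIND_CALL_DIRECT;  /* followed by a 4-byte displacement */
    case 0xC3: return KIND_RET;
    case 0xFF:
        /* 0xFF covers a group of instructions; bits 3..5 of the next byte
           pick one of them, and the value 2 means an indirect call. */
        if (size >= 2 && ((code[1] >> 3) & 7) == 2) {
            return KIND_CALL_INDIRECT;
        }
        return KIND_UNKNOWN;
    default:
        return KIND_UNKNOWN;
    }
}
```

For the bytes FF 90 D8 00 00 00 from the listing, the second byte 0x90 has the value 2 in bits 3..5, so the sketch reports an indirect call, matching the disassembly.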
The third column is the assembly code. Assembly is a simple language, designed to make it easier to write machine code. Translating it to machine code compares to C# being compiled to IL. The compiler used to translate assembly code is called an "assembler". You probably have the Microsoft assembler on your machine; its executable name is ml.exe, or ml64.exe for the 64-bit version. There are two common syntaxes for x86 assembly language in use. The one you see is the one that Intel and AMD use. In the open source world, assembly in the AT&T notation is common. The language syntax is heavily dependent on the kind of CPU for which it was written; the assembly language for a PowerPC is very different.
Okay, that tackles two of the terms in your question. "Native code" is a fuzzy term; it is not uncommonly used to describe code in an unmanaged language. It is perhaps instructive to see what kind of machine code is generated by a C compiler. This is the 'hello world' version in C:
int _tmain(int argc, _TCHAR* argv[])
{
00401010  55              push  ebp
00401011  8B EC           mov   ebp,esp
    printf("Hello world");
00401013  68 6C 6C 45 00  push  offset ___xt_z+128h (456C6Ch)
00401018  E8 13 00 00 00  call  printf (401030h)
0040101D  83 C4 04        add   esp,4
    return 0;
00401020  33 C0           xor   eax,eax
}
00401022  5D              pop   ebp
00401023  C3              ret
I didn't annotate it, mostly because it is so similar to the machine code generated by the C# program. The printf() function call is quite different from the Console.WriteLine() call but everything else is about the same. Also note that the debugger is now generating the real machine code address and that it is a bit smarter about symbols. That is a side effect of generating debug info after generating machine code, as unmanaged compilers often do. I should also mention that I turned off a few machine code optimization options to make the machine code look similar. C/C++ compilers have a lot more time available to optimize code, and the result is often hard to interpret. And very hard to debug.
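Because the C listing shows real addresses, the `call printf (401030h)` line can be checked by hand: a direct call (0xE8) stores a 4-byte little-endian displacement relative to the instruction that follows it. The sketch below does that arithmetic in C; direct_call_target is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes the destination of a 5-byte direct CALL
   (opcode 0xE8) in 32-bit code. `address` is where the instruction itself
   starts; `code` points at its 5 bytes. */
uint32_t direct_call_target(uint32_t address, const unsigned char *code)
{
    assert(code != NULL && code[0] == 0xE8);
    /* The 4 operand bytes form a little-endian displacement. */
    uint32_t displacement = (uint32_t)code[1]
                          | (uint32_t)code[2] << 8
                          | (uint32_t)code[3] << 16
                          | (uint32_t)code[4] << 24;
    /* The displacement counts from the next instruction, 5 bytes further on.
       Unsigned arithmetic wraps modulo 2^32, which also covers backward calls. */
    return address + 5u + displacement;
}
```

With the address 00401018 and the bytes E8 13 00 00 00 this gives 0x401030, the printf address the debugger prints. The same arithmetic applied to offset 3 and the bytes E8 30 BE 03 6F from the C# listing gives 6F03BE38, the value shown there.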
The key point here is that there are very few differences between machine code generated from a managed language by the JIT compiler and machine code generated by a native code compiler. That is the primary reason why the C# language can be competitive with a native code compiler. The only real difference between them is the support function calls, many of which are implemented in the CLR. Those revolve primarily around the garbage collector.
Answered by cHao
Native code and machine code are the same thing -- the actual bytes that the CPU executes.
Assembly code has two meanings. One is the machine code translated into a more human-readable form, with the bytes for the instructions translated into short, word-like mnemonics such as "JMP" (which "jumps" to another spot in the code). The other is the IL bytecode that lives in a DLL or EXE: instruction bytes that compilers like C# or VB generate, which will end up translated into machine code eventually, but aren't yet.
Answered by Henk Holterman
In .NET, assemblies contain MS Intermediate Language code (MSIL, sometimes CIL).
It is like a 'high level' machine code.
When loaded, MSIL is compiled by the JIT compiler into native code (Intel x86 or x64 machine code).

