Notice: this page is a bilingual translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/3946911/

Time: 2020-08-28 14:06:33  Source: igfitidea

How to write a simple compiler in C/++?

c++ compiler-construction

Asked by

Possible Duplicate:
Learning to write a compiler

Hi Stack Overflow. Don't get me wrong: I don't intend to write a compiler for C++ (though I intend to write it in C++), Java, or some other complex high-level programming language. I just want to learn the basics of converting a basic instruction set into a Windows executable (say, a simple, completely custom language with 5-6 functions). Also, I don't want to download any libraries or header files. If you could link me to any very basic example source or tutorials, it would be greatly appreciated!

Answer by Gordon Brandly

Jack Crenshaw's Let's Build a Compiler is a good tutorial to start off with. He's a good writer and makes the subject easy to understand.

Answer by Michael Ekstrand

To parse the input, you should read up on recursive descent parsing (those are probably the easiest parsers to hand-implement), although you will also need a lexer of some kind to produce tokens for your parser. Lexers can be hand-coded (I've done it), although it's easier to use a lexer generator like lex or flex.
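
To make the lexer/parser split concrete, here is a minimal sketch of a hand-coded lexer feeding a recursive descent parser. The toy grammar (integer arithmetic with `+ - * /` and parentheses) and every name in it (`Tok`, `lex`, `Parser`, `evaluate`) are my own assumptions for illustration, not something from the answer. To stay short it evaluates the expression directly instead of producing output code.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Token kinds for a hypothetical toy expression language.
enum class Tok { Num, Plus, Minus, Star, Slash, LParen, RParen, End };

struct Token {
    Tok kind;
    long value;  // only meaningful when kind == Tok::Num
};

// Hand-coded lexer: turns the source text into a flat list of tokens.
std::vector<Token> lex(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isdigit(c)) {
            long v = 0;
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                v = v * 10 + (src[i++] - '0');
            out.push_back({Tok::Num, v});
            continue;
        }
        Tok kind;
        switch (c) {
            case '+': kind = Tok::Plus; break;
            case '-': kind = Tok::Minus; break;
            case '*': kind = Tok::Star; break;
            case '/': kind = Tok::Slash; break;
            case '(': kind = Tok::LParen; break;
            case ')': kind = Tok::RParen; break;
            default: throw std::runtime_error("unexpected character");
        }
        out.push_back({kind, 0});
        ++i;
    }
    out.push_back({Tok::End, 0});  // sentinel so the parser never runs off the end
    return out;
}

// Recursive descent parser: one member function per grammar rule.
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class Parser {
public:
    explicit Parser(std::vector<Token> toks) : toks_(std::move(toks)) {}

    long parse() {
        const long v = expr();
        if (peek().kind != Tok::End) throw std::runtime_error("trailing input");
        return v;
    }

private:
    const Token& peek() const { return toks_[pos_]; }
    Token next() { return toks_[pos_++]; }

    long expr() {
        long v = term();
        while (peek().kind == Tok::Plus || peek().kind == Tok::Minus) {
            if (next().kind == Tok::Plus) v += term(); else v -= term();
        }
        return v;
    }

    long term() {
        long v = factor();
        while (peek().kind == Tok::Star || peek().kind == Tok::Slash) {
            if (next().kind == Tok::Star) {
                v *= factor();
            } else {
                const long d = factor();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            }
        }
        return v;
    }

    long factor() {
        const Token t = next();
        if (t.kind == Tok::Num) return t.value;
        if (t.kind == Tok::LParen) {
            const long v = expr();
            if (next().kind != Tok::RParen) throw std::runtime_error("expected ')'");
            return v;
        }
        throw std::runtime_error("expected number or '('");
    }

    std::vector<Token> toks_;
    std::size_t pos_ = 0;
};

// Convenience wrapper: lex, parse, and evaluate in one call.
long evaluate(const std::string& src) { return Parser(lex(src)).parse(); }
```

Calling `evaluate("1 + 2 * 3")` walks `expr`, `term`, and `factor` in turn. Operator precedence falls out of which rule calls which, so no precedence table is needed.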

Once you've parsed the input, you will need to transform it into appropriate output. I can't help you much there, as I do not know the Windows toolchain very well. The "easy" way is to generate assembly and run it through NASM, MASM, or whatever assembler comes with your compiler environment. If your language is sufficiently simple, you can just generate the assembly as you go in the parser code.
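
As a rough illustration of generating the assembly "as you go", the sketch below appends NASM-style 32-bit x86 text from inside the parsing functions themselves, with no tree in between. Everything here is an assumption of mine: the toy grammar (`+`, `-`, `*` over integer literals), the names `AsmEmitter` and `compileToAsm`, and the stack-machine convention of pushing operands and leaving the result in `eax`. It produces only the instruction sequence for one expression, not a complete program with sections and an entry point.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical single-pass translator: each parsing function appends the
// assembly text for the construct it has just recognised.
class AsmEmitter {
public:
    explicit AsmEmitter(std::string src) : src_(std::move(src)) {}

    std::string compile() {
        expr();
        if (peek() != '\0') throw std::runtime_error("trailing input");
        out_ += "pop eax\n";  // assumed convention: final result ends up in eax
        return out_;
    }

private:
    static bool isDigit(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

    // Skip whitespace and return the next character, or '\0' at end of input.
    char peek() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        return pos_ < src_.size() ? src_[pos_] : '\0';
    }

    // Both operands are already on the stack; combine them and push the result.
    void binary(const char* mnemonic) {
        out_ += "pop ebx\npop eax\n";
        out_ += mnemonic;
        out_ += " eax, ebx\npush eax\n";
    }

    void expr() {  // expr := term (('+' | '-') term)*
        term();
        for (char c = peek(); c == '+' || c == '-'; c = peek()) {
            ++pos_;
            term();
            binary(c == '+' ? "add" : "sub");
        }
    }

    void term() {  // term := factor ('*' factor)*
        factor();
        while (peek() == '*') {
            ++pos_;
            factor();
            binary("imul");
        }
    }

    void factor() {  // factor := NUMBER | '(' expr ')'
        const char c = peek();
        if (c == '(') {
            ++pos_;
            expr();
            if (peek() != ')') throw std::runtime_error("expected ')'");
            ++pos_;
            return;
        }
        if (!isDigit(c)) throw std::runtime_error("expected number or '('");
        std::string digits;
        while (pos_ < src_.size() && isDigit(src_[pos_])) digits += src_[pos_++];
        out_ += "push " + digits + "\n";
    }

    std::string src_;
    std::string out_;
    std::size_t pos_ = 0;
};

std::string compileToAsm(const std::string& src) { return AsmEmitter(src).compile(); }
```

The returned text would still need to be wrapped in whatever section and entry-point boilerplate your assembler and linker expect before NASM or MASM could turn it into an executable.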

Answer by Lie Ryan

Here's what you need to write a basic compiler:

  1. Parser. You will need to parse your language and build an Abstract Syntax Tree. You may want to learn about writing parsers. You can either hand-code the parser or use parser generators, e.g. lex/yacc.
  2. Assembly. You will need to generate assembly instructions from the Syntax Tree.
  3. Instruction Set. You will need to translate the assembly into machine code for some specific instruction set (typical Intel and AMD CPUs use the x86 instruction set; alternatively, you can target the Java VM's instruction set or .NET's IL).
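
A minimal sketch of step 2, under heavy assumptions: the tree below is built by hand rather than by a parser, and the target is a made-up stack machine (`Op::Push`, `Op::Add`, and so on), not x86, the JVM, or IL. The names `Node`, `num`, `bin`, `generate`, and `run` are hypothetical. The small `run` interpreter stands in for real hardware so the generated instructions can be checked. It needs C++14 for `std::make_unique`.

```cpp
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// AST for a hypothetical expression language: a number literal or a binary op.
struct Node {
    char op;     // '+', '-', '*', or 'n' for a number literal
    long value;  // used only when op == 'n'
    std::unique_ptr<Node> lhs, rhs;
};

std::unique_ptr<Node> num(long v) {
    auto n = std::make_unique<Node>();
    n->op = 'n';
    n->value = v;
    return n;
}

std::unique_ptr<Node> bin(char op, std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->op = op;
    n->value = 0;
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

// Instructions of the made-up stack machine.
enum class Op { Push, Add, Sub, Mul };
struct Instr { Op op; long arg; };

// Code generation as a post-order walk: operands first, then the operator.
void generate(const Node& n, std::vector<Instr>& code) {
    if (n.op == 'n') {
        code.push_back({Op::Push, n.value});
        return;
    }
    generate(*n.lhs, code);
    generate(*n.rhs, code);
    switch (n.op) {
        case '+': code.push_back({Op::Add, 0}); break;
        case '-': code.push_back({Op::Sub, 0}); break;
        case '*': code.push_back({Op::Mul, 0}); break;
        default: throw std::runtime_error("unknown operator");
    }
}

// Tiny interpreter for the instruction list; assumes well-formed code as
// produced by generate() and returns the value left on top of the stack.
long run(const std::vector<Instr>& code) {
    std::vector<long> stack;
    for (const Instr& in : code) {
        if (in.op == Op::Push) {
            stack.push_back(in.arg);
            continue;
        }
        const long b = stack.back(); stack.pop_back();
        const long a = stack.back(); stack.pop_back();
        switch (in.op) {
            case Op::Add: stack.push_back(a + b); break;
            case Op::Sub: stack.push_back(a - b); break;
            case Op::Mul: stack.push_back(a * b); break;
            case Op::Push: break;  // handled above
        }
    }
    return stack.back();
}
```

For example, `bin('+', num(1), bin('*', num(2), num(3)))` generates five instructions (three pushes, a multiply, an add). Step 3 would then map each of those onto real machine instructions for the chosen target.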

Answer by Bill K

Actually, the most important thing you need is to figure out the binary format of .exe files (unless you are planning to use an existing linker, in which case I think you need to output .obj files, which also have a binary format).

You also need to deal with a LOT of assembly; unless you are already VERY familiar with the x86 instruction set, I'd try something else.

Here are a few possibilities:

  • There used to be a thing called "Tiny C"--I'm guessing this is it: http://bellard.org/tcc. Tiny C is a good enough compiler to build itself, but not so complex that it's hard to understand. It's a bare-bones "how to build a compiler" lesson in a box. I messed with it on the 8088.

  • Output for an "Embedded" cpu. They tend to have simple assembly languages and very clearly defined executable formats. This would be a good place to start.

  • Output C-code instead of a binary. This is a cheat for sure, but you can concentrate on your language and not worry too much about the assembly language.

  • Finally, if you really want to directly create an .exe, first write an app that produces a "Hello world" exe. Don't bother having it "compile" anything; just hand-edit the code, get it into the exe format, and run it. In doing this you will KNOW that you got all your bits lined up and into the right spots, and then you can start on a compiler with some confidence.
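
For the "output C code" option in the list above, a sketch might look like the following. The source language is invented for the example (one `print <text>` command per line), and `escapeC` and `toC` are hypothetical helper names. The escaping only handles double quotes and backslashes, so treat it as a starting point rather than a complete translator.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Escape a string so it can sit inside a C string literal.
// Assumption: the input contains only printable characters.
std::string escapeC(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Translate a hypothetical toy language (one "print <text>" command per line)
// into the source text of a complete C program.
std::string toC(const std::string& program) {
    const std::string kw = "print ";
    std::istringstream in(program);
    std::string line;
    std::string body;
    while (std::getline(in, line)) {
        if (line.empty()) continue;  // blank lines are allowed and ignored
        if (line.compare(0, kw.size(), kw) != 0)
            throw std::runtime_error("unknown command: " + line);
        body += "    puts(\"" + escapeC(line.substr(kw.size())) + "\");\n";
    }
    return "#include <stdio.h>\n\nint main(void) {\n" + body + "    return 0;\n}\n";
}
```

Write the returned string to a `.c` file and hand it to any C compiler. That compiler then takes care of the assembly and the executable format.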

After this, creating the language can be done through a lot of the procedures given here. But if you just want to see how it all works, I'd definitely do a few small iterations first; don't worry about what you will run into until you run into it.

Answer by Cheers and hth. - Alf

For learning about how building a compiler is different in C++ than in, say, C or Pascal, try out the Boost Spirit parser framework.

This assumes familiarity with C++.

For learning about creating a compiler I suggest using a simpler language than C++, then perhaps advancing to C++.

Cheers & hth.,

Answer by kenny

I would recommend www.antlr.org. I worked in C#, but it has support for C, Java, Python and more.
