Disclaimer: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/2349468/

Time: 2020-09-02 04:40:11  Source: igfitidea

Starting off a simple (the simplest perhaps) C compiler?

c compiler-construction programming-languages

Asked by Legend

I came across this: Writing a compiler using Turbo Pascal

I am curious if there are any tutorials or references explaining how to go about creating a simple C compiler. I mean, it is enough if it gets me to the level of making it understand arithmetic operations. I became really curious after reading this article by Ken Thompson. The idea of writing something that understands itself seems exciting.

Why did I post this question instead of asking Google? I tried Google, and the Pascal one was the first link. The rest did not seem relevant. On top of that, I am not a CS major (so I still need to learn what tools like yacc do), I want to learn this by doing, and I am hoping that people with more experience are better at these things than Google. I want to read an article written in the same spirit as the one I listed above, but one that highlights at least the bootstrapping phases of building a simple C compiler.

Also, I don't know the best way to learn. Do I start off building a C compiler in C or some other language? Do I write a C compiler or some other language? I feel questions like this are better answered once I have some direction to explore. Any suggestions?

Accepted answer by duffymo

A compiler consists of three pieces:

  1. A parser
  2. An abstract syntax tree (AST)
  3. A code generator

There are lots of nice parser generators that start with language grammars. Maybe ANTLR would be a good place for you to start. If you want to stick to C roots, try lex/yacc or bison.

There are grammars for C, but I think C in its entirety is complex. You'd do well to start off with a subset of the language and work your way up.

Once you have an AST, you use it to generate the machine code that you'll run.
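
To make the three pieces concrete, here is a minimal sketch in Python (chosen for brevity; the answer does not prescribe a language) that handles integer arithmetic only. The node classes `Num` and `BinOp`, the functions `tokenize`, `parse`, and `generate`, and the `PUSH`/`ADD`/`SUB`/`MUL`/`DIV` instruction names are all made up for illustration, and the output targets an imaginary stack machine rather than real machine code.

```python
import re
from dataclasses import dataclass

# AST node types (hypothetical names chosen for this sketch).
@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def tokenize(src):
    # Integers and the symbols + - * / ( ). Anything else is silently skipped.
    return re.findall(r"\d+|[-+*/()]", src)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        # expr := term (('+' | '-') term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = BinOp(op, node, self.term())
        return node

    def term(self):
        # term := factor (('*' | '/') factor)*
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = BinOp(op, node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | '(' expr ')'
        tok = self.take()
        if tok == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return Num(int(tok))

def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}

def generate(node):
    # Post-order walk: emit code for both operands, then the operator.
    if isinstance(node, Num):
        return [f"PUSH {node.value}"]
    return generate(node.left) + generate(node.right) + [OPCODES[node.op]]

code = generate(parse("1 + 2 * (3 - 4)"))
```

For the input `1 + 2 * (3 - 4)` this yields `PUSH 1`, `PUSH 2`, `PUSH 3`, `PUSH 4`, `SUB`, `MUL`, `ADD`. A real back end would emit instructions for an actual CPU instead.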

It's doable, but not trivial.

I'd also check Amazon for books about writing compilers. The Dragon Book is the classic, but there are more modern ones available.

UPDATE: There have been similar questions on Stack Overflow, like this one. Check out those resources as well.

Answer by Phong

I recommend this tutorial:

It is a small example of how to implement a "small language" compiler. The source code is very small and is explained step by step.

There is also the C front-end library for LLVM (Low Level Virtual Machine, which represents the internal structure of a program):

Answer by Mark Rushakoff

For what it's worth, the Tiny C Compiler is a pretty full-featured C compiler in a relatively small source package. You might benefit from studying that source, as it's probably significantly easier to understand than all of GCC's source base, for instance.

Answer by mctylr

In my opinion (and this is conjecture), it will be hard to write a compiler without understanding the data structures normally covered in undergraduate (post-secondary) Computer Science classes. This doesn't mean you cannot, but you will need to know essential data structures such as linked lists and trees.

Rather than writing a full or standards-compliant C compiler (at least at the start), I would suggest limiting yourself to a basic subset of the language, such as common operators, integer-only support, and basic functions and pointers. One classic example of this was Ron Cain's Small-C, made popular by a series of articles in Dr. Dobb's Journal in, I believe, the 1980s. They publish a CD with James Hendrix's out-of-print book, A Small-C Compiler.

What I would suggest is following Crenshaw's tutorial, but writing a compiler for a C-like language, for whatever CPU you wish to target (Crenshaw targets the Motorola 68000). To do this, you will need to know basic assembly for whichever target you want to run the compiled programs on. This could include an emulator for a 68000 or MIPS, which arguably have nicer assembly instruction sets than the venerable CISC instruction set of the Intel x86 (16/32-bit).

There are many books that can be used as starting points for learning compiler/translator theory (and practice). Read the comp.compilers FAQ and the reviews at various online booksellers. Most introductory books are written as textbooks for sophomore- to senior-level undergraduate Computer Science classes, so they can be slow reading without a CS background. One older book that might be more introductory, and easier to read than "The Dragon Book", is Introduction to Compiler Construction by Thomas Parsons. Because it is older, you should be able to find a used copy from your choice of online booksellers at a reasonable price.

So I'd say, try starting with Jack Crenshaw's Let's Build a Compiler tutorial, write your own, following his examples as a guide, and build the basics of a simple compiler. Once you have that working, you can better decide where you wish to take it from that point.

Added:

Regarding the bootstrapping process: since existing C compilers are freely available, you do not need to worry about bootstrapping. Write your compiler with separate, existing tools (GCC, Visual C++ Express, MinGW / djgpp, tcc), and worry about self-compiling your project at a much later stage. I was surprised by this part of the question until I realized you were brought to the idea of writing your own compiler by reading Ken Thompson's ACM Turing Award lecture, Reflections on Trusting Trust, which does go into the compiler bootstrapping process. It's a moderately advanced topic, and it is also simply a lot of hassle. I find even bootstrapping the GCC C compiler under older Unix systems that already included a C compiler (Digital OSF/1 on the 64-bit Alpha) a slow, time-consuming, error-prone process.

The other sort-of question was what a compiler tool like Yacc actually does. Yacc (Yet Another Compiler Compiler; Bison is the GNU version) is a tool designed to make writing the parser of a compiler (or translator) easier. Based on the formal grammar for your target language that you input to yacc, it generates a parser, which is one portion of a compiler's overall design. Next is Lex (or flex from GNU), which is used to generate a lexical analyzer, or scanner. It is often used in combination with the yacc-generated parser to form the skeleton of the front end of a compiler. These tools arguably make writing a front end easier than writing a lexical analyzer and parser yourself. Crenshaw's tutorial does not use these tools, and you don't need to either; many compiler writers don't always use them. Of course, Crenshaw admits the tutorial's parser is quite basic.
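
To give a feel for what a scanner does, here is a rough hand-written sketch in Python using the standard `re` module. It is not Lex output (Lex generates C code from a rule file); it only mimics the idea of matching a list of regular-expression rules against the input. The token names, the keyword set, and the function `lex` are assumptions made up for this example.

```python
import re

# Token classes for a tiny C-like subset (an assumption for illustration).
# Order matters: earlier rules win when several could match.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"==|!=|<=|>=|[-+*/=<>]"),
    ("PUNCT", r"[(){};,]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
KEYWORDS = {"int", "return", "if", "else", "while"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(source):
    """Split source text into (kind, text) pairs, dropping whitespace."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens

tokens = lex("int x = 40 + 2;")
```

The parser then consumes this token list instead of raw characters, which is the division of labour between Lex and Yacc described above.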

Crenshaw's tutorial also skips generating an AST (abstract syntax tree), which simplifies but also limits the tutorial compiler. It lacks most if not all optimization, and is very tied to the specific programming language and the particular assembly language emitted by the "back-end" of the compiler. Normally the AST is a middle piece where some optimization can be performed, and serves to de-couple the compiler front-end and back-end in design. For a beginner without a Computer Science background, I'd suggest not worrying about not having an AST for your first compiler (or at least the first version of it). I think keeping it small and simple will help you finish writing a compiler, in its first version, and you can decide from there how you want to proceed then.
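
As a rough illustration of the no-AST approach, here is a sketch in Python where each parsing routine emits code the moment it recognises a construct, so no tree is ever built. This is not Crenshaw's code (his tutorial is written in Turbo Pascal and emits 68000 assembly); the function `compile_expression` and the stack-machine instruction names are assumptions for this example.

```python
import re

def compile_expression(source):
    """Translate an arithmetic expression in one pass, with no AST."""
    tokens = re.findall(r"\d+|[-+*/()]", source)
    pos = 0
    output = []

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():
        tok = advance()
        if tok == "(":
            expression()
            if advance() != ")":
                raise SyntaxError("expected ')'")
        elif tok is not None and tok.isdigit():
            output.append(f"PUSH {tok}")  # emitted immediately
        else:
            raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        factor()
        while peek() in ("*", "/"):
            op = advance()
            factor()
            output.append("MUL" if op == "*" else "DIV")

    def expression():
        term()
        while peek() in ("+", "-"):
            op = advance()
            term()
            output.append("ADD" if op == "+" else "SUB")

    expression()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return output

code = compile_expression("2 * (3 + 4) - 5")
```

Because the output is fixed as soon as each routine returns, there is no later point at which the program can be inspected or rearranged, which is why this style leaves little room for optimization.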

Answer by Joe Internet

You might be interested in the book/course The Elements of Computing Systems: Building a Modern Computer from First Principles.

Note that this isn't about building a "pc" from stuff you bought off newegg. It begins with a description of Boolean logic fundamentals, and builds a virtual computer from the lowest levels of abstraction to progressively higher levels of abstraction. The course materials are all online, and the book itself is fairly inexpensive from Amazon.

In the course, in addition to "building the hardware", you'll also implement an assembler, virtual machine, compiler, and rudimentary OS, in a step-wise fashion. I think this would give you enough of a background to delve deeper into the subject area with some of the more commonly recommended resources listed in the other answers.

Answer by msw

In The Unix Programming Environment, Kernighan and Pike walk through 5 iterations of building a calculator, progressing from simple C-based lexical analysis and immediate execution to yacc/lex parsing and code generation for an abstract machine. Because they write so wonderfully, I can't suggest a smoother introduction. The language is certainly smaller than C, but that is likely to your advantage.
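
To show what "an abstract machine" can mean at its smallest, here is a sketch of a stack machine in Python. It is not taken from the book; the function `run` and the instruction names are assumptions for illustration only.

```python
def run(program):
    """Execute instructions for a tiny stack machine and return the final value."""
    stack = []
    for instruction in program:
        opcode, _, operand = instruction.partition(" ")
        if opcode == "PUSH":
            stack.append(int(operand))
            continue
        # Every other instruction pops two operands and pushes one result.
        right = stack.pop()
        left = stack.pop()
        if opcode == "ADD":
            stack.append(left + right)
        elif opcode == "SUB":
            stack.append(left - right)
        elif opcode == "MUL":
            stack.append(left * right)
        elif opcode == "DIV":
            # Python floor division; C truncates toward zero instead.
            stack.append(left // right)
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack.pop()

# 1 + 2 * (3 - 4), written out as stack-machine code.
result = run(["PUSH 1", "PUSH 2", "PUSH 3", "PUSH 4", "SUB", "MUL", "ADD"])
```

Generating code for a machine like this and then interpreting it separates "what the program means" from "how it runs", which is the step from immediate execution to compilation.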

Answer by Norman Ramsey

How do I [start writing] a simple C compiler?

There's nothing simple about compiling C. The best simple C compiler is lcc by Chris Fraser and David Hanson. They spent 10 years working on the design to make it as simple as they possibly could, while still generating reasonably good code. If you have access to a university library, you should be able to get their book.

Do I start off building a C compiler in C or some other language?

Some other language. One time I got to ask Hanson what lessons he and Fraser had learned by spending 10 years on the lcc project. The main thing Hanson said was

C is a lousy language to write a compiler in.

You're better off using Haskell or some dialect of ML. Both languages offer functions over algebraic data types, which is a perfect match to the problems faced by the compiler writer. If you still want to pursue C, you could start with George Necula's CIL, which is a big chunk of a C compiler written in ML.

I want to read some article written in the same spirit as the one I listed above but that which highlights at least the bootstrapping phases...

You won't find another article like Ken's. But Andrew Appel has written a nice article called Axiomatic Bootstrapping: A Guide for Compiler Hackers. I couldn't find a free version, but many people have access to the ACM Digital Library.

Any suggestions?

If you want to write a compiler,

  • Use Haskell or ML as your implementation language.

  • For your first compiler, pick a very simple language like Oberon, or like P0 from Niklaus Wirth's book Algorithms + Data Structures = Programs. Wirth is famous for designing languages that are easy to compile.

You can write a C compiler for your second compiler.

Answer by t0mm13b

A compiler is a complex subject matter that covers aspects of

  • Input processing involving lexing and parsing
  • Building a representation of the program, such as an Abstract Syntax Tree (AST), together with a symbol store of every variable used
  • From the AST, translating and building a machine-code binary based on the syntax

This is by no means exhaustive; it is an abstract bird's-eye view from the top of a mountain. It boils down to getting the syntax notation correct and ensuring that malformed input does not throw the compiler off. In fact, good input processing should never fall to its knees, no matter how malformed, terrible, or abused the input thrown at it. You also need to decide what the output is going to be. If it is machine code, you may have to get to know the processor's instructions intimately, including memory addressing for variables and so on.
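
As a small illustration of the symbol store and of memory addressing for variables, here is a sketch in Python that assigns each declared variable an offset in a stack frame. The class `SymbolTable`, its method names, and the 4-byte word size are assumptions for this example; real compilers also track types, alignment, and register allocation.

```python
class SymbolTable:
    """Maps variable names to stack-frame offsets, with nested scopes."""

    WORD_SIZE = 4  # assumed size of an int on a hypothetical target

    def __init__(self):
        self.scopes = [{}]      # innermost scope is last
        self.next_offset = 0    # grows downward from the frame pointer

    def enter_scope(self):
        self.scopes.append({})

    def leave_scope(self):
        self.scopes.pop()

    def declare(self, name):
        scope = self.scopes[-1]
        if name in scope:
            raise NameError(f"redeclaration of {name!r}")
        self.next_offset -= self.WORD_SIZE
        scope[name] = self.next_offset
        return self.next_offset

    def lookup(self, name):
        # Search from the innermost scope outward, so inner names shadow outer ones.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"undeclared variable {name!r}")

table = SymbolTable()
outer_x = table.declare("x")      # offset of the outer x
table.enter_scope()
inner_x = table.declare("x")      # a new x that shadows the outer one
seen_inside = table.lookup("x")
table.leave_scope()
seen_outside = table.lookup("x")
```

A code generator would turn each offset into an address relative to the frame pointer. For simplicity this sketch never reuses the slot of a variable whose scope has ended.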

Here are some links for you to get started:

  • There was a port of Jack Crenshaw's code to C (I recall downloading it months ago...)
  • Here's a link to a similar question here on SO.
  • Also, here's another small compiler tutorial, for a Basic to x86 assembler compiler.
  • Tiny C Compiler
  • Hendrix's Small C Compiler, found here.

Answer by Potatoswatter

It might be worthwhile to learn about functional programming, too. Functional languages are well suited to writing a compiler both in and for. My school's intro compilers class contained an intro to functional languages, and the assignments were all in OCaml.

Funny you should ask this today, since just a couple days ago I wrote a lambda calculus interpreter. Lambda calculus is the granddaddy of all functional languages. It's just 200 lines long (in C++, incl. error reporting, some pretty printing, some unicode) and has a two-phase structure, with an intermediate format that could be used to generate code.
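
To give an idea of the two-phase shape, here is a much smaller sketch in Python. It is not the C++ interpreter mentioned above: it has no pretty printing or Unicode, it evaluates call-by-value, and it represents functions as Python closures instead of rewriting terms. The syntax (`\x. body` for abstraction) and the names `parse` and `evaluate` are assumptions for illustration.

```python
import re

def parse(source):
    """Phase 1: turn source text into a nested-tuple tree (the intermediate format)."""
    tokens = re.findall(r"[A-Za-z_]\w*|[\\.()]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def atom():
        token = advance()
        if token == "(":
            node = term()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token == "\\":
            name = advance()
            if name is None or not name.isidentifier():
                raise SyntaxError("expected a parameter name")
            if advance() != ".":
                raise SyntaxError("expected '.'")
            return ("lam", name, term())  # the body extends as far as possible
        if token is None or not token.isidentifier():
            raise SyntaxError(f"unexpected token {token!r}")
        return ("var", token)

    def term():
        # Application is left-associative: f a b parses as (f a) b.
        node = atom()
        while peek() not in (None, ")"):
            node = ("app", node, atom())
        return node

    tree = term()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

def evaluate(node, env):
    """Phase 2: evaluate the tree call-by-value; functions become Python closures."""
    kind = node[0]
    if kind == "var":
        return env[node[1]]
    if kind == "lam":
        _, name, body = node
        return lambda argument: evaluate(body, {**env, name: argument})
    _, function, argument = node
    return evaluate(function, env)(evaluate(argument, env))

# Church numerals: adding 2 and 3, then converting the result to a Python int.
PLUS = r"\m. \n. \f. \x. m f (n f x)"
TWO = r"\f. \x. f (f x)"
THREE = r"\f. \x. f (f (f x))"
church_five = evaluate(parse(f"({PLUS}) ({TWO}) ({THREE})"), {})
five = church_five(lambda n: n + 1)(0)
```

The tuple tree returned by `parse` is the intermediate format here. A code generator could walk the same tree instead of `evaluate`.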

Not only is starting small and building up the most practical approach to compilers, it also encourages good, modular, organizational practice.

Answer by Ira Baxter

If you want a mind-blowing experience that teaches you how to write compilers that compile themselves, you need to read this paper from 1964.

META II: a syntax-oriented compiler writing language, by Val Schorre.

In 10 pages, it tells you how to write compilers, how to write meta compilers, provides a virtual metacompiler instruction set, and a sample compiler built with the metacompiler.

I learned how to write compilers from this paper back in the late 60s, and used the ideas to construct C-like languages for several minicomputers and microprocessors.

If the paper is too much by itself (it isn't!), there's an online tutorial which will walk you through the whole thing.

And if getting the paper from the original link is awkward because you are not an ACM member, you'll find that the tutorial contains all the details anyway. (IMHO, for the price, the paper itself is waaaaay worth it).

10 pages!
