Disclaimer: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/672577/

Time: 2020-10-29 13:19:23  Source: igfitidea

How can I parse code to build a compiler in Java?

Tags: java, parsing, compiler-construction, parser-generator

Asked by fmsf

I need to write a compiler. It's homework at the univ. The teacher told us that we can use any API we want to do the parsing of the code, as long as it is a good one. That way we can focus more on the JVM we will generate.

So yes, I'll write a compiler in Java to generate Java.

Do you know any good API for this? Should I use regex? I normally write my own parsers by hand, though it is not advisable in this scenario.

Any help would be appreciated.

Answered by Markus Jarderot

Regex is good to use in a compiler, but only for recognizing tokens (i.e. no recursive structures).
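
To make the token-only role of regex concrete, here is a minimal sketch of a regex-driven lexer. The class name `RegexLexer`, the three token kinds (NUM, ID, OP), and the `KIND:text` output format are all made up for illustration; a real language would need more token kinds and better error reporting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class: splits source text into flat tokens with one regex.
class RegexLexer {
    // Skip leading whitespace, then match exactly one token kind.
    // The named groups tell us which alternative matched.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<NUM>\\d+)|(?<ID>[A-Za-z_]\\w*)|(?<OP>[-+*/=()]))");

    static List<String> tokenize(String src) {
        String s = src.trim(); // drop trailing whitespace so the loop ends cleanly
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int pos = 0;
        while (pos < s.length()) {
            m.region(pos, s.length());
            if (!m.lookingAt()) { // no token starts at this position
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            if (m.group("NUM") != null) {
                tokens.add("NUM:" + m.group("NUM"));
            } else if (m.group("ID") != null) {
                tokens.add("ID:" + m.group("ID"));
            } else {
                tokens.add("OP:" + m.group("OP"));
            }
            pos = m.end(); // continue right after the matched token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = 42 + y"));
    }
}
```

Note that the lexer happily tokenizes unbalanced input such as `( ( x`; deciding whether parentheses nest correctly is the recursive part that belongs to the parser.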

The classic way of writing a compiler is to have a lexical analyzer for recognizing tokens, a syntax analyzer for recognizing structure, a semantic analyzer for recognizing meaning, an intermediate code generator, an optimizer, and lastly a target code generator. Any of those steps can be merged, or skipped entirely, if that makes the compiler easier to write.
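
As a rough illustration of the later phases, the sketch below starts from a hand-built syntax tree (so lexing and parsing are skipped), runs a tiny constant-folding optimizer, and emits instructions for an imaginary stack machine. Everything here is an assumption made for the example: the class `MiniPipeline`, the node types, and the `PUSH`/`LOAD`/`ADD`/`MUL` mnemonics are not real JVM bytecode.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: optimizer + target code generator over a tiny AST.
class MiniPipeline {
    interface Expr {}

    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }

    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }

    static final class Bin implements Expr {
        final char op;
        final Expr left, right;
        Bin(char op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
    }

    // Optimizer: replace operations on two constants with their result.
    static Expr fold(Expr e) {
        if (!(e instanceof Bin)) return e;
        Bin b = (Bin) e;
        Expr l = fold(b.left), r = fold(b.right);
        if (l instanceof Num && r instanceof Num) {
            int x = ((Num) l).value, y = ((Num) r).value;
            if (b.op == '+') return new Num(x + y);
            if (b.op == '*') return new Num(x * y);
        }
        return new Bin(b.op, l, r);
    }

    // Code generator: post-order walk, operands first, then the operator.
    static void emit(Expr e, List<String> out) {
        if (e instanceof Num) {
            out.add("PUSH " + ((Num) e).value);
        } else if (e instanceof Var) {
            out.add("LOAD " + ((Var) e).name);
        } else {
            Bin b = (Bin) e;
            emit(b.left, out);
            emit(b.right, out);
            switch (b.op) {
                case '+': out.add("ADD"); break;
                case '*': out.add("MUL"); break;
                default: throw new IllegalArgumentException("Unknown operator: " + b.op);
            }
        }
    }

    static List<String> compile(Expr tree) {
        List<String> out = new ArrayList<>();
        emit(fold(tree), out);
        return out;
    }

    public static void main(String[] args) {
        // x + 2 * 3
        Expr tree = new Bin('+', new Var("x"), new Bin('*', new Num(2), new Num(3)));
        System.out.println(compile(tree)); // prints [LOAD x, PUSH 6, ADD]
    }
}
```

Here the semantic analyzer and intermediate code are skipped entirely, which is the kind of shortcut the answer says is acceptable for a small compiler.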

There have been many tools developed to help with this process. For Java, you can look at

Answered by Vineet Reynolds

I would recommend ANTLR, primarily because of its output generation capabilities via StringTemplate.

What is better is that Terence Parr's book on the same is by far one of the better books oriented towards writing compilers with a parser generator.

Then you have ANTLRWorks, which enables you to study and debug your grammar on the fly.

To top it all, the ANTLR wiki and documentation (although not comprehensive enough for my liking) are a good place to start for any beginner. They helped me refresh my knowledge of compiler writing in a week.

Answered by tddmonkey

Have a look at JavaCC, a language parser for Java. It's very easy to use and get the hang of.

Answered by gimel

Go classic - Lex + Yacc. In Java it spells JAX and javacc. Javacc even has some Java grammars ready for inspection.

Answered by Apocalisp

I'd recommend using either a metacompiler like ANTLR, or a simple parser combinator library. Functional Java has a parser combinator API. There's also JParsec. Both of these are based on the Parsec library for Haskell.
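
For anyone unfamiliar with the idea, here is a hand-rolled sketch of what a parser combinator looks like. It does not use the Functional Java or JParsec APIs; the `Parser` interface and the `then`, `or`, `ch`, and `digit` helpers are hypothetical names invented for this example, and the real libraries are far richer.

```java
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical example: tiny parsers glued together by combinators.
class Combinators {
    // A successful parse: the value produced and the index where parsing stopped.
    static final class Result<T> {
        final T value;
        final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    interface Parser<T> {
        Optional<Result<T>> parse(String in, int pos);

        // Sequence: run this parser, then other, and combine both values.
        default <U, R> Parser<R> then(Parser<U> other, BiFunction<T, U, R> f) {
            return (in, pos) -> {
                Optional<Result<T>> a = parse(in, pos);
                if (!a.isPresent()) return Optional.empty();
                Optional<Result<U>> b = other.parse(in, a.get().next);
                if (!b.isPresent()) return Optional.empty();
                return Optional.of(
                    new Result<R>(f.apply(a.get().value, b.get().value), b.get().next));
            };
        }

        // Choice: try this parser, fall back to other at the same position.
        default Parser<T> or(Parser<T> other) {
            return (in, pos) -> {
                Optional<Result<T>> a = parse(in, pos);
                return a.isPresent() ? a : other.parse(in, pos);
            };
        }
    }

    // Primitive: match one specific character.
    static Parser<Character> ch(char c) {
        return (in, pos) -> {
            if (pos < in.length() && in.charAt(pos) == c) {
                return Optional.of(new Result<Character>(c, pos + 1));
            }
            return Optional.empty();
        };
    }

    // Primitive: match one ASCII digit and produce its numeric value.
    static Parser<Integer> digit() {
        return (in, pos) -> {
            if (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') {
                return Optional.of(new Result<Integer>(in.charAt(pos) - '0', pos + 1));
            }
            return Optional.empty();
        };
    }

    // Composite: digit ('+' | '-') digit, evaluated while parsing.
    static Parser<Integer> digitOpDigit() {
        Parser<Character> op = ch('+').or(ch('-'));
        Parser<Function<Integer, Integer>> partial =
            digit().then(op, (a, o) -> b -> o == '+' ? a + b : a - b);
        return partial.then(digit(), (f, b) -> f.apply(b));
    }

    public static void main(String[] args) {
        Optional<Result<Integer>> r = digitOpDigit().parse("7-2", 0);
        System.out.println(r.map(x -> x.value).orElse(null)); // prints 5
    }
}
```

The appeal is that the grammar is ordinary Java code, so there is no separate generator step; the trade-off is that you get less tooling than with something like ANTLR.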

Answered by Michael Myers

JFlex is a scanner generator which, according to the manual, is designed to work with the parser generator CUP.

One of the main design goals of JFlex was to make interfacing with the free Java parser generator CUP as easy as possibly [sic].

It also has support for BYACC/J, which, as its name suggests, is a port of Berkeley YACC to generate Java code.

I have used JFlex itself and liked it. However, the project I was doing was simple enough that I wrote the parser by hand, so I don't know how good either CUP or BYACC/J is.

Answered by Jonas Kölker

I've used SableCC in my compiler course, though not by choice.

I remember finding it very bulky and heavyweight, with more emphasis on cleanliness than convenience (no operator precedence or anything; you have to state that in the grammar).
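
To show what "stating precedence in the grammar" means in practice, here is a sketch using a plain hand-written recursive-descent evaluator rather than SableCC syntax. The grammar and the class name `PrecedenceByGrammar` are assumptions for illustration: each precedence level gets its own rule, so `*` binds tighter than `+` simply because `term` sits below `expr`.

```java
// Hypothetical example. Grammar, one rule per precedence level:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
class PrecedenceByGrammar {
    private final String src;
    private int pos = 0;

    private PrecedenceByGrammar(String src) {
        this.src = src.replaceAll("\\s+", ""); // ignore whitespace
    }

    static int eval(String src) {
        PrecedenceByGrammar p = new PrecedenceByGrammar(src);
        int v = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + p.pos);
        }
        return v;
    }

    // Lowest precedence: addition and subtraction, left-associative.
    private int expr() {
        int v = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int r = term();
            v = (op == '+') ? v + r : v - r;
        }
        return v;
    }

    // Higher precedence: multiplication and division, left-associative.
    private int term() {
        int v = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int r = factor();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }

    // Highest precedence: numbers and parenthesized sub-expressions.
    private int factor() {
        if (peek() == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at index " + pos);
            }
            pos++;
            return v;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("Expected number at index " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    // Current character, or '\0' at end of input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3"));   // prints 7
        System.out.println(eval("(1 + 2) * 3")); // prints 9
    }
}
```

Tools like yacc let you declare precedence separately instead, which keeps the grammar flatter; with the layered style you pay one extra rule per level.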

I'd probably want to use something else if I had the choice. My experiences with yacc (for C) and happy (for Haskell) have both been pleasant.

Answered by stepancheg

Parser combinators are a good choice. A popular Java implementation is JParsec.

Answered by snemarch

If you're going to go hardcore, throw in a bit of http://llvm.org in the mix :)

Answered by Peter Lawrey

I suggest you look at the source for BeanShell. It has a compiler for Java and is fairly simple to read.
