C++: Advantages of Antlr (compared with lex/yacc/bison)

Question

Asked by Don Wakefield

I've used lex and yacc (more usually bison) in the past for various projects, usually translators (such as a subset of EDIF streamed into an EDA app). Additionally, I've had to support code based on lex/yacc grammars dating back decades. So I know my way around the tools, though I'm no expert.

I've seen positive comments about Antlr in various fora in the past, and I'm curious as to what I may be missing. So if you've used both, please tell me what's better or more advanced in Antlr. My current constraints are that I work in a C++ shop, and any product we ship will not include Java, so the resulting parsers would have to follow that rule.

Answer 1

Answered by Daniel Spiewak

Update/warning: This answer may be out of date!

One major difference is that ANTLR generates an LL(*) parser, whereas YACC and Bison both generate parsers that are LALR. This is an important distinction for a number of applications, the most obvious being operators:

expr ::= expr '+' expr
       | expr '-' expr
       | '(' expr ')'
       | NUM ;

ANTLR 3 and earlier are entirely incapable of handling this grammar as-is (ANTLR 4 later added support for direct left recursion). To use those versions (or any other LL parser generator), you would need to convert this grammar to something that is not left-recursive. However, Bison has no problem with grammars of this form. You would want to declare '+' and '-' as left-associative operators to resolve the ambiguity, but that is not strictly required for left recursion. A better example might be dispatch:
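
To make the conversion concrete, here is a minimal hand-written sketch (not ANTLR output) of what an LL-style parser for the rewritten grammar might look like in C++. The class name `ExprParser` and the rewritten rules in the comment are my own assumptions for illustration; the loop in `expr()` is what restores left associativity once the left recursion is gone.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical hand-written parser for the grammar rewritten without
// left recursion:
//   expr ::= term (('+' | '-') term)*
//   term ::= '(' expr ')' | NUM
class ExprParser {
public:
    explicit ExprParser(std::string text) : text_(std::move(text)) {}

    // Parses the whole input and returns its value.
    long parse() {
        assert(pos_ == 0 && "parse() is meant to be called once");
        long value = expr();
        skipSpaces();
        if (pos_ != text_.size()) throw std::runtime_error("trailing input");
        return value;
    }

private:
    // The loop folds operands from left to right, so "10 - 3 - 2"
    // is evaluated as (10 - 3) - 2.
    long expr() {
        long value = term();
        for (;;) {
            skipSpaces();
            if (pos_ >= text_.size()) return value;
            const char op = text_[pos_];
            if (op != '+' && op != '-') return value;
            ++pos_;
            const long rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
    }

    long term() {
        skipSpaces();
        if (pos_ < text_.size() && text_[pos_] == '(') {
            ++pos_;
            const long value = expr();
            skipSpaces();
            if (pos_ >= text_.size() || text_[pos_] != ')')
                throw std::runtime_error("expected ')'");
            ++pos_;
            return value;
        }
        if (pos_ >= text_.size() || !isDigit(text_[pos_]))
            throw std::runtime_error("expected number or '('");
        long value = 0;
        while (pos_ < text_.size() && isDigit(text_[pos_]))
            value = value * 10 + (text_[pos_++] - '0');
        return value;
    }

    static bool isDigit(char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

    void skipSpaces() {
        while (pos_ < text_.size() &&
               std::isspace(static_cast<unsigned char>(text_[pos_])))
            ++pos_;
    }

    std::string text_;
    std::size_t pos_ = 0;
};
```

With this sketch, `ExprParser("10 - 3 - 2").parse()` gives 5 rather than 9, which is the left-associative result the original left-recursive grammar expresses directly.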

expr ::= expr '.' ID '(' actuals ')' ;

actuals ::= actuals ',' expr | expr ;

Notice that both the expr and the actuals rules are left-recursive. This produces a much more efficient AST when it comes time for code generation because it avoids the need for multiple registers and unnecessary spilling (a left-leaning tree can be collapsed whereas a right-leaning tree cannot).
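
The register argument can be illustrated with a small model. This is only a sketch under a simplifying assumption of mine: the code generator always evaluates the left operand first and holds its result in a register while the right operand is evaluated, with no reordering of operands. The `Node` type and helper names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical AST node: a leaf (both children null) or a binary operator.
struct Node {
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
};

std::unique_ptr<Node> leaf() { return std::make_unique<Node>(); }

std::unique_ptr<Node> binary(std::unique_ptr<Node> l, std::unique_ptr<Node> r) {
    auto n = std::make_unique<Node>();
    n->left = std::move(l);
    n->right = std::move(r);
    return n;
}

// Registers needed when the left operand is always evaluated first and its
// result stays in a register while the right operand is evaluated.
int registersNeeded(const Node& n) {
    if (!n.left && !n.right) return 1;  // a leaf is loaded into one register
    assert(n.left && n.right);
    const int l = registersNeeded(*n.left);
    const int r = registersNeeded(*n.right);
    return std::max(l, r + 1);
}

// ((a op b) op c) op ... : the shape a left-recursive rule produces.
std::unique_ptr<Node> leftLeaning(int operands) {
    auto tree = leaf();
    for (int i = 1; i < operands; ++i) tree = binary(std::move(tree), leaf());
    return tree;
}

// a op (b op (c op ...)) : the shape a right-recursive rule produces.
std::unique_ptr<Node> rightLeaning(int operands) {
    auto tree = leaf();
    for (int i = 1; i < operands; ++i) tree = binary(leaf(), std::move(tree));
    return tree;
}
```

Under this model a left-leaning chain of eight operands needs two registers, while the right-leaning chain needs eight. A real code generator that is free to reorder operands could do better on the right-leaning tree, so treat the numbers as an illustration of the tendency rather than a measurement.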

In terms of personal taste, I think that LALR grammars are a lot easier to construct and debug. The downside is you have to deal with somewhat cryptic errors like shift-reduce and (the dreaded) reduce-reduce. These are errors that Bison catches when generating the parser, so it doesn't affect the end-user experience, but it can make the development process a bit more interesting. ANTLR is generally considered to be easier to use than YACC/Bison for precisely this reason.

Answer 2

Answered by trijezdci

The most significant difference between YACC/Bison and ANTLR is the type of grammars these tools can process. YACC/Bison handle LALR grammars; ANTLR handles LL grammars.

Often, people who have worked with LALR grammars for a long time will find working with LL grammars more difficult, and vice versa. That does not mean that the grammars or tools are inherently more difficult to work with. Which tool you find easier to use will mostly come down to familiarity with the type of grammar.

As far as advantages go, there are aspects where LALR grammars have advantages over LL grammars and there are other aspects where LL grammars have advantages over LALR grammars.

YACC/Bison generate table-driven parsers, which means the "processing logic" is contained in the parser program's data, not so much in the parser's code. The payoff is that even a parser for a very complex language has a relatively small code footprint. This was more important in the 1960s and 1970s when hardware was very limited. Table-driven parser generators go back to this era, and a small code footprint was a main requirement back then.
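
As a rough illustration of "logic in the data", here is a toy shift-reduce recognizer in C++. It is not Bison's actual output format: the tables are built by hand for a two-rule grammar of my own choosing, and names such as `kAction` and `accepts` are hypothetical. The point is that the driver loop stays the same size no matter how large the tables grow.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum Token { NUM, PLUS, END };  // terminals; END marks the end of input
enum class Kind { Error, Shift, Reduce, Accept };

// arg is the target state for Shift, or the rule number for Reduce.
struct Action { Kind kind; int arg; };

// Hand-built LR tables for the toy grammar
//   rule 1: sum ::= sum '+' NUM
//   rule 2: sum ::= NUM
constexpr Action kAction[5][3] = {
    //              NUM                  PLUS                 END
    /* state 0 */ {{Kind::Shift, 2},  {Kind::Error, 0},  {Kind::Error, 0}},
    /* state 1 */ {{Kind::Error, 0},  {Kind::Shift, 3},  {Kind::Accept, 0}},
    /* state 2 */ {{Kind::Error, 0},  {Kind::Reduce, 2}, {Kind::Reduce, 2}},
    /* state 3 */ {{Kind::Shift, 4},  {Kind::Error, 0},  {Kind::Error, 0}},
    /* state 4 */ {{Kind::Error, 0},  {Kind::Reduce, 1}, {Kind::Reduce, 1}},
};
constexpr int kGotoSum[5] = {1, -1, -1, -1, -1};  // state entered after reducing to `sum`
constexpr int kRuleLength[3] = {0, 3, 1};         // right-hand-side length of each rule

// Returns true if the token sequence (which must end with END) is a valid sum.
bool accepts(const std::vector<Token>& input) {
    std::vector<int> states{0};
    std::size_t pos = 0;
    while (pos < input.size()) {
        const Action a = kAction[states.back()][input[pos]];
        switch (a.kind) {
            case Kind::Shift:   // consume the token and enter the target state
                states.push_back(a.arg);
                ++pos;
                break;
            case Kind::Reduce:  // pop the rule's right-hand side, then follow goto
                states.resize(states.size() - kRuleLength[a.arg]);
                states.push_back(kGotoSum[states.back()]);
                break;
            case Kind::Accept:
                return true;
            case Kind::Error:
                return false;
        }
    }
    return false;  // ran out of tokens without reaching END
}
```

Supporting a bigger language in this style means bigger tables, not more driver code, which is the footprint trade-off described above.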

ANTLR generates recursive descent parsers, which means the "processing logic" is contained in the parser's code, as each production rule of the grammar is represented by a function in the parser's code. The payoff is that it is easier to understand what the parser is doing by reading its code. Also, recursive descent parsers are typically faster than table-driven ones. However, for very complex languages, the code footprint will be larger. This was a problem in the 1960s and 1970s. Back then, only relatively small languages, such as Pascal, were implemented this way due to hardware limitations.

ANTLR-generated parsers typically run to around 10,000 lines of code or more. Handwritten recursive descent parsers are often in the same ballpark. Wirth's Oberon compiler is perhaps the most compact one, with about 4000 lines of code including code generation, but Oberon is a very compact language with only about 40 production rules.

As somebody has pointed out already, a big plus for ANTLR is its graphical IDE tool, ANTLRWorks. It is a complete grammar and language design laboratory. It visualises your grammar rules as you type them, and if it finds any conflicts it shows you graphically what the conflict is and what causes it. It can even automatically refactor and resolve conflicts such as left recursion. Once you have a conflict-free grammar, you can let ANTLRWorks parse an input file of your language, build a parse tree and AST for you, and show the tree graphically in the IDE. This is a very big advantage because it can save you many hours of work: you will find conceptual errors in your language design before you start coding! I have not found any such tool for LALR grammars; it seems there isn't one.

Even for people who do not wish to generate their parsers but hand-code them, ANTLRWorks is a great tool for language design and prototyping, quite possibly the best such tool available. Unfortunately, that doesn't help you if you want to build LALR parsers. Switching from LALR to LL simply to take advantage of ANTLRWorks may well be worthwhile, but for some people, switching grammar types can be a very painful experience. In other words: YMMV.

Answer 3

Answered by Cristian Diaconescu

A couple of advantages for ANTLR:

Can output parsers in various languages; Java is not required to run the generated parser.
Awesome GUI makes grammar debugging easy (e.g. you can see the generated ASTs right in the GUI, no extra tools required).
Generated code is actually human-readable (that is one of ANTLR's goals), and the fact that it generates LL parsers surely helps in this regard.
Definition of terminals is context-free as well (as opposed to regex in (f)lex), thus permitting, for instance, the definition of terminals containing properly closed parentheses.
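
To show why the last point matters, here is a small hand-written C++ sketch of what a recursive terminal definition amounts to. The rule name `NESTED` in the comment and the function `matchNested` are hypothetical names of mine, and the code assumes C++17 for `std::string_view`. A plain regular expression cannot track arbitrary nesting depth, whereas the recursion below can.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hand-written equivalent of a hypothetical recursive terminal:
//   NESTED : '(' ( NESTED | any character except '(' and ')' )* ')'
// Returns the length of the NESTED token starting at `pos`, or 0 if none matches.
std::size_t matchNested(std::string_view text, std::size_t pos = 0) {
    if (pos >= text.size() || text[pos] != '(') return 0;
    std::size_t i = pos + 1;
    while (i < text.size() && text[i] != ')') {
        if (text[i] == '(') {
            // Each nested group is matched by a recursive call.
            const std::size_t inner = matchNested(text, i);
            if (inner == 0) return 0;  // unclosed inner group
            i += inner;
        } else {
            ++i;
        }
    }
    if (i >= text.size()) return 0;  // input ended before the closing ')'
    return i - pos + 1;              // length including the closing ')'
}
```

For example, `matchNested("(a(b)c) rest")` covers the seven characters of `(a(b)c)`, while `matchNested("(a(b")` reports no match because the groups are never closed.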

My $.02

Answer 4

Answered by John with waffle

Another advantage of ANTLR is that you can use ANTLRWorks, although I can't say that this is a strict advantage, as there may be similar tools for other generators as well.

Answer 5

Answered by justme

Bison and Flex result in a smaller memory footprint, but you have no graphical IDE.
ANTLR uses more memory, but you have ANTLRWorks, a graphical IDE.

Bison/Flex memory usage is typically a megabyte or so. Contrast that with ANTLR: assume it uses 512 bytes of memory for every token in the file you want to parse. At 4 million tokens that comes to roughly 2 GB, and you are out of virtual memory on a 32-bit system.
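
The arithmetic behind that estimate can be spelled out as follows. The 512 bytes per token is the figure assumed in this answer, not a measured value, and the 2 GiB limit is my own assumption about the user address space of a typical 32-bit process (it is commonly 2 to 3 GiB depending on the operating system and its configuration). The function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Estimated bytes needed to keep every token of the input in memory.
constexpr std::uint64_t tokenMemoryBytes(std::uint64_t tokenCount,
                                         std::uint64_t bytesPerToken) {
    return tokenCount * bytesPerToken;
}

constexpr std::uint64_t kBytesPerToken = 512;  // per-token cost assumed in the answer
constexpr std::uint64_t kGiB = 1024ull * 1024 * 1024;
constexpr std::uint64_t kUserSpace32 = 2 * kGiB;  // assumed 32-bit user address space

// 4 million tokens at 512 bytes each is 2,048,000,000 bytes (about 1.9 GiB).
static_assert(tokenMemoryBytes(4'000'000, kBytesPerToken) == 2'048'000'000,
              "token memory estimate");

// True when the token memory alone would still fit under the given limit.
constexpr bool fitsIn(std::uint64_t tokenCount, std::uint64_t limitBytes) {
    return tokenMemoryBytes(tokenCount, kBytesPerToken) <= limitBytes;
}
```

Under these assumptions the tokens alone take up almost the entire 2 GiB, leaving next to nothing for the program, its stack and the rest of the heap, which is why the parse would fail in practice.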

If the file you wish to parse is large, ANTLR may run out of memory. So if you just want to parse a configuration file, ANTLR would be a viable solution. Otherwise, if you want to parse a file with lots of data, try Bison.
