Java 如何使用 ANTLR4 创建 AST?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/29971097/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 08:55:03  来源:igfitidea点击:

How to create AST with ANTLR4?

javacompiler-constructionantlrabstract-syntax-treeantlr4

提问by Leandro

I've been searching A LOT about this and I couldn't find anything useful that REALLY helps me build an AST. I already know that ANTLR4 doesn't build AST like ANTLR3 used to do. Everyone say: "Hey, use visitors!", but I couldn't find any example or more detailed explanation on HOW can I do this...

我一直在搜索很多关于这个的东西,但我找不到任何真正帮助我构建 AST 的有用信息。我已经知道 ANTLR4 不像以前的 ANTLR3 那样构建 AST。每个人都说:“嘿,使用访问者!”,但我找不到任何关于如何执行此操作的示例或更详细的说明......

I have a grammar must like C, but with every commands written in Portuguese (portuga programming language). I can easily generate the parse tree using ANTLR4. My question is: What I need to do now to create an AST?

我有一个语法必须像 C,但每个命令都是用葡萄牙语(葡萄牙语编程语言)编写的。我可以使用 ANTLR4 轻松生成解析树。我的问题是:我现在需要做什么来创建 AST?

BTW, I'm using Java and IntelliJ...

顺便说一句,我正在使用 Java 和 IntelliJ ...

EDIT1:The closest I could get was using the answer of this topic: Is there a simple example of using antlr4 to create an AST from java source code and extract methods, variables and comments?But it only prints the name of the visited methods..

EDIT1:我能得到的最接近的是使用本主题的答案:是否有一个使用 antlr4 从 Java 源代码创建 AST 并提取方法、变量和注释的简单示例?但它只打印被访问方法的名称..

Since the first attempt didn't work for me as I expected, I tried to use this tutorialfrom ANTLR3, but I couldn't figure out how to use StringTamplate instead of ST...

由于第一次尝试没有像我预期的那样对我有用,我尝试使用ANTLR3 中的本教程,但我无法弄清楚如何使用 StringTamplate 而不是 ST ...

Reading the book The Definitive ANTLR 4 ReferenceI also couldn't find anything related to ASTs.

阅读The Definitive ANTLR 4 Reference这本书我也找不到与 AST 相关的任何内容。

EDIT2:Now I have one class to create the DOT file, I just need figure out on how to use visitors properly

EDIT2:现在我有一个类来创建 DOT 文件,我只需要弄清楚如何正确使用访问者

采纳答案by Lucas Trzesniewski

Ok, let's build a simple math example. Building an AST is totally overkill for such a task but it's a nice way to show the principle.

好的,让我们构建一个简单的数学示例。对于这样的任务,构建 AST 是完全矫枉过正的,但它是展示原理的好方法。

I'll do it in C# but the Java version would be very similar.

我会用 C# 来做,但 Java 版本会非常相似。

The grammar

语法

First, let's write a very basic math grammar to work with:

首先,让我们编写一个非常基本的数学语法来使用:

grammar Math;

compileUnit
    :   expr EOF
    ;

expr
    :   '(' expr ')'                         # parensExpr
    |   op=('+'|'-') expr                    # unaryExpr
    |   left=expr op=('*'|'/') right=expr    # infixExpr
    |   left=expr op=('+'|'-') right=expr    # infixExpr
    |   func=ID '(' expr ')'                 # funcExpr
    |   value=NUM                            # numberExpr
    ;

OP_ADD: '+';
OP_SUB: '-';
OP_MUL: '*';
OP_DIV: '/';

NUM :   [0-9]+ ('.' [0-9]+)? ([eE] [+-]? [0-9]+)?;
ID  :   [a-zA-Z]+;
WS  :   [ \t\r\n] -> channel(HIDDEN);

Pretty basic stuff, we have a single exprrule that handles everything (precedence rules etc).

非常基本的东西,我们有一个expr规则来处理所有事情(优先规则等)。

The AST nodes

AST 节点

Then, let's define some AST nodes we'll use. These are totally custom and you can define them in the way you want to.

然后,让我们定义一些我们将使用的 AST 节点。这些是完全自定义的,您可以按照自己的方式定义它们。

Here are the nodes we'll be using for this example:

以下是我们将用于此示例的节点:

internal abstract class ExpressionNode
{
}

internal abstract class InfixExpressionNode : ExpressionNode
{
    public ExpressionNode Left { get; set; }
    public ExpressionNode Right { get; set; }
}

internal class AdditionNode : InfixExpressionNode
{
}

internal class SubtractionNode : InfixExpressionNode
{
}

internal class MultiplicationNode : InfixExpressionNode
{
}

internal class DivisionNode : InfixExpressionNode
{
}

internal class NegateNode : ExpressionNode
{
    public ExpressionNode InnerNode { get; set; }
}

internal class FunctionNode : ExpressionNode
{
    public Func<double, double> Function { get; set; }
    public ExpressionNode Argument { get; set; }
}

internal class NumberNode : ExpressionNode
{
    public double Value { get; set; }
}

Converting a CST to an AST

将 CST 转换为 AST

ANTLR generated the CST nodes for us (the MathParser.*Contextclasses). We now have to convert these to AST nodes.

ANTLR 为我们生成了 CST 节点(MathParser.*Context类)。我们现在必须将这些转换为 AST 节点。

This is easily done with a visitor, and ANTLR provides us with a MathBaseVisitor<T>class, so let's work with that.

这很容易通过访问者完成,ANTLR 为我们提供了一个MathBaseVisitor<T>类,让我们来处理它。

internal class BuildAstVisitor : MathBaseVisitor<ExpressionNode>
{
    public override ExpressionNode VisitCompileUnit(MathParser.CompileUnitContext context)
    {
        return Visit(context.expr());
    }

    public override ExpressionNode VisitNumberExpr(MathParser.NumberExprContext context)
    {
        return new NumberNode
        {
            Value = double.Parse(context.value.Text, NumberStyles.AllowDecimalPoint | NumberStyles.AllowExponent)
        };
    }

    public override ExpressionNode VisitParensExpr(MathParser.ParensExprContext context)
    {
        return Visit(context.expr());
    }

    public override ExpressionNode VisitInfixExpr(MathParser.InfixExprContext context)
    {
        InfixExpressionNode node;

        switch (context.op.Type)
        {
            case MathLexer.OP_ADD:
                node = new AdditionNode();
                break;

            case MathLexer.OP_SUB:
                node = new SubtractionNode();
                break;

            case MathLexer.OP_MUL:
                node = new MultiplicationNode();
                break;

            case MathLexer.OP_DIV:
                node = new DivisionNode();
                break;

            default:
                throw new NotSupportedException();
        }

        node.Left = Visit(context.left);
        node.Right = Visit(context.right);

        return node;
    }

    public override ExpressionNode VisitUnaryExpr(MathParser.UnaryExprContext context)
    {
        switch (context.op.Type)
        {
            case MathLexer.OP_ADD:
                return Visit(context.expr());

            case MathLexer.OP_SUB:
                return new NegateNode
                {
                    InnerNode = Visit(context.expr())
                };

            default:
                throw new NotSupportedException();
        }
    }

    public override ExpressionNode VisitFuncExpr(MathParser.FuncExprContext context)
    {
        var functionName = context.func.Text;

        var func = typeof(Math)
            .GetMethods(BindingFlags.Public | BindingFlags.Static)
            .Where(m => m.ReturnType == typeof(double))
            .Where(m => m.GetParameters().Select(p => p.ParameterType).SequenceEqual(new[] { typeof(double) }))
            .FirstOrDefault(m => m.Name.Equals(functionName, StringComparison.OrdinalIgnoreCase));

        if (func == null)
            throw new NotSupportedException(string.Format("Function {0} is not supported", functionName));

        return new FunctionNode
        {
            Function = (Func<double, double>)func.CreateDelegate(typeof(Func<double, double>)),
            Argument = Visit(context.expr())
        };
    }
}

As you can see, it's just a matter of creating an AST node out of a CST node by using a visitor. The code should be pretty self-explanatory (well, maybe except for the VisitFuncExprstuff, but it's just a quick way to wire up a delegate to a suitable method of the System.Mathclass).

如您所见,这只是通过使用访问者从 CST 节点创建 AST 节点的问题。代码应该是不言自明的(好吧,也许除了这些VisitFuncExpr东西,但这只是将委托连接到类的合适方法的一种快速方法System.Math)。

And here you have the AST building stuff. That's all that's needed. Just extract the relevant information from the CST and keep it in the AST.

在这里你有 AST 构建的东西。这就是所需要的。只需从 CST 中提取相关信息并将其保存在 AST 中。

The AST visitor

AST 访客

Now, let's play a bit with the AST. We'll have to build an AST visitor base class to traverse it. Let's just do something similar to the AbstractParseTreeVisitor<T>provided by ANTLR.

现在,让我们玩一下 AST。我们必须构建一个 AST 访问者基类来遍历它。让我们做一些类似于AbstractParseTreeVisitor<T>ANTLR 提供的事情。

internal abstract class AstVisitor<T>
{
    public abstract T Visit(AdditionNode node);
    public abstract T Visit(SubtractionNode node);
    public abstract T Visit(MultiplicationNode node);
    public abstract T Visit(DivisionNode node);
    public abstract T Visit(NegateNode node);
    public abstract T Visit(FunctionNode node);
    public abstract T Visit(NumberNode node);

    public T Visit(ExpressionNode node)
    {
        return Visit((dynamic)node);
    }
}

Here, I took advantage of C#'s dynamickeyword to perform a double-dispatch in one line of code. In Java, you'll have to do the wiring yourself with a sequence of ifstatements like these:

在这里,我利用 C# 的dynamic关键字在一行代码中执行了双重调度。在 Java 中,您必须自己使用以下if语句序列进行接线:

if (node is AdditionNode) {
    return Visit((AdditionNode)node);
} else if (node is SubtractionNode) {
    return Visit((SubtractionNode)node);
} else if ...

But I just went for the shortcut for this example.

但我只是选择了这个例子的快捷方式。

Work with the AST

使用 AST

So, what can we do with a math expression tree? Evaluate it, of course! Let's implement an expression evaluator:

那么,我们可以用数学表达式树做什么呢?当然要评价!让我们实现一个表达式评估器:

internal class EvaluateExpressionVisitor : AstVisitor<double>
{
    public override double Visit(AdditionNode node)
    {
        return Visit(node.Left) + Visit(node.Right);
    }

    public override double Visit(SubtractionNode node)
    {
        return Visit(node.Left) - Visit(node.Right);
    }

    public override double Visit(MultiplicationNode node)
    {
        return Visit(node.Left) * Visit(node.Right);
    }

    public override double Visit(DivisionNode node)
    {
        return Visit(node.Left) / Visit(node.Right);
    }

    public override double Visit(NegateNode node)
    {
        return -Visit(node.InnerNode);
    }

    public override double Visit(FunctionNode node)
    {
        return node.Function(Visit(node.Argument));
    }

    public override double Visit(NumberNode node)
    {
        return node.Value;
    }
}

Pretty simple once we have an AST, isn't it?

一旦我们有了 AST 就很简单了,不是吗?

Putting it all together

把这一切放在一起

Last but not least, we have to actually write the main program:

最后但并非最不重要的是,我们必须实际编写主程序:

internal class Program
{
    private static void Main()
    {
        while (true)
        {
            Console.Write("> ");
            var exprText = Console.ReadLine();

            if (string.IsNullOrWhiteSpace(exprText))
                break;

            var inputStream = new AntlrInputStream(new StringReader(exprText));
            var lexer = new MathLexer(inputStream);
            var tokenStream = new CommonTokenStream(lexer);
            var parser = new MathParser(tokenStream);

            try
            {
                var cst = parser.compileUnit();
                var ast = new BuildAstVisitor().VisitCompileUnit(cst);
                var value = new EvaluateExpressionVisitor().Visit(ast);

                Console.WriteLine("= {0}", value);
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.Message);
            }

            Console.WriteLine();
        }
    }
}

And now we can finally play with it:

现在我们终于可以玩它了:

enter image description here

在此处输入图片说明

回答by Julian

I have created a small Java project that allows you to test your ANTLR grammar instantly by compiling the lexer and parser generated by ANTLR in-memory. You can just parse a string by passing it to the parser, and it will automatically generate an AST from it which can then be used in your application.

我创建了一个小型 Java 项目,它允许您通过编译内存中 ANTLR 生成的词法分析器和解析器来立即测试您的 ANTLR 语法。您可以通过将字符串传递给解析器来解析字符串,它会自动从中生成一个 AST,然后可以在您的应用程序中使用它。

For the purpose of reducing the size of the AST, you could use a NodeFilter to which you could add the production-rule names of the non-terminals that you would like to be considered when constructing the AST.

为了减小 AST 的大小,您可以使用一个 NodeFilter,您可以向其中添加在构建 AST 时要考虑的非终端的生产规则名称。

The code and some code examples can be found at https://github.com/julianthome/inmemantlr

代码和一些代码示例可以在https://github.com/julianthome/inmemantlr找到

Hope the tool is useful ;-)

希望该工具有用;-)