Original source: http://stackoverflow.com/questions/6511556/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Javascript parser for Java
Asked by quarks
Can anyone recommend a decent JavaScript parser for Java? I believe Rhino can be used, but it seems like overkill for just parsing. Or is it the only decent solution? Any suggestions would be greatly appreciated. Thanks.
Accepted answer by Mike Samuel
From https://github.com/google/caja/blob/master/src/com/google/caja/parser/js/Parser.java
The grammar below is a context-free representation of the grammar this parser parses. It disagrees with EcmaScript 262 Edition 3 (ES3) where implementations disagree with ES3. The rules for semicolon insertion and the possible backtracking in expressions needed to properly handle backtracking are commented thoroughly in code, since semicolon insertion requires information from both the lexer and parser and is not determinable with finite lookahead.
Noteworthy features
- Reports warnings on a queue where an error doesn't prevent any further errors, so that we can report multiple errors in a single compile pass instead of forcing developers to play whack-a-mole.
- Does not parse Firefox style catch (<Identifier> if <Expression>) since those don't work on IE and many other interpreters.
- Recognizes const since many interpreters do (not IE) but warns.
- Allows, but warns, on trailing commas in Array and Object constructors.
- Allows keywords as identifier names but warns since different interpreters have different keyword sets. This allows us to use an expansive keyword set.
To parse strict code, pass in a PedanticWarningMessageQueue that converts MessageLevel#WARNING and above to MessageLevel#FATAL_ERROR.
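The queue-based reporting and the strict "pedantic" queue described above can be sketched roughly as follows. This is only an illustration of the idea, not the Caja API: every name here (MsgLevel, Msg, SimpleQueue, PedanticQueue, QueueDemo) is hypothetical, and the real PedanticWarningMessageQueue may behave differently in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this does not use any Caja classes.
enum MsgLevel { LINT, WARNING, ERROR, FATAL_ERROR }

final class Msg {
    final MsgLevel level;
    final String text;

    Msg(MsgLevel level, String text) {
        this.level = level;
        this.text = text;
    }
}

/** Collects messages so that one pass can report many problems. */
class SimpleQueue {
    private final List<Msg> messages = new ArrayList<>();

    void add(MsgLevel level, String text) {
        messages.add(new Msg(level, text));
    }

    List<Msg> getMessages() {
        return messages;
    }

    boolean hasFatal() {
        for (Msg m : messages) {
            if (m.level == MsgLevel.FATAL_ERROR) {
                return true;
            }
        }
        return false;
    }
}

/** Strict variant: anything at WARNING or above is recorded as FATAL_ERROR. */
class PedanticQueue extends SimpleQueue {
    @Override
    void add(MsgLevel level, String text) {
        MsgLevel effective =
                level.compareTo(MsgLevel.WARNING) >= 0 ? MsgLevel.FATAL_ERROR : level;
        super.add(effective, text);
    }
}

class QueueDemo {
    /** Stands in for a parser that hits two recoverable problems. */
    static void reportProblems(SimpleQueue queue) {
        queue.add(MsgLevel.WARNING, "trailing comma in Array constructor");
        queue.add(MsgLevel.WARNING, "keyword used as identifier name");
    }

    public static void main(String[] args) {
        SimpleQueue lenient = new SimpleQueue();
        SimpleQueue strict = new PedanticQueue();
        reportProblems(lenient);
        reportProblems(strict);
        System.out.println("lenient fatal: " + lenient.hasFatal());
        System.out.println("strict fatal: " + strict.hasFatal());
    }
}
```

Both queues end up holding both messages, so nothing is lost by continuing after the first problem; only the strict queue treats them as fatal.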
CajaTestCase.js shows how to set up a parser, and fromResource and fromString in the same class show how to get an input of the right kind.
Answer by miku
Here are two more or less working or complete ANTLR grammars for EcmaScript (see the comments on this post):
- http://www.antlr.org/grammar/1206736738015/JavaScript.g (incomplete?)
- http://www.antlr.org/grammar/1153976512034/ecmascriptA3.g (buggy?)
From the ANTLR 5 minute intro:
ANTLR reads a language description file called a grammar and generates a number of source code files and other auxiliary files. Most uses of ANTLR generates at least one (and quite often both) of these tools:
A Lexer: This reads an input character or byte stream (i.e. characters, binary data, etc.), divides it into tokens using patterns you specify, and generates a token stream as output. It can also flag some tokens such as whitespace and comments as hidden using a protocol that ANTLR parsers automatically understand and respect.
A Parser: This reads a token stream (normally generated by a lexer), and matches phrases in your language via the rules (patterns) you specify, and typically performs some semantic action for each phrase (or sub-phrase) matched. Each match could invoke a custom action, write some text via StringTemplate, or generate an Abstract Syntax Tree for additional processing.
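To make the lexer half of that description concrete, here is a small hand-written sketch of what a lexer does: it splits characters into tokens by pattern and marks whitespace and comments as hidden so a parser can skip them. It is not ANTLR output and does not use the ANTLR runtime; the class names (Tok, TinyLexer) and the tiny token set are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written illustration; ANTLR would generate a lexer from a grammar instead.
final class Tok {
    final String type;
    final String text;
    final boolean hidden;

    Tok(String type, String text, boolean hidden) {
        this.type = type;
        this.text = text;
        this.hidden = hidden;
    }
}

class TinyLexer {
    // One named group per token type; alternatives are tried in this order.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<WS>\\s+)"
            + "|(?<COMMENT>//[^\\n]*)"
            + "|(?<NUMBER>\\d+)"
            + "|(?<IDENT>[A-Za-z_$][A-Za-z0-9_$]*)"
            + "|(?<PUNCT>[=+;(){}])");

    private static final String[] TYPES = {"WS", "COMMENT", "NUMBER", "IDENT", "PUNCT"};

    /** Splits the input into tokens, keeping hidden ones in the stream. */
    static List<Tok> tokenize(String input) {
        List<Tok> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    boolean hidden = type.equals("WS") || type.equals("COMMENT");
                    out.add(new Tok(type, m.group(type), hidden));
                    break;
                }
            }
            pos = m.end();
        }
        return out;
    }

    /** Returns only the tokens a parser would normally look at. */
    static List<Tok> visible(List<Tok> all) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : all) {
            if (!t.hidden) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Tok t : visible(tokenize("var a = 10; // ten"))) {
            System.out.println(t.type + " " + t.text);
        }
    }
}
```

Keeping the hidden tokens in the full stream, rather than discarding them, is what lets later tools recover comments and layout if they need them.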
Answer by Matthew Kime
For me, the best solution is using acorn (https://github.com/marijnh/acorn) under Rhino.
I just don't think caja is getting attention anymore.
Answer by Luke Machowski
When using Java 1.8, there is a trick you can use to parse with the Nashorn implementation that comes out of the box. By looking at the unit tests in the OpenJDK source code, you can see how to use the parser on its own, without doing all the extra compilation, etc.
// Configure Nashorn to parse the script only.
Options options = new Options("nashorn");
options.set("anon.functions", true);
options.set("parse.only", true);
options.set("scripting", true);

ErrorManager errors = new ErrorManager();
Context context = new Context(options, errors, Thread.currentThread().getContextClassLoader());
Source source = new Source("test", "var a = 10; var b = a + 1;" +
        "function someFunction() { return b + 1; } ");

// Parse the source and take the top-level statements from the program body.
Parser parser = new Parser(context.getEnv(), source, errors);
FunctionNode functionNode = parser.parse();
Block block = functionNode.getBody();
List<Statement> statements = block.getStatements();
Once this code runs, you will have the Abstract Syntax Tree (AST) for the 3 statements in the 'statements' list.
This can then be interpreted or manipulated to your needs.
The previous example works with the following imports:
import java.util.List;

import jdk.nashorn.internal.ir.Block;
import jdk.nashorn.internal.ir.FunctionNode;
import jdk.nashorn.internal.ir.Statement;
import jdk.nashorn.internal.parser.Parser;
import jdk.nashorn.internal.runtime.Context;
import jdk.nashorn.internal.runtime.ErrorManager;
import jdk.nashorn.internal.runtime.Source;
import jdk.nashorn.internal.runtime.options.Options;
You might need to add an access rule to make jdk/nashorn/internal/** accessible.
In my context, I am using JavaScript as an expression language for my own Domain Specific Language (DSL), which I then compile to Java classes at runtime and use. The AST lets me generate appropriate Java code that captures the intent of the JavaScript expressions.
Nashorn is available with Java SE 8.
The link to information about getting the Nashorn source code is here: https://wiki.openjdk.java.net/display/Nashorn/Building+Nashorn
Answer by Luke Machowski
A previous answer describes a way to get under the covers of JDK 8 to parse JavaScript. They are now mainlining it in Java 9. Nice!
This will mean that you don't need to include any libraries; instead, we can rely on an official implementation from the Java guys. Parsing JavaScript programmatically is much easier to achieve without stepping into taboo areas of Java code.
One application of this might be where you want to use JavaScript for a rules engine which gets parsed and compiled into some other language at runtime. The AST lets you 'understand' the logic as written in the concise JavaScript language and then generate less pretty logic in some other language or framework for execution or evaluation.
http://openjdk.java.net/jeps/236
Summary from the link above:
Define a supported API for Nashorn's ECMAScript abstract syntax tree.
Goals
- Provide interface classes to represent Nashorn syntax-tree nodes.
- Provide a factory to create a configured parser instance, with configuration done by passing Nashorn command-line options via an API.
- Provide a visitor-pattern API to visit AST nodes.
- Provide sample/test programs to use the API.
Non-Goals
- The AST nodes will represent notions in the ECMAScript specification insofar as possible, but they will not be exactly the same. Wherever possible the javac tree API's interfaces will be adopted for ECMAScript.
- No external parser/tree standard or API will be used.
- There will be no script-level parser API. This is a Java API, although scripts can call into Java and therefore make use of this API.
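The visitor-pattern goal, and the idea of generating code in another language from the AST, can be illustrated with a self-contained sketch. It deliberately does not use the jdk.nashorn.api.tree types that JEP 236 introduces; the node classes (JsNode, JsNumber, JsIdent, JsBinary), the JsVisitor interface, and the emitted vars.get(...) lookup are all hypothetical names invented for this example.

```java
// Hypothetical mini-AST; the real JEP 236 API has its own tree and visitor types.
interface JsNode {
    <R> R accept(JsVisitor<R> visitor);
}

interface JsVisitor<R> {
    R visitNumber(JsNumber node);
    R visitIdent(JsIdent node);
    R visitBinary(JsBinary node);
}

final class JsNumber implements JsNode {
    final double value;

    JsNumber(double value) {
        this.value = value;
    }

    @Override
    public <R> R accept(JsVisitor<R> visitor) {
        return visitor.visitNumber(this);
    }
}

final class JsIdent implements JsNode {
    final String name;

    JsIdent(String name) {
        this.name = name;
    }

    @Override
    public <R> R accept(JsVisitor<R> visitor) {
        return visitor.visitIdent(this);
    }
}

final class JsBinary implements JsNode {
    final String op;
    final JsNode left;
    final JsNode right;

    JsBinary(String op, JsNode left, JsNode right) {
        this.op = op;
        this.left = left;
        this.right = right;
    }

    @Override
    public <R> R accept(JsVisitor<R> visitor) {
        return visitor.visitBinary(this);
    }
}

/** Walks the tree and emits a Java-like expression as text. */
class JavaEmitter implements JsVisitor<String> {
    @Override
    public String visitNumber(JsNumber node) {
        return Double.toString(node.value);
    }

    @Override
    public String visitIdent(JsIdent node) {
        // Assumption: variables live in a map called "vars" in the generated code.
        return "vars.get(\"" + node.name + "\")";
    }

    @Override
    public String visitBinary(JsBinary node) {
        return "(" + node.left.accept(this) + " " + node.op + " " + node.right.accept(this) + ")";
    }

    public static void main(String[] args) {
        // Hand-built tree for the JavaScript expression: a + 1
        JsNode expr = new JsBinary("+", new JsIdent("a"), new JsNumber(1));
        System.out.println(expr.accept(new JavaEmitter()));
    }
}
```

Each new kind of analysis or code generator becomes another JsVisitor implementation, so the node classes themselves never need to change.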
Answer by Luke Machowski
EcmaScript 5 parser for Java: https://github.com/DigiArea/es5-model