Original source: http://stackoverflow.com/questions/9957873/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Creating a parser for a simple pseudocode language?
Asked by Vinayak Garg
I want to write a simple parser in Java for a rigid, pseudocode-like language. A sample of the pseudocode:
//This is a comment
$x1 = readint
$x2 = readint
$dx = $x2 - $x1
#f = $dx / 2
if ($dx > 0)
{
loop while(#f > 1)
{
print(#f)
#f = #f / 2
}
}
Note that the code above is rigid: there cannot be more than one statement on a line, integer variables start with $, float variables start with #, and so on.
To parse such code, I could first use StringTokenizer and then regular expressions to match integer variables, float variables, or keywords.
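A minimal sketch of the tokenizing step, under some assumptions: instead of StringTokenizer followed by separate regular expressions, it uses a single java.util.regex pattern with named groups. The class name PseudoLexer, the token kind names, and the "KIND:text" output format are made up for illustration and are not part of the original question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits one line of the pseudocode into typed tokens.
class PseudoLexer {
    // Alternatives are tried left to right, so COMMENT precedes OP (otherwise
    // "//" would be read as two division operators) and the two-character
    // operators precede the one-character ones.
    private static final Pattern TOKEN = Pattern.compile(
          "(?<COMMENT>//.*)"
        + "|(?<INTVAR>\\$\\w+)"
        + "|(?<FLOATVAR>#\\w+)"
        + "|(?<NUMBER>\\d+(?:\\.\\d+)?)"
        + "|(?<WORD>[A-Za-z]+)"
        + "|(?<OP><=|>=|==|!=|[-+*/=<>(){}])"
        + "|(?<SPACE>\\s+)");

    private static final String[] KINDS =
        {"COMMENT", "INTVAR", "FLOATVAR", "NUMBER", "WORD", "OP", "SPACE"};

    // Returns tokens as "KIND:text"; comments and whitespace are dropped.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        int pos = 0;
        while (pos < line.length()) {
            // Restrict matching to the unread part and require a match
            // that starts exactly at the current position.
            m.region(pos, line.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character at column " + pos + ": " + line);
            }
            for (String kind : KINDS) {
                if (m.group(kind) != null) {
                    if (!kind.equals("COMMENT") && !kind.equals("SPACE")) {
                        tokens.add(kind + ":" + m.group());
                    }
                    break;
                }
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("loop while(#f > 1)"));
        System.out.println(tokenize("$dx = $x2 - $x1"));
    }
}
```

Keywords such as if, loop, while, print, and readint all come out as WORD tokens here; a later stage would have to tell them apart.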
Is this a good approach? For statements inside a loop, how can I store the expressions so that I don't have to tokenize them in each iteration?
I could convert expressions (like #f = #f / 2) to Polish notation and store them in a stack. Then, in each iteration, while popping operands, I could substitute the current value of each variable. But is this efficient enough?
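One way to picture the "convert once, evaluate many times" idea is the sketch below. It makes several assumptions: it uses reverse Polish (postfix) notation built with the shunting-yard algorithm, it expects well-formed input whose tokens are separated by spaces, it handles only the right-hand side of an assignment, and it treats every value as a double, ignoring the $ versus # distinction. The class name RpnExpression is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: compiles an infix expression to postfix once,
// then evaluates it any number of times against current variable values.
class RpnExpression {
    private final List<String> postfix = new ArrayList<>();

    // Shunting-yard conversion; assumes balanced parentheses.
    RpnExpression(String infix) {
        Deque<String> ops = new ArrayDeque<>();
        for (String tok : infix.trim().split("\\s+")) {
            if (tok.equals("(")) {
                ops.push(tok);
            } else if (tok.equals(")")) {
                while (!ops.peek().equals("(")) postfix.add(ops.pop());
                ops.pop(); // discard the "("
            } else if (isOperator(tok)) {
                // Pop operators of equal or higher precedence (left-associative).
                while (!ops.isEmpty() && isOperator(ops.peek())
                        && precedence(ops.peek()) >= precedence(tok)) {
                    postfix.add(ops.pop());
                }
                ops.push(tok);
            } else {
                postfix.add(tok); // number or variable name
            }
        }
        while (!ops.isEmpty()) postfix.add(ops.pop());
    }

    // Evaluates the stored postfix form; variables are looked up on each call.
    double evaluate(Map<String, Double> vars) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String tok : postfix) {
            if (isOperator(tok)) {
                double right = stack.pop();
                double left = stack.pop();
                switch (tok.charAt(0)) {
                    case '+': stack.push(left + right); break;
                    case '-': stack.push(left - right); break;
                    case '*': stack.push(left * right); break;
                    default:  stack.push(left / right); break;
                }
            } else if (tok.startsWith("$") || tok.startsWith("#")) {
                stack.push(vars.get(tok)); // fails if the variable is undefined
            } else {
                stack.push(Double.parseDouble(tok));
            }
        }
        return stack.pop();
    }

    private static boolean isOperator(String tok) {
        return tok.length() == 1 && "+-*/".indexOf(tok.charAt(0)) >= 0;
    }

    private static int precedence(String op) {
        return (op.equals("*") || op.equals("/")) ? 2 : 1;
    }

    public static void main(String[] args) {
        Map<String, Double> vars = new HashMap<>();
        vars.put("#f", 40.0);
        RpnExpression halve = new RpnExpression("#f / 2"); // compiled once
        while (vars.get("#f") > 1) {                       // evaluated per iteration
            System.out.println(vars.get("#f"));
            vars.put("#f", halve.evaluate(vars));
        }
    }
}
```

The loop body never touches the original text again; each iteration only walks the stored token list. Building a tree of expression nodes instead of a flat postfix list would serve the same purpose.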
Thanks in advance for any suggestions.
Answered by templatetypedef
Although I think that it's great that you want to build a parser for a language like this, doing so is much harder than it looks. Parsing is a very well-studied problem and there are many excellent algorithms that you can use, but they are extremely difficult to implement by hand. While you can use tricks like conversions to RPN for smaller examples like parsing expressions, building up a full programming language requires a much more complex set of tricks.
To parse a language of this complexity, you are probably best off using a parser generator rather than trying to write your own parser by hand. ANTLR and Java CUP are two well-known tools for doing precisely what you're interested in accomplishing, and I would strongly suggest using one of them.
Hope this helps!
Answered by Ira Baxter
For simple languages (this is a judgement call, and if you are inexperienced you may not be able to make that call correctly), one can often write a recursive descent parser by hand that does well enough. The good news is that coding a recursive descent parser is pretty straightforward.
If you aren't sure, use overkill in the form of the strongest parser generator you can get.
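To give a feel for what a hand-written recursive descent parser looks like, here is a small sketch covering only arithmetic expressions over the question's variables. It is an illustration under assumptions, not code from this answer: the grammar, the class name ExprParser, and the choice to evaluate while parsing (rather than build a tree) are all made up for the example, and every value is treated as a double.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical recursive descent parser: one method per grammar rule.
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | VARIABLE | '(' expr ')'
class ExprParser {
    private final String src;
    private final Map<String, Double> vars;
    private int pos = 0;

    private ExprParser(String src, Map<String, Double> vars) {
        this.src = src;
        this.vars = vars;
    }

    // Parses and evaluates a whole expression; rejects trailing input.
    static double eval(String src, Map<String, Double> vars) {
        ExprParser p = new ExprParser(src, vars);
        double value = p.expr();
        p.skipSpaces();
        if (p.pos != src.length()) throw p.error("unexpected input");
        return value;
    }

    private double expr() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    private double term() {
        double value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    private double factor() {
        if (accept('(')) {
            double value = expr();
            if (!accept(')')) throw error("expected ')'");
            return value;
        }
        int start = pos; // accept() has already skipped leading spaces
        if (pos < src.length() && (src.charAt(pos) == '$' || src.charAt(pos) == '#')) {
            pos++;
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            String name = src.substring(start, pos);
            Double value = vars.get(name);
            if (value == null) throw error("undefined variable " + name);
            return value;
        }
        while (pos < src.length()
                && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw error("expected a number or variable");
        return Double.parseDouble(src.substring(start, pos));
    }

    // Consumes the character c if it is next (after spaces).
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at column " + pos + " in: " + src);
    }

    public static void main(String[] args) {
        Map<String, Double> vars = new HashMap<>();
        vars.put("$x1", 3.0);
        vars.put("$x2", 10.0);
        System.out.println(eval("($x2 - $x1) / 2", vars));
    }
}
```

Statements such as if and loop while would get their own methods in the same style, each calling expr() for the condition and a block-parsing method for the body.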
Answered by stefan bachert
In simple cases, writing a parser by hand makes sense.
However, using StringTokenizer is an indicator of doing it wrong, because a StringTokenizer is already a simple parser.
A parser usually reads one character at a time and changes its state depending on the value of that character.
Here is a simple parser: a "b" switches the following characters to uppercase, an "e" switches them back to lowercase, and "." stops parsing.
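public class SimpleStateParser {
    public static void main(String[] args) {
        String input = "aDDbcDDeaaef.";
        int pos = 0;
        int state = 0; // 0 = print lowercase, 1 = print uppercase
        while (pos < input.length()) {
            char z = input.charAt(pos);
            if (z == '.') break; // '.' stops the parser
            switch (z) {
                case 'b': state = 1; break; // switch to uppercase
                case 'e': state = 0; break; // switch back to lowercase
                default:
                    // Any other character is printed according to the current state.
                    if (state == 0) {
                        System.out.print(Character.toLowerCase(z));
                    } else {
                        System.out.print(Character.toUpperCase(z));
                    }
            }
            pos++;
        }
    }
}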