Original question: http://stackoverflow.com/questions/504402/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to handle escape sequences in string literals in ANTLR 3?
Asked by Sam Martin
I've been looking through the ANTLR v3 documentation (and my trusty copy of "The Definitive ANTLR Reference"), and I can't seem to find a clean way to implement escape sequences in string literals (I'm currently using the Java target). I had hoped to be able to do something like:
fragment
ESCAPE_SEQUENCE
    :   '\\' '\'' { setText("'"); }
    ;

STRING
    :   '\'' (ESCAPE_SEQUENCE | ~('\'' | '\\'))* '\''
        {
            // strip the quotes from the resulting token
            setText(getText().substring(1, getText().length() - 1));
        }
    ;
For example, I would want the input token "'Foo\'s House'" to become the String "Foo's House".
Unfortunately, the setText(...) call in the ESCAPE_SEQUENCE fragment sets the text for the entire STRING token, which is obviously not what I want.
Is there a way to implement this grammar without adding a method that goes back through the resulting string and manually replaces escape sequences (e.g., with something like setText(escapeString(getText())) in the STRING rule)?
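For comparison, the manual fallback mentioned in the question might look like the minimal sketch below. The question only names the method escapeString; its body here is an assumption, and the class name StringEscapes is hypothetical. It simply drops the backslash from every escape, which covers the \' and \\ cases but does not translate escapes such as \n.

```java
final class StringEscapes {
    private StringEscapes() {}

    /**
     * Hypothetical helper: removes the backslash from each two-character
     * escape in raw and keeps the escaped character itself. A lone trailing
     * backslash is copied through unchanged.
     */
    static String escapeString(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                // Skip the backslash and emit the character that follows it.
                out.append(raw.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under these assumptions, the STRING rule would strip the surrounding quotes first and then pass the remaining text through escapeString before calling setText.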
Accepted answer by Bruno Ranschaert
Here is how I accomplished this in the JSON parser I wrote.
STRING
@init{StringBuilder lBuf = new StringBuilder();}
    :
        '"'
        ( escaped=ESC {lBuf.append(getText());} |
          normal=~('"'|'\\'|'\n'|'\r') {lBuf.appendCodePoint(normal);} )*
        '"'
        {setText(lBuf.toString());}
    ;

fragment
ESC
    :   '\\'
        (   'n'  {setText("\n");}
        |   'r'  {setText("\r");}
        |   't'  {setText("\t");}
        |   'b'  {setText("\b");}
        |   'f'  {setText("\f");}
        |   '"'  {setText("\"");}
        |   '\'' {setText("\'");}
        |   '/'  {setText("/");}
        |   '\\' {setText("\\");}
        |   ('u')+ i=HEX_DIGIT j=HEX_DIGIT k=HEX_DIGIT l=HEX_DIGIT
            {setText(ParserUtil.hexToChar(i.getText(),j.getText(),
                                          k.getText(),l.getText()));}
        )
    ;
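The ParserUtil.hexToChar helper is not shown in the answer. A minimal sketch of what it could look like follows; the signature (four one-digit strings in, a one-character string out) is inferred from the call site above and is an assumption, not the author's actual code.

```java
final class ParserUtil {
    private ParserUtil() {}

    /**
     * Hypothetical reconstruction: combines four hex digits (most significant
     * first) into the single UTF-16 code unit they encode, returned as a
     * String so that it can be passed to setText(...).
     */
    static String hexToChar(String a, String b, String c, String d) {
        // Integer.parseInt throws NumberFormatException on non-hex input;
        // the lexer's HEX_DIGIT rule is assumed to prevent that.
        int codeUnit = Integer.parseInt(a + b + c + d, 16);
        return String.valueOf((char) codeUnit);
    }
}
```

For instance, the escape \u0041 would reach this sketch as hexToChar("0", "0", "4", "1") and yield the one-character string "A".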
Answer by jeanmi
For ANTLR 4 with the Java target and a standard escaped-string grammar, I used a dedicated helper class, CharSupport, to translate the string. It is available in the ANTLR API:
STRING : '"'
    (   ESC
    |   ~('"'|'\\'|'\n'|'\r')
    )*
    '"' {
        setText(
            org.antlr.v4.misc.CharSupport.getStringFromGrammarStringLiteral(
                getText()
            )
        );
    }
    ;
As I saw in the V4 documentation and confirmed by experiment, @init is no longer supported in lexer rules!
Answer by Trevor Robinson
Another (possibly more efficient) alternative is to use rule arguments:
STRING
@init { final StringBuilder buf = new StringBuilder(); }
    :
        '"'
        (
            ESCAPE[buf]
        |   i = ~( '\\' | '"' ) { buf.appendCodePoint(i); }
        )*
        '"'
        { setText(buf.toString()); };

fragment ESCAPE[StringBuilder buf] :
        '\\'
        (   't'  { buf.append('\t'); }
        |   'n'  { buf.append('\n'); }
        |   'r'  { buf.append('\r'); }
        |   '"'  { buf.append('\"'); }
        |   '\\' { buf.append('\\'); }
        |   'u' a = HEX_DIGIT b = HEX_DIGIT c = HEX_DIGIT d = HEX_DIGIT { buf.append(ParserUtil.hexChar(a, b, c, d)); }
        );
Answer by Trevor Robinson
I needed to do just that, but my target was C and not Java. Here's how I did it, based on answer #1 (and its comment), in case anyone needs something similar:
QUOTE : '\'';

STR
@init{ pANTLR3_STRING unesc = GETTEXT()->factory->newRaw(GETTEXT()->factory); }
    :   QUOTE ( reg = ~('\\' | '\'') { unesc->addc(unesc, reg); }
              | esc = ESCAPED { unesc->appendS(unesc, GETTEXT()); } )+ QUOTE { SETTEXT(unesc); };

fragment
ESCAPED : '\\'
        (   '\\' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\\")); }
        |   '\'' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\'")); }
        )
    ;
HTH.

