java How do I handle escape sequences in string literals in ANTLR 3?

Question

Asked by Sam Martin

I've been looking through the ANTLR v3 documentation (and my trusty copy of "The Definitive ANTLR Reference"), and I can't seem to find a clean way to implement escape sequences in string literals (I'm currently using the Java target). I had hoped to be able to do something like:

fragment
ESCAPE_SEQUENCE
    : '\\' '\'' { setText("'"); }
    ;

STRING
    : '\'' (ESCAPE_SEQUENCE | ~('\'' | '\\'))* '\''
      {
        // strip the quotes from the resulting token
        setText(getText().substring(1, getText().length() - 1));
      }
    ;

For example, I would want the input token "'Foo\'s House'" to become the String "Foo's House".

Unfortunately, the setText(...) call in the ESCAPE_SEQUENCE fragment sets the text for the entire STRING token, which is obviously not what I want.

Is there a way to implement this grammar without adding a method to go back through the resulting string and manually replace escape sequences (e.g., with something like setText(escapeString(getText())) in the STRING rule)?
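
For reference, the post-processing fallback mentioned above might look roughly like the following. This is only a minimal sketch: escapeString is the hypothetical name used in the question, the class name StringLiteralUnescaper is made up for illustration, and the helper assumes a well-formed single-quoted literal in which a backslash simply protects the next character.

```java
public final class StringLiteralUnescaper {
    private StringLiteralUnescaper() {}

    /**
     * Hypothetical helper: strips the surrounding quotes and replaces every
     * backslash-escaped character with the character itself (so \' becomes '
     * and \\ becomes \). Assumes the input starts and ends with a quote.
     */
    public static String escapeString(String literal) {
        // Drop the opening and closing quote characters.
        String body = literal.substring(1, literal.length() - 1);
        StringBuilder out = new StringBuilder(body.length());
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                // Skip the backslash and keep the character it protects.
                out.append(body.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below is the token text 'Foo\'s House'
        System.out.println(escapeString("'Foo\\'s House'")); // prints Foo's House
    }
}
```

The answers below avoid this second pass by building the unescaped text while the lexer matches the token.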

Answer 1

Accepted answer by Bruno Ranschaert

Here is how I accomplished this in the JSON parser I wrote.

STRING
@init{StringBuilder lBuf = new StringBuilder();}
    :
           '"'
           ( escaped=ESC {lBuf.append(getText());} |
             normal=~('"'|'\\'|'\n'|'\r')     {lBuf.appendCodePoint(normal);} )*
           '"'
           {setText(lBuf.toString());}
    ;

fragment
ESC
    :   '\\'
        (   'n'    {setText("\n");}
        |   'r'    {setText("\r");}
        |   't'    {setText("\t");}
        |   'b'    {setText("\b");}
        |   'f'    {setText("\f");}
        |   '"'    {setText("\"");}
        |   '\''   {setText("\'");}
        |   '/'    {setText("/");}
        |   '\\'   {setText("\\");}
        |   ('u')+ i=HEX_DIGIT j=HEX_DIGIT k=HEX_DIGIT l=HEX_DIGIT
                   {setText(ParserUtil.hexToChar(i.getText(),j.getText(),
                                                 k.getText(),l.getText()));}

        )
    ;
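
The grammar above calls ParserUtil.hexToChar, which is not shown in the answer. A plausible shape for it, purely as an assumption (the real helper may differ in signature and error handling), is a method that joins the four hex digits and returns the corresponding UTF-16 code unit as a String, since setText expects a String:

```java
public final class ParserUtil {
    private ParserUtil() {}

    /**
     * Assumed implementation, not the answer author's code: combines four
     * hex digit strings (as matched by HEX_DIGIT) into one UTF-16 code unit.
     */
    public static String hexToChar(String i, String j, String k, String l) {
        // "0" + "0" + "4" + "1" -> "0041" -> 0x41
        int codeUnit = Integer.parseInt(i + j + k + l, 16);
        return String.valueOf((char) codeUnit);
    }

    public static void main(String[] args) {
        System.out.println(hexToChar("0", "0", "4", "1")); // prints A
    }
}
```

Because each \uXXXX escape yields a single code unit, a character outside the Basic Multilingual Plane would arrive as two consecutive escapes (a surrogate pair), which the StringBuilder in STRING reassembles simply by appending both.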

Answer 2

Answered by jeanmi

For ANTLR 4 with the Java target and a standard escaped-string grammar, I used a dedicated class, CharSupport, to translate the string. It is available in the ANTLR API:

STRING          :   '"'
                (   ESC
                |   ~('"'|'\\'|'\n'|'\r')
                )*
                    '"' {
                        setText(
                            org.antlr.v4.misc.CharSupport.getStringFromGrammarStringLiteral(
                                getText()
                            )
                        );
                    }
                ;

As I saw in the V4 documentation and confirmed by experiment, @init is no longer supported in lexer rules!

Answer 3

Answered by Trevor Robinson

Another (possibly more efficient) alternative is to use rule arguments:

STRING
@init { final StringBuilder buf = new StringBuilder(); }
:
    '"'
    (
    ESCAPE[buf]
    | i = ~( '\\' | '"' ) { buf.appendCodePoint(i); }
    )*
    '"'
    { setText(buf.toString()); };

fragment ESCAPE[StringBuilder buf] :
    '\\'
    ( 't' { buf.append('\t'); }
    | 'n' { buf.append('\n'); }
    | 'r' { buf.append('\r'); }
    | '"' { buf.append('\"'); }
    | '\\' { buf.append('\\'); }
    | 'u' a = HEX_DIGIT b = HEX_DIGIT c = HEX_DIGIT d = HEX_DIGIT { buf.append(ParserUtil.hexChar(a, b, c, d)); }
    );
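
One detail worth spelling out is why the rule uses appendCodePoint rather than append for the plain characters. Assuming the label i holds the matched character as an int (which the appendCodePoint call suggests), the two StringBuilder methods behave very differently. The small standalone demo below uses a made-up class name and is independent of ANTLR:

```java
public final class AppendCodePointDemo {
    private AppendCodePointDemo() {}

    /** Appends the character whose code point is c. */
    public static String viaAppendCodePoint(int c) {
        return new StringBuilder().appendCodePoint(c).toString();
    }

    /** Appends the decimal representation of c, which is rarely what a lexer wants. */
    public static String viaAppend(int c) {
        return new StringBuilder().append(c).toString();
    }

    public static void main(String[] args) {
        int matched = 'a'; // the int value 97, as a lexer label would carry it
        System.out.println(viaAppendCodePoint(matched)); // prints a
        System.out.println(viaAppend(matched));          // prints 97
    }
}
```

The char literals in ESCAPE, such as '\t', are of type char, so plain append is the right call there.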

Answer 4

Answered by Trevor Robinson

I needed to do just that, but my target was C and not Java. Here's how I did it, based on answer #1 (and its comment), in case anyone needs something similar:

QUOTE   :      '\'';
STR
@init{ pANTLR3_STRING unesc = GETTEXT()->factory->newRaw(GETTEXT()->factory); }
        :       QUOTE ( reg = ~('\\' | '\'') { unesc->addc(unesc, reg); }
                        | esc = ESCAPED { unesc->appendS(unesc, GETTEXT()); } )+ QUOTE { SETTEXT(unesc); };

fragment
ESCAPED :       '\\'
                ( '\\' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\\")); }
                | '\'' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\'")); }
                )
        ;

HTH.
