Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2039795/

Time: 2020-09-02 04:07:13  Source: igfitidea

Regular expression for a string literal in flex/lex

Tags: c, regex, lex, string-literals, flex-lexer

Asked by Thomas

I'm experimenting to learn flex and would like to match string literals. My code currently looks like:

"\""([^\n\"\\]*(\\[.\n])*)*"\""        {/*matches string-literal*/;}

I've been struggling with variations for an hour or so and can't get it working the way it should. I'm essentially hoping to match a string literal that can't contain a new-line (unless it's escaped) and supports escaped characters.

I am probably just writing a poor regular expression or one incompatible with flex. Please advise!

Answered by Jonathan Feinberg

A string consists of a quote mark

"

followed by zero or more of either an escaped anything

\\.

or a character that is neither a quote nor a backslash

[^"\\]

and finally a terminating quote

"

Put it all together, and you've got

\"(\\.|[^"\\])*\"

The delimiting quotes are escaped because they are Flex meta-characters.
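
To make the three parts of that pattern concrete, here is a minimal C sketch of a hand-written matcher with the same structure: opening quote, then escaped characters or ordinary characters, then closing quote. The function name match_string_literal is hypothetical and not part of flex. The sketch assumes flex's usual semantics, where . does not match a newline while a negated class such as [^"\\] does, so the pattern as written accepts a raw newline inside the string but rejects a backslash followed by a newline.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns the length of the string literal that
 * starts at s[0], or 0 if no literal matches there.
 * Mirrors the pattern  \"(\\.|[^"\\])*\"  under flex semantics.
 */
size_t match_string_literal(const char *s)
{
    assert(s != NULL);

    if (s[0] != '"')                 /* opening quote */
        return 0;

    size_t i = 1;
    while (s[i] != '\0') {
        if (s[i] == '"')             /* terminating quote */
            return i + 1;
        if (s[i] == '\\') {
            /* \\. : backslash plus any character except newline */
            if (s[i + 1] == '\0' || s[i + 1] == '\n')
                return 0;
            i += 2;
        } else {
            /* [^"\\] : any other character, raw newline included */
            i += 1;
        }
    }
    return 0;                        /* input ended before the closing quote */
}
```

For example, match_string_literal("\"a\\\"b\" rest") returns 6, the length of "a\"b" including both quotes, and leaves the remaining text for the next token. If the string must not contain a raw newline, adding \n to the negated class, as in [^"\\\n], is one way to get that in the flex pattern.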

Answered by Pete

For a single line... you can use this:

\"([^\\"]|\\.)*\"  {/*matches string-literal on a single line*/;}

Answered by t0mm13b

How about using a start state...

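%{
int enter_dblquotes = 0;  /* nonzero while inside a double-quoted string */
%}

%x DBLQUOTES
%%

\"  { BEGIN(DBLQUOTES); enter_dblquotes++; }

<DBLQUOTES>[^\"]*\"  {
   /* yytext holds the rest of the string, including the closing quote */
   if (enter_dblquotes){
       handle_this_dblquotes(yytext);
       BEGIN(INITIAL); /* revert back to normal */
       enter_dblquotes--;
   }
}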
         ...more rules follow...

Something to that effect: flex uses %s (inclusive) or %x (exclusive) to declare start states. When the scanner sees a quote, it switches to another state and keeps lexing until it reaches the next quote, at which point it reverts to the normal state.
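
The same state switching can be sketched outside flex as a two-state loop in C. This is only an illustration under assumptions: the names scan_state, ST_INITIAL, ST_DBLQUOTES, and count_string_literals are made up here, and unlike the rule above the loop also skips backslash-escaped characters inside the string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for flex's INITIAL and a DBLQUOTES start state. */
enum scan_state { ST_INITIAL, ST_DBLQUOTES };

/* Counts the complete double-quoted strings found in input. */
size_t count_string_literals(const char *input)
{
    assert(input != NULL);

    enum scan_state state = ST_INITIAL;
    size_t count = 0;

    for (size_t i = 0; input[i] != '\0'; i++) {
        char c = input[i];
        if (state == ST_INITIAL) {
            if (c == '"')
                state = ST_DBLQUOTES;      /* like BEGIN(DBLQUOTES) */
        } else if (c == '\\' && input[i + 1] != '\0') {
            i++;                           /* skip the escaped character */
        } else if (c == '"') {
            count++;
            state = ST_INITIAL;            /* like BEGIN(INITIAL) */
        }
    }
    return count;                          /* an unterminated string is not counted */
}
```

Keeping the state in one variable is what makes the approach easy to extend: a new kind of token only needs a new state and the transitions into and out of it.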

Answered by david

A late answer, but it may be useful to the next person who needs it:

\"(([^\"]|\\\")*[^\\])?\"

Answered by Torvaldur Rúnarsson

This is what we use in Zolang for single-line string literals with embedded templates ${...}:

\"(\$\{.*\}|\\.|[^\"\\])*\"

Answered by pwxcoo

Here is my code snippet for handling strings in flex; I hope it gives you some ideas.

Using a start condition to handle string literals is more scalable and clearer.

%x SINGLE_STRING

%%

\"                          BEGIN(SINGLE_STRING);
<SINGLE_STRING>{
  \n                        yyerror("the string is missing a closing \" before the newline");
  <<EOF>>                   yyerror("the string is missing a closing \" before EOF");
  ([^\\\"\n]|\\.)*          {/* do your work like save in here */}
  \"                        BEGIN(INITIAL);
  .                         ;
}
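
As a rough C sketch of what these rules decide, the function below scans the body of a string (the text just after the opening quote) and reports either its length or which error applies. The name scan_string_body and the two error codes are hypothetical, chosen for this example only. It assumes that a raw newline inside the string is an error and that a backslash escapes any following character except a newline.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; flex itself defines nothing like these. */
enum { STR_ERR_NEWLINE = -1, STR_ERR_EOF = -2 };

/*
 * body points just past the opening quote.
 * Returns the number of characters before the closing quote,
 * or a negative error code if the string is not terminated.
 */
long scan_string_body(const char *body)
{
    assert(body != NULL);

    long i = 0;
    for (;;) {
        char c = body[i];
        if (c == '\0')
            return STR_ERR_EOF;       /* like the <<EOF>> rule */
        if (c == '\n')
            return STR_ERR_NEWLINE;   /* like the \n rule */
        if (c == '"')
            return i;                 /* like the closing \" rule */
        if (c == '\\' && body[i + 1] != '\0' && body[i + 1] != '\n')
            i += 2;                   /* escaped character */
        else
            i += 1;                   /* ordinary character or lone backslash */
    }
}
```

For instance, scan_string_body("abc\" tail") returns 3, while a body that ends without a closing quote returns STR_ERR_EOF.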