Regex to match a C-style multiline comment
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/13014947/
Question by hanumant
I have a string, for example:
String src = "How are things today /* this is comment **/ and is your code /** this is another comment */ working?";
I want to remove the substrings /* this is comment **/ and /** this is another comment */ from the src string.
I tried to use regex but failed due to lack of experience.
Accepted answer by David Kroukamp
Try using this regex (single-line comments only):
String src = "How are things today /* this is comment */ and is your code /* this is another comment */ working?";
String result = src.replaceAll("/\\*.*?\\*/", ""); // single-line comments
System.out.println(result);
Regex explained:
/ - match the character "/" literally
\* - match the character "*" literally
. - match any single character except a line terminator
*? - between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\* - match the character "*" literally
/ - match the character "/" literally
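A minimal sketch of why the first pattern only handles single-line comments: in Java, "." does not match line terminators unless DOTALL is enabled, so a comment containing \n is left in place. The class name SingleLineCommentDemo and the sample inputs are hypothetical.

```java
// Sketch: applies the single-line regex from the answer above.
class SingleLineCommentDemo {
    // "/\\*.*?\\*/" in Java source is the regex /\*.*?\*/
    static final String SINGLE_LINE = "/\\*.*?\\*/";

    static String strip(String src) {
        return src.replaceAll(SINGLE_LINE, "");
    }

    public static void main(String[] args) {
        // Comments on one line are removed; the surrounding spaces remain.
        System.out.println(strip("a /* one */ b /* two */ c"));
        // Without (?s), "." stops at '\n', so this comment is not matched.
        System.out.println(strip("a /* spans\n lines */ b"));
    }
}
```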
Alternatively, here is a regex for both single-line and multi-line comments, obtained by adding the (?s) flag, which lets . match line terminators as well:
// note the added \n, which won't work with the previous regex
String src = "How are things today /* this\n is comment */ and is your code /* this is another comment */ working?";
String result = src.replaceAll("(?s)/\\*.*?\\*/", "");
System.out.println(result);
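A sketch comparing the inline (?s) flag with the equivalent Pattern.DOTALL compile flag from java.util.regex; both should give the same result on the example string above. Precompiling the Pattern once is an assumption about how you might reuse it, not part of the answer, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Inline-flag form, as in the answer.
    static String stripInline(String src) {
        return src.replaceAll("(?s)/\\*.*?\\*/", "");
    }

    // Same regex with DOTALL passed as a compile flag instead of (?s);
    // compiling once lets the Pattern be reused across many inputs.
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    static String stripCompiled(String src) {
        return BLOCK_COMMENT.matcher(src).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "How are things today /* this\n is comment */ and is your code /* this is another comment */ working?";
        System.out.println(stripInline(src));
        System.out.println(stripCompiled(src).equals(stripInline(src)));
    }
}
```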
Answer by Wiktor Stribiżew
The best multiline comment regex is an unrolled version of (?s)/\*.*?\*/ that looks like
String pat = "/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/";
See the regex demo and explanation at regex101.com.
In short,
/\* - match the comment start /*
[^*]*\*+ - match 0+ characters other than * followed by 1+ literal *
(?:[^/*][^*]*\*+)* - 0+ sequences of [^/*][^*]*\*+, that is: a character that is not / or * (matched with [^/*]), followed by 0+ non-asterisk characters ([^*]*), followed by 1+ asterisks (\*+)
/ - the closing /
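A minimal sketch applying the unrolled pattern with java.util.regex and checking it against the lazy (?s) version on a sample with a two-line /** ... */ doc-style comment and a comment ending in **/. Negated classes such as [^*] also match newlines, which is why the unrolled form needs no (?s). The names and sample input are hypothetical, and this does not measure performance.

```java
import java.util.regex.Pattern;

class UnrolledCommentDemo {
    // Unrolled pattern: /\*[^*]*\*+(?:[^/*][^*]*\*+)*/
    static final Pattern UNROLLED =
            Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");
    // Lazy-dot equivalent, for comparison.
    static final Pattern LAZY = Pattern.compile("(?s)/\\*.*?\\*/");

    static String stripUnrolled(String src) {
        return UNROLLED.matcher(src).replaceAll("");
    }

    static String stripLazy(String src) {
        return LAZY.matcher(src).replaceAll("");
    }

    public static void main(String[] args) {
        String src = "x /** doc\n * line */ y /* a*b **/ z";
        System.out.println(stripUnrolled(src));
        // Both patterns should remove exactly the same text.
        System.out.println(stripUnrolled(src).equals(stripLazy(src)));
    }
}
```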
David's regex needs 26 steps to find the match in my example string, and my regex needs just 12 steps. With huge inputs, David's regex is likely to fail with a stack overflow or a similar problem, because lazy dot matching with .*? is inefficient: the engine has to try expanding the lazy pattern at every position, while my pattern matches linear chunks of text in one go.
Answer by x15
You can't parse C/C++-style comments in Java source directly.
Quoted strings have to be parsed at the same time and within the same regex, because a string may embed /* or //, which looks like the start of a comment even though it is just part of the string.
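To illustrate the problem described above, a sketch (hypothetical class name and input) of what a naive block-comment regex does to a line of Java source whose string literal contains /*: the match starts inside the literal and runs to the next */, deleting real code along with the comment.

```java
class NaiveStripPitfall {
    // Naive block-comment stripper: it knows nothing about string literals.
    static String naiveStrip(String javaSource) {
        return javaSource.replaceAll("(?s)/\\*.*?\\*/", "");
    }

    public static void main(String[] args) {
        // The "/*" inside the literal is not a comment start, but the regex
        // treats it as one and removes everything up to the real comment's "*/".
        String code = "String s = \"/* not a comment\"; int x = 1; /* real */";
        System.out.println(naiveStrip(code));
    }
}
```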
Note that additional regex considerations are needed if raw string constructs are possible in the language.
The regex below does this. Group 1 contains the comment and group 2 contains the non-comment text.
For example, if you were removing comments, it would be:
Find: (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n|$))|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)
Replace: $2
As a Java string literal: "(/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\(?:\\r?\\n)?)*?(?:\\r?\\n|$))|(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|[\\S\\s][^/\"'\\\\]*)"
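A sketch of applying the find/replace above in Java, passing the string-literal form verbatim to Pattern.compile and replacing with "$2". When the comment alternative matches, group 2 does not participate, and Java's Matcher substitutes an empty string for it, so comments disappear while string literals and code are copied through. Note that the // branch also consumes the line break that ends the comment. The class name and sample input are hypothetical.

```java
import java.util.regex.Pattern;

class CommentStripper {
    // Group 1: a /* */ or // comment. Group 2: string/char literals and other code.
    private static final Pattern COMMENT_OR_CODE = Pattern.compile(
        "(/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\(?:\\r?\\n)?)*?(?:\\r?\\n|$))|(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|[\\S\\s][^/\"'\\\\]*)");

    // Keep group 2 for every match; comment matches leave it unset, so they become "".
    static String stripComments(String source) {
        return COMMENT_OR_CODE.matcher(source).replaceAll("$2");
    }

    public static void main(String[] args) {
        String src = "String s = \"/* not a comment */\"; // trailing\nint x = 1; /* block */";
        // The "/* ... */" inside the string literal survives.
        System.out.println(stripComments(src));
    }
}
```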
Answer by Akshay
Try this one:
(//[^\n]*$|/(?!\\)\*[\s\S]*?\*(?!\\)/)
If you want to exclude the parts enclosed in double quotes (" "), use:
(\"[^\"]*\"(?!\\))|(//[^\n]*$|/(?!\\)\*[\s\S]*?\*(?!\\)/)
The first capturing group identifies all quoted parts and the second capturing group gives you the comments (both single-line and multi-line).
Copy the regular expression to regex101 if you want an explanation.
Answer by Mahesh Yadav
This could be the best approach for multi-line comments:
System.out.println(text.replaceAll("\\/\\*[\\s\\S]*?\\*\\/", ""));
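A short sketch (hypothetical class name and input) showing that the [\s\S] character class in this answer matches any character, including line breaks, so the pattern handles multi-line comments without the (?s) flag.

```java
class CharClassDemo {
    // [\s\S] = whitespace or non-whitespace, i.e. any character including '\n'.
    static String strip(String text) {
        return text.replaceAll("\\/\\*[\\s\\S]*?\\*\\/", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("keep /* drop\n this */ keep"));
    }
}
```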
Answer by jens-na
System.out.println(src.replaceAll("/\\*.*?\\*/ ?", ""));
You have to use the non-greedy quantifier *? (the ? after .*) to get the regex working; a greedy .* would run from the first /* to the last */ and also remove the text between the two comments. I also added " ?" at the end of the regex to remove one trailing space.
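A sketch comparing the pattern with and without the trailing " ?" on a sample string modeled on the question's example (hypothetical class and method names): without it, each removed comment leaves two adjacent spaces behind.

```java
class TrailingSpaceDemo {
    // Plain lazy pattern: removes only the comment itself.
    static String stripPlain(String src) {
        return src.replaceAll("/\\*.*?\\*/", "");
    }

    // " ?" additionally consumes at most one space right after each comment.
    static String stripWithSpace(String src) {
        return src.replaceAll("/\\*.*?\\*/ ?", "");
    }

    public static void main(String[] args) {
        String src = "How are things today /* this is comment **/ and is your code /** this is another comment */ working?";
        System.out.println(stripPlain(src));
        System.out.println(stripWithSpace(src));
    }
}
```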
Answer by Digerkam
Try this, which worked for me:
System.out.println(src.replaceAll("(/\\*.*?\\*/)+", ""));