Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverFlow question: http://stackoverflow.com/questions/13014947/

Regex to match a C-style multiline comment

Tags: java, regex, string

Asked by hanumant

I have a string, e.g.

String src = "How are things today /* this is comment **/ and is your code /** this is another comment */ working?";

I want to remove the /* this is comment **/ and /** this is another comment */ substrings from the src string.

I tried to use a regex but failed for lack of experience.

Accepted answer by David Kroukamp

Try using this regex (single-line comments only):

String src = "How are things today /* this is comment */ and is your code /* this is another comment */ working?";
// In a Java string literal each regex backslash must be doubled: \\* matches a literal *
String result = src.replaceAll("/\\*.*?\\*/", ""); // single-line comments
System.out.println(result);

REGEX explained:

  • / - match the character "/" literally
  • \* - match the character "*" literally
  • . - match any single character (except line terminators, by default)
  • *? - between zero and unlimited times, as few times as possible, expanding as needed (lazy)
  • \* - match the character "*" literally
  • / - match the character "/" literally
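
To make the role of the lazy quantifier concrete, here is a minimal sketch that runs the pattern above next to a greedy variant on the sample string. The class and method names (LazyVsGreedyDemo, stripLazy, stripGreedy) are made up for this illustration and are not part of the original answer.

```java
class LazyVsGreedyDemo {
    // Lazy: ".*?" stops at the first "*/" it can reach, so each comment is matched separately.
    static String stripLazy(String src) {
        return src.replaceAll("/\\*.*?\\*/", "");
    }

    // Greedy: ".*" runs to the last "*/" on the line, swallowing the text between comments.
    static String stripGreedy(String src) {
        return src.replaceAll("/\\*.*\\*/", "");
    }

    public static void main(String[] args) {
        String src = "How are things today /* this is comment */ and is your code /* this is another comment */ working?";
        System.out.println(stripLazy(src));   // the text between the two comments survives
        System.out.println(stripGreedy(src)); // the text between the two comments is lost
    }
}
```

With the greedy version the words "and is your code" disappear along with the comments, which is why the answer uses *? rather than *.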

Alternatively, here is a regex for both single-line and multi-line comments, obtained by adding (?s) (DOTALL mode, in which . also matches line terminators):

// note the added \n, which won't work with the previous regex
String src = "How are things today /* this\n is comment */ and is your code /* this is another comment */ working?";
String result = src.replaceAll("(?s)/\\*.*?\\*/", "");
System.out.println(result);
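
The inline (?s) flag can also be passed to Pattern.compile as Pattern.DOTALL, which is convenient when the pattern is reused. The following is a small sketch under that assumption; the class name DotallDemo and its methods are hypothetical and only serve the illustration.

```java
import java.util.regex.Pattern;

class DotallDemo {
    // Equivalent to "(?s)/\*.*?\*/": DOTALL lets '.' match line terminators too.
    static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);

    static String strip(String src) {
        return COMMENT.matcher(src).replaceAll("");
    }

    // Same pattern without DOTALL, kept only for comparison.
    static String stripSingleLineOnly(String src) {
        return src.replaceAll("/\\*.*?\\*/", "");
    }

    public static void main(String[] args) {
        String src = "How are things today /* this\n is comment */ and is your code /* this is another comment */ working?";
        System.out.println(strip(src));               // both comments removed
        System.out.println(stripSingleLineOnly(src)); // the comment containing \n is left in place
    }
}
```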

Answer by Wiktor Stribiżew

The best multiline comment regex is an unrolled version of (?s)/\*.*?\*/ that looks like

String pat = "/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/";

See the regex demo and explanation at regex101.com.

In short,

  • /\* - match the comment start /*
  • [^*]*\*+ - match 0+ characters other than * followed by 1+ literal *
  • (?:[^/*][^*]*\*+)* - 0+ sequences of:
    • [^/*][^*]*\*+ - a character that is not / or * (matched with [^/*]), followed by 0+ non-asterisk characters ([^*]*), followed by 1+ asterisks (\*+)
  • / - the closing /
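
As a quick check of the breakdown above, this sketch applies the unrolled pattern to a string with a multi-line comment and extra asterisks. No DOTALL flag is needed, because negated character classes such as [^*] already match line breaks. The class name UnrolledCommentDemo and the sample input are assumptions made for the example.

```java
import java.util.regex.Pattern;

class UnrolledCommentDemo {
    // The unrolled pattern from the answer, written as a Java string literal.
    static final Pattern UNROLLED = Pattern.compile("/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/");

    static String strip(String src) {
        return UNROLLED.matcher(src).replaceAll("");
    }

    public static void main(String[] args) {
        // First comment spans two lines and ends in "**/"; second one starts with "/**".
        String src = "a /* x\n * y **/ b /** z */ c";
        System.out.println(strip(src));
        // The empty comment "/**/" is handled as well.
        System.out.println(strip("x/**/y"));
    }
}
```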

David's regex needs 26 steps to find the match in my example string, and my regex needs just 12 steps. With huge inputs, David's regex is likely to fail with a stack overflow or something similar, because lazy dot matching with .*? is inefficient: the regex engine expands the lazy pattern one character at a time and re-checks the rest of the pattern at each position, while my pattern matches linear chunks of text in one go.

Answer by x15

You can't parse C/C++ style comments in Java source directly.
Quoted strings have to be parsed at the same time and within the same regex,
because a string may embed /* or //, which would look like the start of a comment
when it is just part of the string.

Note that additional regex consideration is needed if raw string constructs
are possible in the language.

The regex that accomplishes this is shown below,
where group 1 contains the comment and group 2 contains the non-comment.
For example, if you were removing comments, it would be:

Find:
(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\(?:\r?\n)?)*?(?:\r?\n|$))|("[^"\\]*(?:\\[\S\s][^"\\]*)*"|'[^'\\]*(?:\\[\S\s][^'\\]*)*'|[\S\s][^/"'\\]*)

Replace:
$2

As a Java string literal:
"(/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\(?:\\r?\\n)?)*?(?:\\r?\\n|$))|(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|[\\S\\s][^/\"'\\\\]*)"

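The find/replace pair above can be wired into Java roughly as follows. This is only a sketch: the class name CommentStripper, the method strip, and the sample input are invented for the example, and the pattern is the string-literal form quoted in the answer. It relies on Java appending nothing for $2 when group 2 did not take part in the match, which is what removes the comments.

```java
import java.util.regex.Pattern;

class CommentStripper {
    // Group 1: a /* ... */ or // comment. Group 2: a quoted string, char literal, or other text.
    static final Pattern COMMENT_OR_CODE = Pattern.compile(
        "(/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/|//(?:[^\\\\]|\\\\(?:\\r?\\n)?)*?(?:\\r?\\n|$))"
        + "|(\"[^\"\\\\]*(?:\\\\[\\S\\s][^\"\\\\]*)*\"|'[^'\\\\]*(?:\\\\[\\S\\s][^'\\\\]*)*'|[\\S\\s][^/\"'\\\\]*)");

    // Keep only group 2; when a comment matched, group 2 is unset and contributes nothing.
    static String strip(String source) {
        return COMMENT_OR_CODE.matcher(source).replaceAll("$2");
    }

    public static void main(String[] args) {
        String source = "String s = \"/* not a comment */\"; // trailing\nint x = 1; /* real */ int y;";
        // The string literal keeps its /* ... */ text; the two real comments are dropped.
        System.out.println(strip(source));
    }
}
```

Note that the // alternative consumes the line break that ends the comment, so the following line is joined to the current one in the output.
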
Answer by Akshay

Try this one:

(//[^\n]*$|/(?!\\)\*[\s\S]*?\*(?!\\)/)

If you want to exclude the parts enclosed in " ", then use:

(\"[^\"]*\"(?!\\))|(//[^\n]*$|/(?!\\)\*[\s\S]*?\*(?!\\)/)

The first capturing group identifies all " " parts, and the second capturing group gives you the comments (both single-line and multi-line). For $ to match at the end of every line rather than only at the end of the input, the regex needs multiline mode.

Copy the regular expression to regex101 if you want an explanation.
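
One way to use the two capturing groups from Java is to iterate over the matches and keep only those where group 2 took part. The sketch below assumes Pattern.MULTILINE so that $ matches at each line end; the class name CommentFinder, the method findComments, and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentFinder {
    // Group 1: a double-quoted part (skipped). Group 2: a // or /* ... */ comment.
    static final Pattern STRING_OR_COMMENT = Pattern.compile(
        "(\"[^\"]*\"(?!\\\\))|(//[^\\n]*$|/(?!\\\\)\\*[\\s\\S]*?\\*(?!\\\\)/)",
        Pattern.MULTILINE);

    static List<String> findComments(String source) {
        List<String> comments = new ArrayList<>();
        Matcher m = STRING_OR_COMMENT.matcher(source);
        while (m.find()) {
            // group(2) is null when the match was a quoted part rather than a comment.
            if (m.group(2) != null) {
                comments.add(m.group(2));
            }
        }
        return comments;
    }

    public static void main(String[] args) {
        String source = "String s = \"/* no */\"; // one\nint x; /* two */";
        System.out.println(findComments(source));
    }
}
```

Because the quoted part is consumed by group 1 first, the /* no */ text inside the string literal is never reported as a comment.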

Answer by Mahesh Yadav

This could be the best approach for multi-line comments:

System.out.println(text.replaceAll("\\/\\*[\\s\\S]*?\\*\\/", ""));

Answer by jens-na

System.out.println(src.replaceAll("\\/\\*.*?\\*\\/ ?", ""));

You have to use the non-greedy quantifier ? to get the regex working. I also added a ' ?' at the end of the regex to remove one space after each comment.
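
The effect of the trailing ' ?' is easiest to see side by side. This is a minimal sketch with made-up names (TrailingSpaceDemo, strip, stripWithSpace), using the sample sentence from the accepted answer.

```java
class TrailingSpaceDemo {
    // Removes the comments only; the spaces on both sides of each comment remain.
    static String strip(String src) {
        return src.replaceAll("/\\*.*?\\*/", "");
    }

    // Also consumes one optional space after each comment.
    static String stripWithSpace(String src) {
        return src.replaceAll("/\\*.*?\\*/ ?", "");
    }

    public static void main(String[] args) {
        String src = "How are things today /* this is comment */ and is your code /* this is another comment */ working?";
        System.out.println(strip(src));          // leaves double spaces where the comments were
        System.out.println(stripWithSpace(src)); // single spaces throughout
    }
}
```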

Answer by Digerkam

Try this, which worked for me:

System.out.println(src.replaceAll("(\\/\\*.*?\\*\\/)+", ""));