Original URL: http://stackoverflow.com/questions/9113163/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Remove all comments (single-/multi-line) & blank lines from a source file
Question by nenito
How can I remove all comments and blank lines from a C# source file? Keep in mind that there could be nested comments. Some examples:
string text = @"//not a comment"; // a comment
/* multiline
comment */ string newText = "/*not a comment*/"; // a comment
/* multiline // not a comment
/* comment */ string anotherText = "/* not a comment */ // some text here\"// not a comment"; // a comment
The source can be much more complex than the three examples above. Can someone suggest a regex pattern or another way to solve this? I've already browsed a lot of material on the internet and couldn't find anything that works.
Accepted answer by sga101
To remove the comments, see this answer. After that, removing empty lines is trivial.
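As a minimal sketch of the "trivial" blank-line step, assuming the comments have already been stripped and that a line counts as blank when it is empty or contains only whitespace, the remaining lines can be filtered with LINQ. The names BlankLines and RemoveBlankLines, and the default "\n" output line ending, are illustrative assumptions rather than part of the linked answer.

```csharp
using System;
using System.Linq;

// Hypothetical helper: not taken from the linked answer.
public static class BlankLines
{
    // Drops lines that are empty or whitespace-only; all other lines are kept unchanged and in order.
    // Accepts both "\r\n" and "\n" line breaks and joins the surviving lines with newLine.
    public static string RemoveBlankLines(string text, string newLine = "\n")
    {
        // "\r\n" is listed first so a CRLF pair is treated as one separator.
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        return string.Join(newLine, lines.Where(line => !string.IsNullOrWhiteSpace(line)));
    }
}
```

For example, RemoveBlankLines("a\n\n  \nb") returns "a\nb". Note that a trailing line break is not preserved, because the empty element after it is filtered out like any other blank line.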
Answer by Sam Greenhalgh
Unfortunately this is really difficult to do reliably with regex without running into edge cases. I haven't investigated very far, but you might be able to use the Visual Studio Language Services to parse comments.
Answer by casperOne
First, you'll definitely want to use RegexOptions.Singleline when constructing your Regex instance. Right now, you are processing single lines of code.
To complement the RegexOptions.Singleline option, you'll want to make sure you use the start and end string anchors (^ and $ respectively), since for the specific cases you have, you want the regular expression to apply to the entire string.
I'd also recommend breaking up the conditions and using alternation to handle smaller cases, constructing a larger regular expression from the smaller, easier-to-manage expressions.
Finally, I know this is homework, but parsing a programming language with regular expressions is an exercise in futility (it's not a practical application); regexes are better suited to more highly structured data. If you find in the future that you want to do things like this, use a parser that is built for the language (in this case, I'd highly recommend Roslyn).
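A sketch of the Roslyn route, assuming the Microsoft.CodeAnalysis.CSharp NuGet package is referenced: parse the source into a syntax tree, then run a CSharpSyntaxRewriter that drops comment trivia. Because the parser already knows which characters belong to string and character literals, text such as "/* not a comment */" inside a literal is never treated as a comment. The class name CommentRemover is hypothetical, and the whitespace around removed comments is left in place, so blank lines still need a separate pass.

```csharp
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Hypothetical rewriter; requires the Microsoft.CodeAnalysis.CSharp package.
public sealed class CommentRemover : CSharpSyntaxRewriter
{
    // Called for every piece of leading/trailing trivia attached to a token.
    public override SyntaxTrivia VisitTrivia(SyntaxTrivia trivia)
    {
        switch (trivia.Kind())
        {
            case SyntaxKind.SingleLineCommentTrivia:               // // ...
            case SyntaxKind.MultiLineCommentTrivia:                // /* ... */
            case SyntaxKind.SingleLineDocumentationCommentTrivia:  // /// ...
            case SyntaxKind.MultiLineDocumentationCommentTrivia:   // /** ... */
                return default; // default(SyntaxTrivia) is dropped from the rewritten trivia list
            default:
                return base.VisitTrivia(trivia);
        }
    }

    public static string StripComments(string code)
    {
        SyntaxNode root = CSharpSyntaxTree.ParseText(code).GetRoot();
        // ToFullString includes all remaining trivia, so everything except the comments round-trips.
        return new CommentRemover().Visit(root).ToFullString();
    }
}
```

Whether to also strip XML documentation comments is a design choice; remove those two cases from the switch to keep them. Comments inside an inactive #if region are part of disabled-text trivia and are left alone by this sketch.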
Answer by Joe White
If you want to identify comments with regexes, you really need to use the regex as a tokenizer. I.e., it identifies and extracts the first thing in the string, whether that thing be a string literal, a comment, or a block of stuff that is neither string literal nor comment. Then you grab the remainder of the string and pull the next token off the beginning.
This gets you around the problems with context. If you're just trying to look for things in the middle of the string, there's no good way to identify whether a particular "comment" is inside a string literal or not -- in fact, it's hard to identify where the string literals are in the first place, because of things like \". But if you always take the first thing in the string, it's easy to say "oh, the string starts with a double quote, so everything up to the next unescaped double quote is more string." Context takes care of itself.
So you would want three regexes:
- One that identifies a comment starting at the beginning of the string (either a // or a /* comment).
- One that identifies a string literal starting at the beginning of the string. Remember to check for both " and @" strings; each has its own edge cases.
- One that identifies something that is neither of the above, and matches up until the first thing that could be a comment or a string literal.
Writing the actual regex patterns is left as an exercise for the reader, since it would take hours to write and test it all and I'm not willing to do that for free. (grin) But it's certainly doable if you have a good understanding of regexes (or have a place like StackOverflow to ask specific questions when you get stuck) and are willing to write a bunch of automated tests for your code. Watch out for that last ("anything else") case, though -- you want to stop just before an @ if it's followed by a ", but not if it's an @ that escapes a keyword so it can be used as an identifier.
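The three-regex tokenizer described above can be sketched as follows. These are one possible set of patterns, not the answer author's: each regex is anchored with \G so that Regex.Match(input, startat) only succeeds at the current position, and a character-literal pattern ('...') is added as an assumption, because a literal such as '"' would otherwise look like the start of a string. Interpolated strings with quotes inside their holes and C# 11 raw string literals are not handled. The class name CommentTokenizer is hypothetical.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical tokenizer-style comment stripper (sketch, not a complete C# lexer).
public static class CommentTokenizer
{
    // 1. A // comment up to the end of the line, or a /* ... */ comment (may span lines).
    static readonly Regex Comment = new Regex(@"\G(?://[^\r\n]*|/\*[\s\S]*?\*/)");

    // 2. A verbatim @"..." literal ("" escapes a quote), a regular "..." literal,
    //    or a char '...' literal (backslash escapes in the last two).
    static readonly Regex Literal = new Regex(
        @"\G(?:@""(?:[^""]|"""")*""|""(?:[^""\\\r\n]|\\.)*""|'(?:[^'\\\r\n]|\\.)*')");

    // 3. Anything else, up to the next character that could start a comment or literal.
    //    An @ not followed by " (e.g. @class) is ordinary text.
    static readonly Regex Other = new Regex(@"\G(?:[^/""'@]|/(?![/*])|@(?!""))+");

    public static string StripComments(string code)
    {
        var result = new StringBuilder(code.Length);
        int pos = 0;
        while (pos < code.Length)
        {
            Match m = Comment.Match(code, pos);
            if (m.Success)
            {
                pos += m.Length; // skip the comment entirely
                continue;
            }
            m = Literal.Match(code, pos);
            if (!m.Success) m = Other.Match(code, pos);
            if (m.Success)
            {
                result.Append(m.Value); // literals and ordinary code are copied as-is
                pos += m.Length;
            }
            else
            {
                // Malformed input (unterminated string or /*): copy one char so the loop advances.
                result.Append(code[pos]);
                pos++;
            }
        }
        return result.ToString();
    }
}
```

Trying the comment pattern first, then literals, then "anything else" mirrors the order in the answer. Because every token is taken from the current position, a // inside "..." is consumed as part of the literal before the comment pattern ever sees it.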
Answer by Qtax
You could use the function in this answer:
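using System.Text.RegularExpressions;

static string StripComments(string code)
{
    // Group 1 captures literals (verbatim @"...", regular "..." and char '...');
    // the remaining alternatives match // line comments and /* */ block comments.
    var re = @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/";
    // "$1" writes captured literals back unchanged; comments leave group 1 empty and are removed.
    return Regex.Replace(code, re, "$1");
}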
And then remove empty lines.
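Putting the two steps of this answer together, a minimal sketch might look like this. It reuses the regex above with a "$1" replacement, so literals captured by group 1 are written back and only comments become empty strings, and then removes whitespace-only lines with a multiline-mode pattern. The class and method names are hypothetical, and the blank-line pattern is an assumption: it only removes blank lines that end with a line break.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the regex from the answer above.
public static class CommentStripper
{
    // Group 1: verbatim, regular and char literals; other alternatives: // and /* */ comments.
    const string CommentOrLiteral =
        @"(@(?:""[^""]*"")+|""(?:[^""\n\\]+|\\.)*""|'(?:[^'\n\\]+|\\.)*')|//.*|/\*(?s:.*?)\*/";

    // A line holding only spaces/tabs, plus its line break (^ matches after each \n in Multiline mode).
    const string BlankLine = @"^[ \t]*\r?\n";

    public static string StripCommentsAndBlankLines(string code)
    {
        // Literals are written back via $1; comments are replaced with "".
        string noComments = Regex.Replace(code, CommentOrLiteral, "$1");
        return Regex.Replace(noComments, BlankLine, "", RegexOptions.Multiline);
    }
}
```

On CRLF files the .* in //.* also consumes the \r, so lines that had a trailing comment end up with a bare \n.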
Answer by Ivan Kochurkin
Also see my project for C# code minification: CSharp-Minifier
Aside from removing comments, spaces and line breaks from code, it is currently able to compress local variable names and perform other minifications.
Answer by Jowe
Use my project to remove most comments. https://github.com/SynAppsDevelopment/CommentRemover
It removes all full-line, ending-line, and XML Doc code comments with some limitations for complex comments explained in the readme and source. This is a C# solution with a WinForms interface.

