RegEx for matching/replacing JavaScript comments (both multiline and inline)
Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/5989315/
Asked by metaforce
I need to remove all JavaScript comments from a JavaScript source using the JavaScript RegExp object.
What I need is the pattern for the RegExp.
So far, I've found this:
compressed = compressed.replace(/\/\*.+?\*\/|\/\/.*(?=[\n\r])/g, '');
This pattern works OK for:
/* I'm a comment */
or for:
/*
* I'm a comment aswell
*/
But it doesn't seem to work for inline comments:
// I'm an inline comment
I'm not much of an expert in RegEx and its patterns, so I need help.
Also, I would like to have a RegEx pattern that removes all those HTML-like comments.
<!-- HTML Comment //--> or <!-- HTML Comment -->
And also those conditional HTML comments, which can be found in various JavaScript sources.
Thanks.
Accepted answer by AabinGunz
Try this:
(\/\*[\w\'\s\r\n\*]*\*\/)|(\/\/[\w\s\']*)|(\<![\-\-\s\w\>\/]*\>)
It should work :)
Answer by Ryan Wheale
NOTE: Regex is not a lexer or a parser. If you have some weird edge case where you need some oddly nested comments parsed out of a string, use a parser. For the other 98% of the time this regex should work.
I had pretty complex block comments going on with nested asterisks, slashes, etc. The regular expression at the following site worked like a charm:
http://upshots.org/javascript/javascript-regexp-to-remove-comments
(see below for original)
Some modifications have been made, but the integrity of the original regex has been preserved. In order to allow certain double-slash (//) sequences (such as URLs), you must use the back reference $1 in your replacement value instead of an empty string. Here it is:
/\/\*[\s\S]*?\*\/|([^\:]|^)\/\/.*$/gm
// JavaScript:
// source_string.replace(/\/\*[\s\S]*?\*\/|([^\:]|^)\/\/.*$/gm, '$1');
// PHP:
// preg_replace("/\/\*[\s\S]*?\*\/|([^\:]|^)\/\/.*$/m", "$1", $source_string);
DEMO: https://regex101.com/r/B8WkuX/1
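To illustrate why the `$1` replacement matters, here is a minimal sketch. The helper name `stripComments` and the sample input are assumptions made for this example; only the pattern itself comes from the answer above.

```javascript
// Pattern from the answer: a block comment, or a line comment whose "//"
// is not directly preceded by ":" (so "http://" is left alone).
const COMMENT_RE = /\/\*[\s\S]*?\*\/|([^\:]|^)\/\/.*$/gm;

// Hypothetical helper. Group 1 holds the single character that precedes "//",
// so replacing with '$1' puts that character back instead of deleting it.
function stripComments(source) {
  return source.replace(COMMENT_RE, '$1');
}

// Hypothetical sample input.
const sample = [
  'var url = "http://example.com"; // trailing comment',
  '/* block',
  '   comment */',
  'var x = 1;'
].join('\n');

console.log(stripComments(sample));
```

With an empty replacement string, the character captured in front of each `//` comment would be removed together with the comment.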
FAILING USE CASES: There are a few edge cases where this regex fails. An ongoing list of those cases is documented in this public gist. Please update the gist if you can find other cases.
...and if you also want to remove <!-- html comments --> use this:
/\/\*[\s\S]*?\*\/|([^\:]|^)\/\/.*|<!--[\s\S]*?-->$/
(original - for historical reference only)
// DO NOT USE THIS - SEE ABOVE
/(\/\*([\s\S]*?)\*\/)|(\/\/(.*)$)/gm
Answer by wolffer-east
I have been putting together an expression that needs to do something similar.
The finished product is:
/(?:((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)|(\/\*(?:(?!\*\/).|[\n\r])*\*\/)|(\/\/[^\n\r]*(?:[\n\r]+|$))|((?:=|:)\s*(?:\/(?:(?:(?!\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))|((?:\/(?:(?:(?!\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)[gimy]?\.(?:exec|test|match|search|replace|split)\()|(\.(?:exec|test|match|search|replace|split)\((?:\/(?:(?:(?!\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))|(<!--(?:(?!-->).)*-->))/g
Scary, right?
To break it down, the first part matches anything within single or double quotation marks.
This is necessary so that comment-like text inside quoted strings is not treated as a comment.
((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)
The second part matches multiline comments delimited by /* */
(\/\*(?:(?!\*\/).|[\n\r])*\*\/)
The third part matches single line comments starting anywhere in the line
(\/\/[^\n\r]*(?:[\n\r]+|$))
The fourth through sixth parts match anything within a regex literal.
This relies on a preceding equals sign or colon, or on the literal appearing directly before or after a regex method call.
((?:=|:)\s*(?:\/(?:(?:(?!\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))
((?:\/(?:(?:(?!\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)[gimy]?\.(?:exec|test|match|search|replace|split)\()
(\.(?:exec|test|match|search|replace|split)\((?:\/(?:(?:(?!\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/))
And the seventh, which I originally forgot, removes the HTML comments:
(<!--(?:(?!-->).)*-->)
I had an issue with my dev environment issuing errors for a regex that broke across lines, so I used the following solution:
var ADW_GLOBALS = new Object
ADW_GLOBALS = {
quotations : /((["'])(?:(?:\\\\)|\\\2|(?!\\\2)\\|(?!\2).|[\n\r])*\2)/,
multiline_comment : /(\/\*(?:(?!\*\/).|[\n\r])*\*\/)/,
single_line_comment : /(\/\/[^\n\r]*[\n\r]+)/,
regex_literal : /(?:\/(?:(?:(?!\*\/).)|\\\\|\\\/|[^\\]\[(?:\\\\|\\\]|[^]])+\])+\/)/,
html_comments : /(<!--(?:(?!-->).)*-->)/,
regex_of_doom : ''
}
ADW_GLOBALS.regex_of_doom = new RegExp(
'(?:' + ADW_GLOBALS.quotations.source + '|' +
ADW_GLOBALS.multiline_comment.source + '|' +
ADW_GLOBALS.single_line_comment.source + '|' +
'((?:=|:)\\s*' + ADW_GLOBALS.regex_literal.source + ')|(' +
ADW_GLOBALS.regex_literal.source + '[gimy]?\\.(?:exec|test|match|search|replace|split)\\(' + ')|(' +
'\\.(?:exec|test|match|search|replace|split)\\(' + ADW_GLOBALS.regex_literal.source + ')|' +
ADW_GLOBALS.html_comments.source + ')' , 'g'
);
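changed_text = code_to_test.replace(ADW_GLOBALS.regex_of_doom, function(match, $1, $2, $3, $4, $5, $6, $7, $8, offset, original){
// $1 is the quoted string; $5, $6 and $7 are the regex-literal captures: keep them
if (typeof $1 != 'undefined') return $1;
if (typeof $5 != 'undefined') return $5;
if (typeof $6 != 'undefined') return $6;
if (typeof $7 != 'undefined') return $7;
// everything else is a comment capture: drop it
return '';
});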
This returns anything captured by the quoted-string group and anything found in a regex literal intact, but returns an empty string for all the comment captures.
I know this is excessive and rather difficult to maintain, but it does appear to work for me so far.
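The idea of keeping some captures and dropping others can be shown with a much smaller sketch. This is not the regex from this answer: `TOKEN_RE` and `stripCommentsKeepStrings` are hypothetical names, and the simplified pattern only knows about quoted strings and comments (regex literals and HTML comments are deliberately left out).

```javascript
// Simplified stand-in pattern (an assumption for illustration):
//   group 1 = a single- or double-quoted string, with backslash escapes
//   group 2 = a block comment or a line comment
const TOKEN_RE = /("(?:\\[\s\S]|[^"\\])*"|'(?:\\[\s\S]|[^'\\])*')|(\/\*[\s\S]*?\*\/|\/\/[^\n\r]*)/g;

// Hypothetical helper: strings are returned untouched, comments become ''.
function stripCommentsKeepStrings(code) {
  return code.replace(TOKEN_RE, function (match, quoted) {
    return quoted !== undefined ? quoted : '';
  });
}

const input = 'var a = "// not a comment"; // real\n' +
              "var b = 'a /* b */ c'; /* gone */";
console.log(stripCommentsKeepStrings(input));
```

Because the string alternative comes first and the scan runs left to right, a `//` or `/*` inside a string is consumed as part of the string before the comment alternative gets a chance to see it.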
Answer by aMarCruz
This works for almost all cases:
var RE_BLOCKS = new RegExp([
/\/(\*)[^*]*\*+(?:[^*\/][^*]*\*+)*\//.source, // $1: multi-line comment
/\/(\/)[^\n]*$/.source, // $2: single-line comment
/"(?:[^"\\]*|\\[\S\s])*"|'(?:[^'\\]*|\\[\S\s])*'/.source, // - string, don't care about embedded eols
/(?:[$\w\)\]]|\+\+|--)\s*\/(?![*\/])/.source, // - division operator
/\/(?=[^*\/])[^[/\\]*(?:(?:\[(?:\\.|[^\]\\]*)*\]|\\.)[^[/\\]*)*?\/[gim]*/.source
].join('|'), // - regex
'gm' // note: global+multiline with replace() need test
);
// remove comments, keep other blocks
function stripComments(str) {
return str.replace(RE_BLOCKS, function (match, mlc, slc) {
return mlc ? ' ' : // multiline comment (replace with space)
slc ? '' : // single/multiline comment
match; // divisor, regex, or string, return as-is
});
}
The code is based on regexes from jspreproc; I wrote this tool for the riot compiler.
Answer by Shobhit Sharma
In plain simple JS regex, this:
my_string_or_obj.replace(/\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*$/gm, '$1 ')
Answer by Aurielle Perlmann
Answer by vantrung -cuncon
Simple regex ONLY for multi-line comments:
/\*((.|\n)(?!/))+\*/
Answer by John Smith
If you click on the link below you will find a comment removal script written in regex.
These are 112 lines of code that work together; the script also works with MooTools, Joomla, Drupal and other CMS websites. I tested it on 800,000 lines of code and comments, and it works fine. It also detects multiple nested parentheticals like ( abc(/nn/('/xvx/'))"// testing line") and comments that sit between colons, and protects them. 23-01-2016. This is the code with the comments in it.
Answer by Nolo
This is too late to be of much use to the original question, but maybe it will help someone.
Based on @Ryan Wheale's answer, I've found this to work as a comprehensive capture to ensure that matches exclude anything found inside a string literal.
/(?:\r\n|\n|^)(?:[^'"])*?(?:'(?:[^\r\n\\']|\\'|[\\]{2})*'|"(?:[^\r\n\\"]|\\"|[\\]{2})*")*?(?:[^'"])*?(\/\*(?:[\s\S]*?)\*\/|\/\/.*)/g
The last group (all others are discarded) is based on Ryan's answer. Example here.
This assumes the code is well structured and valid JavaScript.
Note: this has not been tested on poorly structured code, which may or may not be recoverable depending on the JavaScript engine's own heuristics.
Note: this should hold for valid JavaScript < ES6; however, ES6 allows multi-line string literals, in which case this regex will almost certainly break, though that case has not been tested.
However, it is still possible to match something that looks like a comment inside a regex literal (see comments/results in the Example above).
I use the above capture after replacing all regex literals using the following comprehensive capture extracted from es5-lexer here and here, as referenced in Mike Samuel's answer to this question:
/(?:(?:break|case|continue|delete|do|else|finally|in|instanceof|return|throw|try|typeof|void|[+]|-|[.]|[/]|,|[*])|[!%&(:;<=>?[^{|}~])?(\/(?![*/])(?:[^\\[/\r\n\u2028\u2029]|\[(?:[^\]\\r\n\u2028\u2029]|\(?:[^\r\n\u2028\u2029ux]|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}))+\]|\(?:[^\r\n\u2028\u2029ux]|u[0-9A-Fa-f]{4}|x[0-9A-Fa-f]{2}))*\/[gim]*)/g
For completeness, see also this trivial caveat.
Answer by pery mimon
2019:
All the answers come with pitfalls, so I wrote something that just works. Try it out:
function scriptComment(code){
const savedText = [];
return code
.replace(/(['"`]).*?\1/gm,function (match) {
var i = savedText.push(match);
return (i-1)+'###';
})
// remove // comments
.replace(/\/\/.*/gm,'')
// now extract all regex and save them
.replace(/\/[^*\n].*\//gm,function (match) {
var i = savedText.push(match);
return (i-1)+'###';
})
// remove /* */ comments
.replace(/\/\*[\s\S]*\*\//gm,'')
// remove <!-- --> comments
.replace(/<!--[\s\S]*-->/gm, '')
.replace(/\d+###/gm,function(match){
var i = Number.parseInt(match);
return savedText[i];
})
}
var cleancode = scriptComment(scriptComment.toString())
console.log(cleancode)
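The save-and-restore idea used above can be sketched in a more compact form. This is a hedged illustration rather than the author's code: `stripWithPlaceholders`, the NUL-delimited placeholder format and the string pattern are assumptions chosen for the example, and regex literals are not handled at all.

```javascript
// Hypothetical helper showing the placeholder technique:
// 1. stash every string literal and leave a numbered placeholder,
// 2. strip comments from what is left,
// 3. put the stashed strings back.
function stripWithPlaceholders(code) {
  const saved = [];
  // push() returns the new length, so length - 1 is the index of the entry.
  const stash = (text) => `\u0000${saved.push(text) - 1}\u0000`;

  return code
    // strings: opening quote, escaped chars or non-quote chars, same closing quote
    .replace(/(['"`])(?:\\[\s\S]|(?!\1)[^\\])*\1/g, stash)
    // block comments (lazy, so two separate comments are not merged)
    .replace(/\/\*[\s\S]*?\*\//g, '')
    // line comments
    .replace(/\/\/.*/g, '')
    // restore the stashed strings by index
    .replace(/\u0000(\d+)\u0000/g, (match, index) => saved[Number(index)]);
}

const snippet = 'var u = "http://a.b"; // note\n' +
                "var t = 'x /* y */'; /* z */ var n = 1;";
console.log(stripWithPlaceholders(snippet));
```

One known limitation of this multi-pass approach: a quote character inside a comment (for example an apostrophe in `// I'm a comment`) is seen by the first pass as the start of a string, so inputs like that may not be handled correctly.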
Old answer: does not work on sample code like this:
// won't execute the creative code ("Can't execute code form a freed script"),
navigator.userAgent.match(/\b(MSIE |Trident.*?rv:|Edge\/)(\d+)/);
function scriptComment(code){
const savedText = [];
return code
// extract strings and regex
.replace(/(['"`]).*?\1/gm,function (match) {
savedText.push(match);
return '###';
})
// remove // comments
.replace(/\/\/.*/gm,'')
// now extract all regex and save them
.replace(/\/[^*\n].*\//gm,function (match) {
savedText.push(match);
return '###';
})
// remove /* */ comments
.replace(/\/\*[\s\S]*\*\//gm,'')
// remove <!-- --> comments
.replace(/<!--[\s\S]*-->/gm, '')
/* replace \ with \\ so we do not lose \b and \t */
.replace(/###/gm,function(){
return savedText.shift();
})
}
var cleancode = scriptComment(scriptComment.toString())
console.log(cleancode)