Remove HTML comments with regex in JavaScript

Question

Asked by rodbv

I've got some ugly HTML generated from Word, from which I want to strip all HTML comments.

The HTML looks like this:

<!--[if gte mso 9]><xml> <o:OfficeDocumentSettings> <o:RelyOnVML/> <o:AllowPNG/> </o:OfficeDocumentSettings> </xml><![endif]--><!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:TrackMoves/> <w:TrackFormatting/> <w:HyphenationZone>21</w:HyphenationZone> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:DoNotPromoteQF/> <w:LidThemeOther>NO-BOK</w:LidThemeOther> <w:LidThemeAsian>X-NONE</w:LidThemeAsian> <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript> <w:Compatibility> <w:BreakWrappedTables/> <w:SnapToGridInCell/> <w:WrapTextWithPunct/> <w:UseAsianBreakRules/> <w:DontGrowAutofit/> <w:SplitPgBreakAndParaMark/> <w:EnableOpenTypeKerning/> <w:DontFlipMirrorIndents/> <w:OverrideTableStyleHps/> </w:Compatibility> <m:mathPr> <m:mathFont m:val="Cambria Math"/> <m:brkBin m:val="before"/> <m:brkBinSub m:val="&#45;-"/> <m:smallFrac m:val="off"/> <m:dispDef/> <m:lMargin m:val="0"/> <m:rMargin m:val="0"/> <m:defJc m:val="centerGroup"/> <m:wrapIndent m:val="1440"/> <m:intLim m:val="subSup"/> <m:naryLim m:val="undOvr"/> </m:mathPr></w:WordDocument> </xml><![endif]-->

...and the regex I am using is this one:

html = html.replace(/<!--(.*?)-->/gm, "")

But there seems to be no match; the string is unchanged.

What am I missing?

Answer 1

Answered by Mike Samuel

The regex /<!--[\s\S]*?-->/g should work.
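
A minimal sketch of why the question's pattern misses while this one matches. It assumes the real Word output contains line breaks (the pasted sample above may have lost them), and the input string is a hypothetical, shortened stand-in:

```javascript
// Hypothetical sample: a Word-style conditional comment spanning several lines.
const html = "before<!--[if gte mso 9]><xml>\n<o:AllowPNG/>\n</xml><![endif]-->after";

// The question's pattern: `.` does not match "\n", and the `m` flag only
// changes how ^ and $ behave, so this comment is never matched.
const questionResult = html.replace(/<!--(.*?)-->/gm, "");

// `[\s\S]` matches any character, including line terminators.
const fixedResult = html.replace(/<!--[\s\S]*?-->/g, "");

console.log(questionResult === html); // true: nothing was removed
console.log(fixedResult);             // beforeafter
```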

Be aware, though, that it will also kill escaping text spans in CDATA blocks.

E.g.

<script><!-- notACommentHere() --></script>

and literal text in formatted code blocks:

<xmp>I'm demoing HTML <!-- comments --></xmp>

<textarea><!-- Not a comment either --></textarea>
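
A short illustration of that caveat, feeding the answer's own examples to the regex as a hypothetical input string:

```javascript
// Hypothetical markup where "<!--" is not a comment: inside <script> and
// <textarea> the HTML parser treats the content as text, not markup.
const page =
  "<script><!-- notACommentHere() --></script>" +
  "<textarea><!-- Not a comment either --></textarea>";

const stripped = page.replace(/<!--[\s\S]*?-->/g, "");

// The regex has no notion of element context, so the script call and the
// textarea's text are both removed.
console.log(stripped); // <script></script><textarea></textarea>
```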

EDIT:

This also won't prevent new comments from being introduced, as in

<!-<!-- A comment -->- not comment text -->

which after one round of that regexp would become

<!-- not comment text -->

If this is a problem, you can escape < characters that are not part of a comment or tag (which is complicated to get right), or you can loop and replace as above until the string settles down.
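
The loop-until-stable option can be sketched as follows. stripCommentsUntilStable is a hypothetical helper name, and the pattern is the [\s\S] regex from the start of this answer:

```javascript
// Hypothetical helper: re-apply the replacement until the string stops
// changing, so comments assembled from leftovers of an earlier pass go too.
function stripCommentsUntilStable(input) {
  const COMMENT = /<!--[\s\S]*?-->/g;
  let previous;
  do {
    previous = input;
    input = input.replace(COMMENT, "");
  } while (input !== previous);
  return input;
}

const tricky = "<!-<!-- A comment -->- not comment text -->";

console.log(tricky.replace(/<!--[\s\S]*?-->/g, "")); // <!-- not comment text -->
console.log(stripCommentsUntilStable(tricky));         // prints an empty line
```

A single pass leaves a freshly formed comment behind; the second pass removes it, and the third pass confirms that nothing changes.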

Here's a regex that matches comments, including pseudo-comments and unclosed comments, per the HTML5 spec. CDATA sections are strictly allowed only in foreign (XML) content. This is subject to the same caveats as above.

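// Backslashes are doubled because the pattern is built from string literals.
var COMMENT_PSEUDO_COMMENT_OR_LT_BANG = new RegExp(
    '<!---?>'                     // A comment with no body: <!--> or <!--->
    + '|<!--[\\s\\S]*?(?:-->|$)'  // A comment; an unclosed one runs to the end of input
    + '|<!(?![dD][oO][cC][tT][yY][pP][eE]|\\[CDATA\\[)[^>]*>?'  // <! other than DOCTYPE or CDATA
    + '|<[?][^>]*>?',             // A pseudo-comment such as <?xml ...?>
    'g');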

Answer 2

Answered by rodbv

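You should use the /s (dotAll) modifier, which makes . match line breaks as well. JavaScript has supported it since ES2018; in older engines, [\s\S] does the same job as . with /s: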

html = html.replace(/<!--.*?-->/sg, "")

Tested in Perl:

use strict;
use warnings;

my $str = 'hello <!--[if gte mso 9]><xml> <o:OfficeDocumentSettings> <o:RelyOnVML/> <o:AllowPNG/> </o:OfficeDocumentSettings> </xml><![endif]--><!--[if gte mso 9]><xml> <w:WordDocument> <w:View>Normal</w:View> <w:Zoom>0</w:Zoom> <w:TrackMoves/> <w:TrackFormatting/> <w:HyphenationZone>21</w:HyphenationZone> <w:PunctuationKerning/> <w:ValidateAgainstSchemas/> <w:SaveIfXMLInvalid>false</w:SaveIfXMLInvalid> <w:IgnoreMixedContent>false</w:IgnoreMixedContent> <w:AlwaysShowPlaceholderText>false</w:AlwaysShowPlaceholderText> <w:DoNotPromoteQF/> <w:LidThemeOther>NO-BOK</w:LidThemeOther> <w:LidThemeAsian>X-NONE</w:LidThemeAsian> <w:LidThemeComplexScript>X-NONE</w:LidThemeComplexScript> <w:Compatibility> <w:BreakWrappedTables/> <w:SnapToGridInCell/> <w:WrapTextWithPunct/> <w:UseAsianBreakRules/> <w:DontGrowAutofit/> <w:SplitPgBreakAndParaMark/> <w:EnableOpenTypeKerning/> <w:DontFlipMirrorIndents/> <w:OverrideTableStyleHps/> </w:Compatibility> <m:mathPr> <m:mathFont m:val="Cambria Math"/> <m:brkBin m:val="before"/> <m:brkBinSub m:val="&#45;-"/> <m:smallFrac m:val="off"/> <m:dispDef/> <m:lMargin m:val="0"/> <m:rMargin m:val="0"/> <m:defJc m:val="centerGroup"/> <m:wrapIndent m:val="1440"/> <m:intLim m:val="subSup"/> <m:naryLim m:val="undOvr"/> </m:mathPr></w:WordDocument> </xml><![endif]-->world!';

$str =~ s/<!--.*?-->//sg;
print $str;

Output:
hello world!
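
The same check can be run in JavaScript, assuming an ES2018+ engine for the s flag. The input below is a shortened, hypothetical stand-in for the Word markup with line breaks added, and the [\s\S] form is included for comparison:

```javascript
// Shortened, hypothetical stand-in for the Word output, with line breaks.
const str =
  "hello <!--[if gte mso 9]><xml>\n<o:AllowPNG/>\n</xml><![endif]-->" +
  "<!--[if gte mso 9]><xml>\n<w:Zoom>0</w:Zoom>\n</xml><![endif]-->world!";

// `s` (dotAll) lets `.` cross line breaks; `[\s\S]` is the pre-ES2018 idiom.
const withDotAll = str.replace(/<!--.*?-->/sg, "");
const withClass = str.replace(/<!--[\s\S]*?-->/g, "");

console.log(withDotAll);               // hello world!
console.log(withDotAll === withClass); // true
```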

Answer 3

Answered by Aurielle Perlmann

This also works for multiline comments: ()|()

Answer 4

Answered by Sachin Gaur

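// Note: `.` does not match line breaks, so this only removes comments that
// fit on a single line; the `m` flag does not change that.
const regex = /<!--(.*?)-->/gm;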
const str = `You will be able to see this text. <!-- You will not be able to see this text. --> You can even comment out things in <!-- the middle of --> a sentence. <!-- Or you can comment out a large number of lines. --> <div class="example-class"> <!-- Another --> thing you can do is put comments after closing tags, to help you find where a particular element ends. <br> (This can be helpful if you have a lot of nested elements.) </div> <!-- /.example-class -->`;
const subst = ``;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);

Answer 5

Answered by Zach Bloomquist

This is based on Aurielle Perlmann's answer and handles single-line, multi-line, unterminated, and nested comments:

/(<!--.*?-->)|(<!--[\S\s]+?-->)|(<!--[\S\s]*?$)/g

https://regex101.com/r/az8Lu6/1
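
A quick check of the pattern against a single-line, a multi-line, and an unterminated comment, using hypothetical input (the variable names are illustrative):

```javascript
// The pattern from this answer, unchanged.
const ALL_COMMENTS = /(<!--.*?-->)|(<!--[\S\s]+?-->)|(<!--[\S\s]*?$)/g;

// Hypothetical input: one comment per case, the last one never closed.
const input =
  "a<!-- one line -->b" +
  "<!-- spans\nlines -->c" +
  "<!-- never closed\nstill inside";

// The third alternative's `$` (no `m` flag) anchors at the end of the input,
// so the unterminated comment is removed up to the end of the string.
console.log(JSON.stringify(input.replace(ALL_COMMENTS, ""))); // "abc"
```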

Answer 6

Answered by Dmitry Negoda

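html = html.replace(/<!--\[if[\s\S]*?\[endif\] *-->/g, "")

In JavaScript this has to be a regex literal: replace() treats a string pattern as literal text, and JavaScript regexes do not accept a leading (?s) flag, so [\s\S] is used to match across line breaks.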
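
Written as a JavaScript regex literal, with [\s\S] standing in for the (?s) flag, this pattern removes only Word's conditional comments and leaves ordinary ones in place. A sketch on hypothetical input:

```javascript
// Matches "<!--[if ...]> ... <![endif]-->" blocks, across line breaks.
const CONDITIONAL_COMMENT = /<!--\[if[\s\S]*?\[endif\] *-->/g;

// Hypothetical input mixing a Word conditional comment with a normal comment.
const doc =
  "<!--[if gte mso 9]><xml>\n<o:AllowPNG/>\n</xml><![endif]-->" +
  "<p>text</p><!-- ordinary comment -->";

console.log(doc.replace(CONDITIONAL_COMMENT, ""));
// <p>text</p><!-- ordinary comment -->
```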
