Check if HTML snippet is valid with Javascript

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/10026626/

Date: 2020-08-23 23:33:10. Source: igfitidea.

Tags: javascript, html, validation

Question by Ixx

I need a reliable JavaScript library/function that I can call from my code to check whether an HTML snippet is valid. For example, it should check that opened tags and quotation marks are closed, that nesting is correct, etc.

I don't want the validation to fail because something is not 100% standard (but would work anyways).

Answer by mikemaccana

Update: this answer is limited; please see the edit below.

Expanding on @kolink's answer, I use:

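var checkHTML = function(html) {
  // Let the browser parse the string into a temporary DOM tree...
  var doc = document.createElement('div');
  doc.innerHTML = html;
  // ...then check whether serializing it back yields the identical string
  return ( doc.innerHTML === html );
};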

I.e., we create a temporary div and assign the HTML to it. To do this, the browser builds a DOM tree from the HTML string, which may involve closing unclosed tags, etc.

Comparing the div's HTML contents with the original HTML will tell us if the browser needed to change anything.

checkHTML('<a>hell<b>o</b>')

Returns false.

checkHTML('<a>hell<b>o</b></a>')

Returns true.

Edit: As @Quentin notes below, this is excessively strict for a variety of reasons. For example, browsers add omitted closing tags when they re-serialize the HTML, even for tags whose closing tags are optional. E.g.:

<p>one para
<p>second para

...is valid HTML (since p elements are allowed to omit their closing tags), but checkHTML will return false, because the serialized innerHTML contains the added </p> tags. Browsers will also normalise the case of tag names and may alter whitespace. You should be aware of these limits when deciding to use this approach.

Answer by Niet the Dark Absol

Well, this code:

function tidy(html) {
    var d = document.createElement('div');
    d.innerHTML = html;
    return d.innerHTML;
}

This will "correct" malformed HTML to the best of the browser's ability. If that's helpful to you, it's a lot easier than trying to validate HTML.

Answer by foobored

None of the solutions presented so far does a good job of answering the original question, especially when it comes to

I don't want the validation to fail because something is not 100% standard (but would work anyways).

TL;DR: check the JSFiddle

So I took the answers and comments on this topic as input and created a method that does the following:

  • checks the html string tag by tag for validity
  • tries to render the html string
  • compares the theoretically expected tag count with the tag count of the actually rendered html DOM
  • if 'strict' is checked, <br/> and empty-attribute normalizations (="") are not ignored
  • compares the rendered innerHTML with the given html string (ignoring case, whitespace and quotes)

Returns

  • true if the rendered html is the same as the given html string
  • false if one of the checks fails
  • the normalized html string if the rendered html seems valid but is not equal to the given html string

Normalized means that, on rendering, the browser sometimes ignores or repairs specific parts of the input (like adding missing closing tags for <p>) and converts others (like single to double quotes, or encoding ampersands). Distinguishing between "failed" and "normalized" lets you flag the content to the user as "this will not be rendered as you might expect it".

Most of the time, the normalized result is only a slightly altered version of the original html string; still, sometimes it is quite different. So this should be used, e.g., to flag user input for further review before saving it to a db or rendering it blindly (see the JSFiddle for examples of normalization).
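
Because simpleValidateHtmlStr returns three kinds of values, a plain if (result) check is not enough: a normalized HTML string is truthy too, so it would be treated like true. A minimal sketch of a caller that distinguishes the three cases, assuming only the return values listed above; classifyValidationResult and its status names are hypothetical:

```javascript
// Hypothetical helper (not part of the answer's code): turns the three
// possible results of simpleValidateHtmlStr into an explicit status.
function classifyValidationResult(result) {
  // Strict comparisons matter: a normalized HTML string is truthy, so a
  // plain if (result) would wrongly treat it like true.
  if (result === true) {
    return { status: "valid", normalizedHtml: null };
  }
  if (result === false) {
    return { status: "invalid", normalizedHtml: null };
  }
  if (typeof result === "string") {
    // Looks valid, but the browser would render something different.
    return { status: "normalized", normalizedHtml: result };
  }
  throw new TypeError("Unexpected validation result: " + String(result));
}

// Literal values stand in for real validator output here.
console.log(classifyValidationResult(true).status);       // valid
console.log(classifyValidationResult("<p>a</p>").status); // normalized
```

In a browser you would pass in simpleValidateHtmlStr(userInput) and, for example, store "valid" results directly while queueing "normalized" ones for review.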

The checks take the following exceptions into consideration:

  • normalization of single quotes to double quotes is ignored
  • img and other tags with a src attribute are 'disarmed' during rendering
  • (if non-strict) the <br/> >> <br> conversion is ignored
  • (if non-strict) normalization of empty attributes (<p disabled> >> <p disabled="">) is ignored
  • initially unencoded ampersands, e.g. in attribute values, get encoded when reading .innerHTML; this encoding is undone before comparing

function simpleValidateHtmlStr(htmlStr, strictBoolean) {
  if (typeof htmlStr !== "string")
    return false;

  var validateHtmlTag = /<[a-z]+(?:"[^"]*"|'[^']*'|[^'">])*>/igm, // quote-aware start tag; non-overlapping alternatives avoid exponential backtracking
    sdom = document.createElement('div'),
    noSrcNoAmpHtmlStr = htmlStr
      .replace(/ src=/igm, " svhs___src=") // disarm all src attributes
      .replace(/&amp;/igm, "#svhs#amp##"), // 'save' encoded ampersands
    noSrcNoAmpIgnoreScriptContentHtmlStr = noSrcNoAmpHtmlStr
      .replace(/\n\r?/igm, "#svhs#nl##") // temporarily remove line breaks
      .replace(/(<script[^>]*>)(.*?)(<\/script>)/igm, "") // ignore script contents
      .replace(/#svhs#nl##/igm, "\n\r"),  // re-add line breaks
    htmlTags = noSrcNoAmpIgnoreScriptContentHtmlStr.match(/<[a-z]+[^>]*>/igm), // get all start-tags
    htmlTagsCount = htmlTags ? htmlTags.length : 0,
    tagsAreValid, resHtmlStr;


  if(!strictBoolean){
    // ignore <br/> conversions
    noSrcNoAmpHtmlStr = noSrcNoAmpHtmlStr.replace(/<br\s*\/>/igm, "<br>");
  }

  if (htmlTagsCount) {
    tagsAreValid = htmlTags.reduce(function(isValid, tagStr) {
      return isValid && tagStr.match(validateHtmlTag);
    }, true);

    if (!tagsAreValid) {
      return false;
    }
  }


  try {
    sdom.innerHTML = noSrcNoAmpHtmlStr;
  } catch (err) {
    return false;
  }

  // compare rendered tag-count with expected tag-count
  if (sdom.querySelectorAll("*").length !== htmlTagsCount) {
    return false;
  }

  resHtmlStr = sdom.innerHTML.replace(/&amp;/igm, "&"); // undo '&' encoding

  if(!strictBoolean){
    // ignore empty attribute normalizations
    resHtmlStr = resHtmlStr.replace(/=""/igm, "");
  }

  // compare html strings while ignoring case, quote-changes, trailing spaces
  var
    simpleIn = noSrcNoAmpHtmlStr.replace(/["']/igm, "").replace(/\s+/igm, " ").toLowerCase().trim(),
    simpleOut = resHtmlStr.replace(/["']/igm, "").replace(/\s+/igm, " ").toLowerCase().trim();
  if (simpleIn === simpleOut)
    return true;

  return resHtmlStr.replace(/ svhs___src=/igm, " src=").replace(/#svhs#amp##/igm, "&amp;");
}

You can find it in a JSFiddle at https://jsfiddle.net/abernh/twgj8bev/, together with various test cases, including:

"<a href='blue.html id='green'>missing attribute quotes</a>" // FAIL
"<a>hell<B>o</B></a>"                                        // PASS
'<a href="test.html">hell<b>o</b></a>'                       // PASS
'<a href=test.html>hell<b>o</b></a>',                        // PASS
"<a href='test.html'>hell<b>o</b></a>",                      // PASS
'<ul><li>hell</li><li>hell</li></ul>',                       // PASS
'<ul><li>hell<li>hell</ul>',                                 // PASS
'<div ng-if="true && valid">ampersands in attributes</div>'  // PASS

Answer by Tarun

function validHTML(html) {
  var openingTags, closingTags;

  html        = html.replace(/<[^>]*\/\s?>/g, '');      // Remove all self closing tags
  html        = html.replace(/<(br|hr|img).*?>/g, '');  // Remove all <br>, <hr>, and <img> tags
  openingTags = html.match(/<[^\/].*?>/g) || [];        // Get remaining opening tags
  closingTags = html.match(/<\/.+?>/g) || [];           // Get remaining closing tags

  return openingTags.length === closingTags.length;
}

var htmlContent = "<p>your html content goes here</p>"; // Note: a string without any HTML tags is considered a valid snippet. If that is not what you want, check the opening tag count first.

if(validHTML(htmlContent)) {
  alert('Valid HTML')
}
else {
  alert('Invalid HTML');
}
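
Comparing counts only checks totals, so misnested input such as <b><i>x</b></i> still passes validHTML. A minimal stack-based sketch that also checks nesting order; tagsAreBalanced, the void-element list and the optional-end-tag subset are assumptions of this sketch, and it does not handle comments, CDATA or raw-text elements such as script, so treat it as a rough filter rather than an HTML parser:

```javascript
// Sketch: stack-based nesting check (a simplification, not a full HTML parser).
// Void elements never take an end tag.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
]);

// A subset of elements whose end tag HTML allows you to omit.
const OPTIONAL_END_TAGS = new Set(["p", "li", "dt", "dd", "tr", "td", "th", "option"]);

function tagsAreBalanced(html) {
  // Start or end tag; quoted attribute values may contain ">".
  const tagRe = /<(\/?)([a-zA-Z][\w-]*)(?:"[^"]*"|'[^']*'|[^'">])*>/g;
  const stack = [];
  let m;
  while ((m = tagRe.exec(html)) !== null) {
    const isEndTag = m[1] === "/";
    const name = m[2].toLowerCase();
    // Skip void elements, and treat XML-style "<x/>" as already closed
    // (browsers actually ignore that slash on non-void HTML elements).
    if (VOID_ELEMENTS.has(name) || (!isEndTag && m[0].endsWith("/>"))) continue;
    if (!isEndTag) {
      stack.push(name);
      continue;
    }
    // Implicitly close open elements whose end tags may be omitted.
    while (stack.length > 0 && stack[stack.length - 1] !== name &&
           OPTIONAL_END_TAGS.has(stack[stack.length - 1])) {
      stack.pop();
    }
    if (stack.length === 0 || stack[stack.length - 1] !== name) return false;
    stack.pop();
  }
  // Whatever is still open must be allowed to omit its end tag.
  return stack.every((name) => OPTIONAL_END_TAGS.has(name));
}

console.log(tagsAreBalanced("<a>hell<b>o</b></a>"));       // true
console.log(tagsAreBalanced("<a>hell<b>o</b>"));           // false
console.log(tagsAreBalanced("<b><i>x</b></i>"));           // false
console.log(tagsAreBalanced("<ul><li>hell<li>hell</ul>")); // true
```

Unlike the counting approach, this tolerates omitted optional end tags (the <ul><li>hell<li>hell</ul> case, which validHTML rejects) while still rejecting crossed nesting.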

Answer by Leviscus Tempris

Using pure JavaScript, you can check whether an element exists with the following condition:

if (typeof(element) != 'undefined' && element != null)

Using the following code we can test this in action:

HTML:

<input type="button" value="Toggle .not-undefined" onclick="toggleNotUndefined()">
<input type="button" value="Check if .not-undefined exists" onclick="checkNotUndefined()">
<p class="not-undefined"></p>

CSS:

p:after {
    content: "Is 'undefined'";
    color: blue;
}
p.not-undefined:after {
    content: "Is not 'undefined'";
    color: red;
}

JavaScript:

function checkNotUndefined(){
    var phrase = "not ";
    var element = document.querySelector('.not-undefined');
    if (typeof(element) != 'undefined' && element != null) phrase = "";
    alert("Element of class '.not-undefined' does "+phrase+"exist!");
    // $(".thisClass").length checks to see if our elem exists in jQuery
}

function toggleNotUndefined(){
    document.querySelector('p').classList.toggle('not-undefined');
}

It can be found on JSFiddle.
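
The condition typeof(element) != 'undefined' && element != null can usually be shortened. Loose inequality x != null is false for exactly null and undefined, and document.querySelector returns null when nothing matches. The typeof half only matters if element may be an undeclared variable, where referencing it directly throws a ReferenceError. A minimal sketch, with elementExists as a hypothetical name:

```javascript
// Hypothetical helper: loose != null is false for exactly null and undefined,
// which is what the two-part condition checks for a declared variable.
function elementExists(element) {
  return element != null;
}

console.log(elementExists(null));      // false (querySelector's result on no match)
console.log(elementExists(undefined)); // false
console.log(elementExists({}));        // true
```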

Answer by David Castro

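function isHTML(str) {
  // Parse the string into a detached div
  var a = document.createElement('div');
  a.innerHTML = str;
  // Walk the top-level child nodes; nodeType 1 is an element node
  for (var c = a.childNodes, i = c.length; i--; ) {
    if (c[i].nodeType == 1) return true;
  }
  // Only text (or comment) nodes: the string contains no HTML elements
  return false;
}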

Good Luck!

Answer by Eugene Kaurov

It depends on which JS library you use.

HTML validator for Node.js: https://www.npmjs.com/package/html-validator

HTML validator for jQuery: https://api.jquery.com/jquery.parsehtml/

But, as mentioned before, using the browser to validate broken HTML is a great idea:

function tidy(html) {
    var d = document.createElement('div');
    d.innerHTML = html;
    return d.innerHTML;
}