How do I detect XML parsing errors when using JavaScript's DOMParser in a cross-browser way?

Question

Asked by cspotcode

It seems that all major browsers implement the DOMParser API so that XML can be parsed into a DOM and then queried using XPath, getElementsByTagName, etc...

However, detecting parsing errors seems to be trickier. DOMParser.prototype.parseFromString always returns a valid DOM. When a parsing error occurs, the returned DOM contains a <parsererror> element, but it's slightly different in each major browser.

Sample JavaScript:

// Note the misspelled "otherr" prefix, which is not bound to any namespace
var xmlText = '<root xmlns="http://default" xmlns:other="http://other"><child><otherr:grandchild/></child></root>';
var parser = new DOMParser();
var dom = parser.parseFromString(xmlText, 'text/xml');
console.log(new XMLSerializer().serializeToString(dom));

Result in Opera:

The DOM's root is a <parsererror> element.

<?xml version="1.0"?><parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">Error<sourcetext>Unknown source</sourcetext></parsererror>

Result in Firefox:

The DOM's root is a <parsererror> element.

<?xml-stylesheet href="chrome://global/locale/intl.css" type="text/css"?>
<parsererror xmlns="http://www.mozilla.org/newlayout/xml/parsererror.xml">XML Parsing Error: prefix not bound to a namespace
Location: http://fiddle.jshell.net/_display/
Line Number 1, Column 64:<sourcetext>&lt;root xmlns="http://default" xmlns:other="http://other"&gt;&lt;child&gt;&lt;otherr:grandchild/&gt;&lt;/child&gt;&lt;/root&gt;
---------------------------------------------------------------^</sourcetext></parsererror>

Result in Safari:

The <root> element parses correctly but contains a nested <parsererror> in a different namespace than Opera's and Firefox's <parsererror> element.

<root xmlns="http://default" xmlns:other="http://other"><parsererror xmlns="http://www.w3.org/1999/xhtml" style="display: block; white-space: pre; border: 2px solid #c77; padding: 0 1em 0 1em; margin: 1em; background-color: #fdd; color: black"><h3>This page contains the following errors:</h3><div style="font-family:monospace;font-size:12px">error on line 1 at column 50: Namespace prefix otherr on grandchild is not defined
</div><h3>Below is a rendering of the page up to the first error.</h3></parsererror><child><otherr:grandchild/></child></root>

Am I missing a simple, cross-browser way of detecting whether a parsing error occurred anywhere in the XML document? Or must I query the DOM for each of the possible <parsererror> elements that different browsers might generate?

Answer 1

Accepted answer by cspotcode

This is the best solution I've come up with.

I attempt to parse a string that is intentionally invalid XML and observe the namespace of the resulting <parsererror> element. Then, when parsing actual XML, I can use getElementsByTagNameNS to detect the same kind of <parsererror> element and throw a JavaScript Error.

// My function that parses a string into an XML DOM, throwing an Error if XML parsing fails
function parseXml(xmlString) {
    var parser = new DOMParser();
    // attempt to parse the passed-in xml
    var dom = parser.parseFromString(xmlString, 'text/xml');
    if(isParseError(dom)) {
        throw new Error('Error parsing XML');
    }
    return dom;
}

function isParseError(parsedDocument) {
    // parser and parsererrorNS could be cached on startup for efficiency
    var parser = new DOMParser(),
        erroneousParse = parser.parseFromString('<', 'text/xml'),
        parsererrorNS = erroneousParse.getElementsByTagName("parsererror")[0].namespaceURI;

    if (parsererrorNS === 'http://www.w3.org/1999/xhtml') {
        // In PhantomJS the parsererror element doesn't seem to have a special namespace, so we are just guessing here :(
        return parsedDocument.getElementsByTagName("parsererror").length > 0;
    }

    return parsedDocument.getElementsByTagNameNS(parsererrorNS, 'parsererror').length > 0;
}
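
The comment in isParseError notes that the parser and the namespace could be cached. Here is a minimal sketch of that idea. The names createParseErrorDetector and parserFactory are hypothetical and not part of the original answer; passing the parser in through a factory is only an assumption that keeps the sketch testable outside a browser. The fallback for a probe that yields no <parsererror> at all is also my own guess.

```javascript
// Hypothetical helper: probes the parser once and reuses the observed namespace.
// `parserFactory` is an assumption; in a browser it would be
// function () { return new DOMParser(); }
function createParseErrorDetector(parserFactory) {
    var parser = parserFactory();
    // Parse deliberately malformed XML once to learn how this engine reports errors.
    var probe = parser.parseFromString('<', 'text/xml');
    var probeError = probe.getElementsByTagName('parsererror')[0];
    // Assumption: if the probe yields no <parsererror>, fall back to null.
    var parsererrorNS = probeError ? probeError.namespaceURI : null;

    return function isParseError(parsedDocument) {
        if (parsererrorNS === null || parsererrorNS === 'http://www.w3.org/1999/xhtml') {
            // No distinctive namespace to match on, so match by tag name only.
            return parsedDocument.getElementsByTagName('parsererror').length > 0;
        }
        return parsedDocument.getElementsByTagNameNS(parsererrorNS, 'parsererror').length > 0;
    };
}
```

In a browser this might be set up once at startup with createParseErrorDetector(function () { return new DOMParser(); }) and the returned function reused for every parsed document.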

Note that this solution doesn't include the special-casing needed for Internet Explorer. However, things are much more straightforward in IE. XML is parsed with a loadXML method, which returns true if parsing succeeded and false if it failed. See http://www.w3schools.com/xml/xml_parser.asp for an example.

Answer 2

Answer by Rast

When I first came here, I upvoted the original answer (by cspotcode); however, it does not work in Firefox. The resulting namespace is always "null" because of the structure of the produced document. I did a little research (the original post links to the code). The idea is not to use

invalidXml.childNodes[0].namespaceURI

but

invalidXml.getElementsByTagName("parsererror")[0].namespaceURI

And then select the "parsererror" element by namespace, as in the original answer. However, if you have a valid XML document with a <parsererror> tag in the same namespace as the one used by the browser, you end up with a false alarm. So, here's a heuristic to check whether your XML parsed successfully:

function tryParseXML(xmlString) {
    var parser = new DOMParser();
    var parsererrorNS = parser.parseFromString('INVALID', 'text/xml').getElementsByTagName("parsererror")[0].namespaceURI;
    var dom = parser.parseFromString(xmlString, 'text/xml');
    if(dom.getElementsByTagNameNS(parsererrorNS, 'parsererror').length > 0) {
        throw new Error('Error parsing XML');
    }
    return dom;
}

Why not implement exceptions in DOMParser?

An interesting thing worth mentioning in this context: if you try to get an XML file with XMLHttpRequest, the parsed DOM will be stored in the responseXML property, or that property will be null if the XML file content was invalid. Not an exception, not a parsererror or another specific indicator. Just null.
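
To illustrate the responseXML behaviour, here is a small sketch that turns the null into an exception. The name requireResponseXml is hypothetical. The sketch assumes it is called with a completed request, and as far as I know responseXML can also be null for other reasons (for example a failed request), so treat the error message as a guess rather than a diagnosis.

```javascript
// Hypothetical helper: turns the "null on malformed XML" behaviour into an exception.
// `xhr` is assumed to be a completed XMLHttpRequest (or any object exposing responseXML).
function requireResponseXml(xhr) {
    var dom = xhr.responseXML;
    if (dom === null || dom === undefined) {
        // responseXML carries no error details, so the message has to stay generic.
        throw new Error('Response could not be parsed as XML');
    }
    return dom;
}
```

It could be called from the request's load handler, for example var dom = requireResponseXml(this); inside xhr.onload.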

Answer 3

Answer by Cauterite

In current browsers, the DOMParser appears to have two possible behaviours when given malformed XML:

1. Discard the resulting document entirely and return a <parsererror> document with error details. Firefox and Edge seem to always take this approach; browsers from the Chrome family do this in most cases.
2. Return the resulting document with one extra <parsererror> inserted as the root element's first child. Chrome's parser does this in cases where it's able to produce a root element despite finding errors in the source XML. The inserted <parsererror> may or may not have a namespace. The rest of the document seems to be left intact, including comments, etc. Refer to xml_errors.cc and search for XMLErrors::InsertErrorMessageBlock.

For (1), the way to detect an error is to add a node to the source string, parse it, check whether the node exists in the resulting document, then remove it. As far as I'm aware, the only way to achieve this without potentially affecting the result is to append a processing instruction or comment to the end of the source.

Example:

let key = `a`+Math.random().toString(32);

let doc = (new DOMParser).parseFromString(src+`<?${key}?>`, `application/xml`);

let lastNode = doc.lastChild;
if (!(lastNode instanceof ProcessingInstruction)
    || lastNode.target !== key
    || lastNode.data !== ``)
{
    /* the XML was malformed */
} else {
    /* the XML was well-formed */
    doc.removeChild(lastNode);
}

If case (2) occurs, the error won't be detected by the above technique, so another step is required.

We can leverage the fact that only one <parsererror> is inserted, even if multiple errors are found in different places within the source. By parsing the source string again, this time with a syntax error appended, we can ensure the (2) behaviour is triggered, then check whether the number of <parsererror> elements has changed. If it has not, the first parseFromString result already contained a genuine, parser-emitted <parsererror>.

Example:

/* `src` and `doc` are the source string and parsed document from the previous example */
let parser = new DOMParser();
let errCount = doc.documentElement.getElementsByTagName(`parsererror`).length;
if (errCount !== 0) {
    let doc2 = parser.parseFromString(src+`<?`, `application/xml`);
    if (doc2.documentElement.getElementsByTagName(`parsererror`).length === errCount) {
        /* the XML was malformed */
    }
}

I put together a test page to verify this approach: https://github.com/Cauterite/domparser-tests.

It tests against the entire XML W3C Conformance Test Suite, plus a few extra samples to ensure it can distinguish documents containing <parsererror> elements from actual errors emitted by the DOMParser. Only a handful of test cases are excluded because they contain invalid Unicode sequences.

To be clear, it is only testing whether the result is identical to XMLHttpRequest.responseXML for a given document.

You can run the tests yourself at https://cauterite.github.io/domparser-tests/index.html, but note that it uses ECMAScript 2018.

At time of writing, all tests pass in recent versions of Firefox, Chrome, Safari and Firefox on Android. Edge and Presto-based Opera should pass since their DOMParsers appear to behave like Firefox's, and current Opera should pass since it's a fork of Chromium.

Please let me know if you can find any counter-examples or possible improvements.

For the lazy, here's the complete function:

const tryParseXml = function(src) {
    /* returns an XMLDocument, or null if `src` is malformed */

    let key = `a`+Math.random().toString(32);

    let parser = new DOMParser;

    let doc = null;
    try {
        doc = parser.parseFromString(
            src+`<?${key}?>`, `application/xml`);
    } catch (_) {}

    if (!(doc instanceof XMLDocument)) {
        return null;
    }

    let lastNode = doc.lastChild;
    if (!(lastNode instanceof ProcessingInstruction)
        || lastNode.target !== key
        || lastNode.data !== ``)
    {
        return null;
    }

    doc.removeChild(lastNode);

    let errElemCount =
        doc.documentElement.getElementsByTagName(`parsererror`).length;
    if (errElemCount !== 0) {
        let errDoc = null;
        try {
            errDoc = parser.parseFromString(
                src+`<?`, `application/xml`);
        } catch (_) {}

        if (!(errDoc instanceof XMLDocument)
            || errDoc.documentElement.getElementsByTagName(`parsererror`).length
                === errElemCount)
        {
            return null;
        }
    }

    return doc;
}

Answer 4

Answer by John

My web platform is HTML5 served as XML (application/xhtml+xml), and nothing invalid is allowed to be saved. I recently determined that I was losing code because it became malformed when switching between the Rich Editor and XML Editor. Catching malformed XML is not uniform across rendering engines, though it's not too difficult either. Gecko will still pollute the console with a malformed XML error, though all rendering engines will still proceed as desired. Tested in:

Gecko/Waterfox 56
Presto/Opera 12.1
Trident/IE 11
WebKit/Safari 12.1
Blink/Chrome 55/75

I've also included my id_(), entities() and xml_add() functions, which go a long way toward preventing Unicode characters from themselves being malformed if the database isn't compliant. As of 2019 you'll want to use MariaDB and set the collation for a database to utf8mb4_unicode_520_ci. My entities() function is very aggressive (it encodes even low-numbered Unicode characters as numeric entities). Rick James has a really in-depth MySQL utf8 Collations comparison page that is obviously compatible with MariaDB. At some point 520 will be superseded, so I'd recommend adding a yearly reminder to check which version is the highest available.

While all of this will cover almost everything, when you import XML into the DOM, browsers will not check for duplicate id attribute values! On my platform I simply delete the page layer in most cases. I also make a note if a page being imported has the same id two or more times. If your code contains the same id twice, then the browser will choose either the first or the second instance. This could be very maddening to deal with if you think some other part of your code is bugged. Strict is always superior to loose code and pure JavaScript is always superior to frameworks and libraries.

try
{
 if (!id_('xml_temp')) {xml_add('after', 'editor_rich', '<div class="hidden" id="xml_temp"></div>');}
 var f = id_('xml_temp').appendChild(new DOMParser().parseFromString(entities('<div xmlns="http://www.w3.org/1999/xhtml">'+id_('post_xml').value+'</div>'),'application/xml').childNodes[0]);
}
catch (err) {var f = false}

if (!f || f.childNodes.length == 0 || f.childNodes[0].nodeName == 'parsererror') {dialog.alert(error);}
else
{
 //Proceed with compliant XML.
}

The prerequisites from my platform that the code above uses:

function id_(id) {return (document.getElementById(id)) ? document.getElementById(id) : false;}


function entities(s)
{
 var i = 0;
 var r = '';

 while (i<=s.length)
 {
  if (!isNaN(s.charCodeAt(i)))
  {
   if (s.charCodeAt(i)<127) {r += s.charAt(i);}
   else {r += '&#'+s.charCodeAt(i)+';';}
  }
  i++;
 }

 return r;
}

function xml_add(pos, e, xml)
{
 e = (typeof e == 'string' && id_(e)) ? id_(e) : e;

 if (e.nodeName)
 {
  if (pos=='after') {e.parentNode.insertBefore(document.importNode(new DOMParser().parseFromString(xml,'application/xml').childNodes[0],true),e.nextSibling);}
  else if (pos=='before') {e.parentNode.insertBefore(document.importNode(new DOMParser().parseFromString(xml,'application/xml').childNodes[0],true),e);}
  else if (pos=='inside') {e.appendChild(document.importNode(new DOMParser().parseFromString(xml,'application/xml').childNodes[0],true));}
  else if (pos=='replace') {e.parentNode.replaceChild(document.importNode(new DOMParser().parseFromString(xml,'application/xml').childNodes[0],true),e);}
  //Add fragment and have it returned.
 }
}
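
Since browsers won't flag duplicate id values on import, a check along these lines could be run on the parsed fragment before it is accepted. This is only a sketch: findDuplicateIds is a hypothetical name, not one of my platform's functions, and it assumes the root it is given supports querySelectorAll.

```javascript
// Hypothetical helper: lists id values that occur more than once under `root`.
// `root` is assumed to be a Document or Element (anything with querySelectorAll).
function findDuplicateIds(root) {
    var seen = Object.create(null);
    var duplicates = [];
    var elements = root.querySelectorAll('[id]');
    for (var i = 0; i < elements.length; i++) {
        var id = elements[i].getAttribute('id');
        seen[id] = (seen[id] || 0) + 1;
        // Record each duplicated id once, when its second occurrence is found.
        if (seen[id] === 2) {
            duplicates.push(id);
        }
    }
    return duplicates;
}
```

An empty array means no id was repeated; anything else names the ids that need fixing before the import goes ahead.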
