Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1398571/

Html inside XML. Should I use CDATA or encode the HTML

xml, cdata, html-encode

Asked by alberto

I am using XML to share HTML content. AFAIK, I could embed the HTML either by:

  • Encoding it: I don't know if it is completely safe to use. And I would have to decode it again.

  • Use CDATA sections: I could still have problems if the content contains the closing tag "]]>" and certain hexadecimal characters, I believe. On the other hand, the XML parser would extract the info transparently for me.

Which option should I choose?

UPDATE: The XML will be created in Java and passed as a string to a .NET web service, where it will be parsed back. Therefore I need to be able to export the XML as a string and load it using "doc.LoadXml(xmlString);"

Accepted answer by Ned Batchelder

The two options are almost exactly the same. Here are your two choices:

<html>This is &lt;b&gt;bold&lt;/b&gt;</html>

<html><![CDATA[This is <b>bold</b>]]></html>

In both cases, you have to check your string for special characters to be escaped. Lots of people pretend that CDATA strings don't need any escaping, but as you point out, you have to make sure that "]]>" doesn't slip in unescaped.

In both cases, the XML processor will return your string to you decoded.
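
To make the "escape in both cases" point concrete, here is a minimal JavaScript sketch of the two options. The helper names escapeXmlText and wrapInCdata are made up for illustration and are not part of any library; in real code an XML library would normally do this for you. The CDATA helper assumes the common workaround of splitting a literal "]]>" across two adjacent CDATA sections.

```javascript
// Hypothetical helper: escape text so it can sit as ordinary character data
// inside an element.
function escapeXmlText(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: wrap text in a CDATA section. A literal "]]>" would end
// the section early, so each occurrence is split across two adjacent sections.
function wrapInCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

const html = "This is <b>bold</b>";
console.log("<html>" + escapeXmlText(html) + "</html>");
console.log("<html>" + wrapInCdata(html) + "</html>");
```

For this input the two lines printed are the same two forms shown in the answer; either way some checking of the string is needed before it goes into the document.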

Answer by Quentin

CDATA is easier to read by eye, while encoded content can safely contain end-of-CDATA markers, but you don't have to care. Just use an XML library and stop worrying about it. Then all you have to say is "Put this text inside this element" and the library will either encode it or wrap it in CDATA markers.

Answer by Mohamed

CDATA for simplicity.

Answer by tony gil

If you use CDATA, then you must decode it correctly (textContent, value and innerHTML are properties that will NOT return the proper data).

Let us say that you use an XML structure similar to this:

<response>
    <command method="setcontent">
        <fieldname>flagOK</fieldname>
        <content>479</content>
    </command>
    <command method="setcontent">
        <fieldname>htmlOutput</fieldname>
        <content>
            <![CDATA[
            <tr><td>2013/12/05 02:00 - 2013/12/07 01:59 </td></tr><tr><td width="90">Rastreado</td><td width="60">Placa</td><td width="100">Data hora</td><td width="60" align="right">Km/h</td><td width="40">Direção</td><td width="40">Azimute</td><td>Mapa</td></tr><tr><td>Silverado</td><td align='left'>CQK0052</td><td>05/12/2013 13:55</td><td align='right'>113</td><td align='right'>NE</td><td align='right'>40</td><td><a href="http://maps.google.com/maps?q=-22.6766,-50.2218&amp;iwloc=A&amp;t=h&amp;z=18" target="_blank">-22.6766,-50.2218</a></td></tr><tr><td>Silverado</td><td align='left'>CQK0052</td><td>05/12/2013 13:56</td><td align='right'>112</td><td align='right'>NE</td><td align='right'>23</td><td><a href="http://maps.google.com/maps?q=-22.6638,-50.2106&amp;iwloc=A&amp;t=h&amp;z=18" target="_blank">-22.6638,-50.2106</a></td></tr><tr><td>Silverado</td><td align='left'>CQK0052</td><td>05/12/2013 18:00</td><td align='right'>111</td><td align='right'>SE</td><td align='right'>118</td><td><a href="http://maps.google.com/maps?q=-22.7242,-50.2352&amp;iwloc=A&amp;t=h&amp;z=18" target="_blank">-22.7242,-50.2352</a></td></tr>
            ]]>
        </content>
    </command>
</response>

In JavaScript, you then decode it by loading the XML (with jQuery, for example) into a variable such as xmlDoc below, and then getting the nodeValue of the second occurrence (item(1)) of the content tag:

xmlDoc.getElementsByTagName("content").item(1).childNodes[0].nodeValue

or (both notations are equivalent)

xmlDoc.getElementsByTagName("content")[1].childNodes[0].nodeValue
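
One caveat, offered as a hedged sketch: in the sample XML there is whitespace between <content> and <![CDATA[, so in a DOM that keeps whitespace text nodes, childNodes[0] may be that whitespace rather than the CDATA section. Assuming that is the case, a small helper can look for the first CDATA child instead. firstCdataValue and fakeContent are hypothetical names; fakeContent is a plain object standing in for a parsed element so that the snippet runs without a browser.

```javascript
const TEXT_NODE = 3;          // same value as the DOM constant Node.TEXT_NODE
const CDATA_SECTION_NODE = 4; // same value as Node.CDATA_SECTION_NODE

// Hypothetical helper: return the text of the first CDATA child of `element`,
// or null when the element has no CDATA child.
function firstCdataValue(element) {
  for (const child of Array.from(element.childNodes)) {
    if (child.nodeType === CDATA_SECTION_NODE) {
      return child.nodeValue;
    }
  }
  return null;
}

// Stand-in for a parsed <content> element (an assumption for illustration):
// whitespace text, then the CDATA section, then more whitespace.
const fakeContent = {
  childNodes: [
    { nodeType: TEXT_NODE, nodeValue: "\n            " },
    { nodeType: CDATA_SECTION_NODE, nodeValue: "<tr><td>Silverado</td></tr>" },
    { nodeType: TEXT_NODE, nodeValue: "\n        " },
  ],
};

console.log(firstCdataValue(fakeContent));
```

With a real document the call would be firstCdataValue(xmlDoc.getElementsByTagName("content")[1]).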

Answer by jrharshath

It makes sense to wrap HTML in CDATA. The HTML text will probably constitute a single value in XML.

So not wrapping it in CDATA will cause all XML parsers to read it as a part of the XML document. While it is easy to circumvent this problem while using the XML, why the extra headache?

If you want to actually parse the HTML into a DOM, then it's better to read the HTML text, and set up a parser to read the text separately.

Hope that came out the way I intended it to.

Answer by Wim ten Brink

Personally, I hate CDATA segments, so I'd use encoding instead. Of course, if you add XML to XML to XML then this would result in encoding over encoding over encoding and thus some very unreadable results. Why do I hate CDATA segments? I wish I knew. Personal preference, mostly. I just don't like getting used to adding "forbidden characters" inside a special segment where they would suddenly be allowed again. It just confuses me when I see XML mark-up within a CDATA segment and it's not part of the XML surrounding it. At least with encoding I will see that it's encoded.

Good XML libraries will handle both encoding and CDATA segments transparently. It's just my eyes that get hurt.

Answer by Ionuț G. Stan

I don't know what XML builder you're using, but PHP (actually libxml) knows how to handle ]]> inside CDATA sections, and so should every other XML framework. So, I'd use a CDATA section.

Answer by Xinus

You can use a combination of both. For example, if you want to pass <h1>....</h1> in an XML node, you have to use a CDATA section to pass it. The contents inside <h1>...</h1> must be encoded to HTML entities, e.g. &lt; for <. Encoding between the tags solves the problem of ]]> being interpreted, since it gets converted to ]]&gt; and HTML tags do not contain ]]>.

You can do this only if you generate the HTML yourself.

Answer by Brian Agnew

Encoding it will work fine and is reliable. You can encode encoded sections etc. without any difficulty.

Decoding will be done automatically by whatever XML parser is used to handle your encoded HTML.
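
As a rough illustration of "you can encode encoded sections", the sketch below encodes a string twice and decodes it twice. encodeXml and decodeXml are hypothetical helpers written only for this example; decodeXml stands in for what an XML parser would do for you and handles only the three entities produced by encodeXml.

```javascript
// Hypothetical helper: encode the characters that are special in XML text.
function encodeXml(text) {
  return text
    .replace(/&/g, "&amp;") // first, so the entities added below stay intact
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical inverse of encodeXml. "&amp;" is decoded last so that
// "&amp;lt;" becomes "&lt;" rather than "<".
function decodeXml(text) {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

const original = "<p>Fish &amp; chips ]]></p>";
const once = encodeXml(original);  // safe to place inside an XML element
const twice = encodeXml(once);     // safe to place inside XML that is itself embedded

console.log(once);
console.log(twice);
console.log(decodeXml(decodeXml(twice)) === original);
```

Each level of nesting adds one layer of encoding and each parse removes one, which is also why deeply nested content becomes hard to read.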

Answer by Niko

I think the answer depends on what you are planning to do with the HTML content, and also what type of HTML content you plan to support.

Especially when it comes to included javascript, encoding often results in problems. CDATA definitely helps you there.

If you plan to use only small snippets (i.e. a paragraph) and have a way to preprocess/filter it (because you don't want JavaScript or fancy things anyway), you will probably be better off with encoding or actually just putting it directly as a subtree in the XML. You can then also post-process the HTML (i.e. filter style or onclick attributes). But this is definitely more work.