Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1289524/

Time: 2020-08-29 00:36:40  Source: igfitidea

Is it possible to have HTML text or CDATA inside an XML attribute?

html xml

Asked by Boon

I keep getting "XML parser failure: Unterminated attribute" with my parser when I attempt to put HTML text or CDATA inside my XML attribute. Is there a way to do this or is this not allowed by the standard?

Accepted answer by Rich Seller

If an attribute is not a tokenized or enumerated type, it is processed as CDATA. The details for how the attribute is processed can be found in the Extensible Markup Language (XML) 1.0 (Fifth Edition).

3.3.1 Attribute Types

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types are more constrained. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3.3 Attribute-Value Normalization.

[54]  AttType       ::=    StringType | TokenizedType | EnumeratedType
[55]  StringType    ::=    'CDATA'
[56]  TokenizedType ::=    'ID' [VC: ID]
            [VC: One ID per Element Type]
            [VC: ID Attribute Default]
        | 'IDREF'      [VC: IDREF]
        | 'IDREFS'     [VC: IDREF]
        | 'ENTITY'     [VC: Entity Name]
        | 'ENTITIES'   [VC: Entity Name]
        | 'NMTOKEN'    [VC: Name Token]
        | 'NMTOKENS'   [VC: Name Token]

...

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor MUST normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks MUST have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
  2. Begin with a normalized value consisting of the empty string.
  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
    • For a character reference, append the referenced character to the normalized value.
    • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
    • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
    • For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor MUST further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read SHOULD be treated by a non-validating processor as if declared CDATA.

It is an error if an attribute value contains a reference to an entity for which no declaration has been read.
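
As a rough illustration of the normalization rules quoted above, the following Python sketch uses the standard library's xml.etree.ElementTree, a non-validating parser, so the undeclared attribute is treated as CDATA. The element name e and the attribute name a are made up for this example and do not come from the specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element "e" and attribute "a" are illustrative only.
# The value holds a literal tab, a literal newline, and a character
# reference to a newline (&#xA;).
doc = '<e a="one\ttwo\nthree&#xA;four"/>'

value = ET.fromstring(doc).get("a")

# Literal white space is normalized to a space (#x20), while the character
# reference is replaced by the referenced character itself.
print(repr(value))
```

Other processors should report the same value, since the specification requires this normalization before the value reaches the application.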

Answer by JMP

No, CDATA cannot be the value of an attribute. It can only be inside an element.

Answer by John Kugelman

Attributes can only have plain text inside, no tags, comments, or other structured data. You need to escape any special characters by using character entities. For example:

<code text="&lt;a href=&quot;/&quot;&gt;">

That would give the text attribute the value <a href="/">. Note that this is just plain text, so if you wanted to treat it as HTML you'd have to run that string through an HTML parser yourself. The XML DOM wouldn't parse the text attribute for you.
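
A minimal Python sketch of the same idea, assuming the standard library's xml.etree.ElementTree and xml.sax.saxutils.quoteattr are acceptable stand-ins for whatever parser and escaping helper you actually use. It shows that the unescaped markup is rejected, while the escaped form comes back as the original plain-text string.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

html = '<a href="/">'

# Pasting the markup in unescaped is not well-formed XML.
try:
    ET.fromstring('<code text="' + html + '"/>')
    raw_parsed = True
except ET.ParseError:
    raw_parsed = False

# quoteattr() escapes &, < and > and picks a quote style that fits the data.
escaped = "<code text=" + quoteattr(html) + "/>"
text = ET.fromstring(escaped).get("text")

print(raw_parsed)    # whether the unescaped version parsed
print(text == html)  # whether the escaped version round-trips
```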

Answer by n611x007

CDATA is unfortunately an ambiguous term here. There are "CDATA Sections", and there is the "CDATA Attribute Type".

Your attribute value can be of type CDATA with the "CDATA Attribute Type".

Here is an XML document that contains a "CDATA Section" (aka CDSect):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<elemke>
<![CDATA[
foo
]]>
</elemke>

Here is an XML document that contains a "CDATA Attribute Type" (as AttType):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE elemke [
<!ATTLIST brush wood CDATA #REQUIRED>
]>

<elemke>
<brush wood="guy&#xA;threep"/>
</elemke>

You cannot use a "CDATA Section" for an attribute value. Wrong: <brush wood=<![CDATA[foo]]>/>
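
A quick way to check this distinction is sketched below in Python with the standard library's xml.etree.ElementTree (chosen here only for convenience; any conforming parser should behave the same way). The element content <b>foo</b> is an invented sample.

```python
import xml.etree.ElementTree as ET

# A CDATA section is fine as element content; its text is taken literally.
in_element = ET.fromstring("<elemke><![CDATA[<b>foo</b>]]></elemke>")
print(in_element.text)

# As an attribute value it is rejected, because the parser expects a
# quoted literal after the equals sign.
try:
    ET.fromstring("<brush wood=<![CDATA[foo]]>/>")
    accepted = True
except ET.ParseError:
    accepted = False
print(accepted)
```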

You can use a "CDATA Attribute Type" for your attribute's type. I think this is actually what happens in the usual case, and your attribute value is actually CDATA: for an element like <brush wood="guy&#xA;threep"/>, the raw bytes of the .xml file contain guy&#xA;threep; however, when the file is processed, the attribute value in memory will be

guy
threep


Your problem may lie in 1) producing a correct XML file and 2) configuring an "XML processor" to produce the output you want.

For example, if you write the raw XML file by hand, you need to put these escapes inside the attribute value in the raw file, like I wrote <brush wood="guy&#xA;threep"/> here, instead of <brush wood="guy(newline) threep"/>.

Then the parse would actually give you a newline; I've tried this with a processor.

You can try it with a processor like Saxon, or, for a poor man's experiment, with a browser: open the XML in Firefox and copy the value to a text editor. Firefox displayed the newline as a space, but copying the string to a text editor showed the newline. (Probably with a better suited processor you could save the direct output right away.)
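
Another way to run the experiment, sketched here with Python's standard xml.etree.ElementTree rather than Saxon or a browser (my substitution, not the processor used above). The helper name wood is hypothetical; the document is the example from this answer with the XML declaration left out.

```python
import xml.etree.ElementTree as ET

DOCTYPE = "<!DOCTYPE elemke [<!ATTLIST brush wood CDATA #REQUIRED>]>"

# Same element twice: once with the &#xA; escape, once with a literal newline.
with_reference = DOCTYPE + '<elemke><brush wood="guy&#xA;threep"/></elemke>'
with_literal = DOCTYPE + '<elemke><brush wood="guy\nthreep"/></elemke>'

def wood(xml_text):
    """Return the parsed value of the wood attribute (hypothetical helper)."""
    return ET.fromstring(xml_text).find("brush").get("wood")

print(repr(wood(with_reference)))  # the escape survives as a real newline
print(repr(wood(with_literal)))    # the literal newline is normalized away
```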

Now the "only" thing you need to do is make sure you handle this CDATA appropriately. For example, if you have an XSL stylesheet that produces HTML, you can use something like this .xsl for such an XML document:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet  version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template name="split">
  <xsl:param name="list"      select="''" />
  <xsl:param name="separator" select="'&#xA;'" />
  <xsl:if test="not($list = '' or $separator = '')">
    <xsl:variable name="head" select="substring-before(concat($list, $separator), $separator)" />
    <xsl:variable name="tail" select="substring-after($list, $separator)" />

    <xsl:value-of select="$head"/>
    <br/><xsl:text>&#xA;</xsl:text>
    <xsl:call-template name="split">
        <xsl:with-param name="list"      select="$tail" />
        <xsl:with-param name="separator" select="$separator" />
    </xsl:call-template>
  </xsl:if>
</xsl:template>


<xsl:template match="brush">
  <html>
  <xsl:call-template name="split">
    <xsl:with-param name="list" select="@wood"/>
  </xsl:call-template>
  </html>
</xsl:template>

</xsl:stylesheet>

Which in a browser, or with a processor like Saxon (Saxon Home Edition 9.5) using java -jar saxon9he.jar -s:eg2.xml -xsl:eg2.xsl -o:eg2.html, would produce this HTML-like thing:

<html>guy<br>
   threep<br>

</html>  

which will look like this in a browser:

guy
threep

Here I am using a recursive template 'split' from Tomalak, thanks to Mads Hansen, because my target processor supports neither string-join nor tokenize, which are version 2.0 only.
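
For readers less familiar with XSLT 1.0, the recursion in the 'split' template can be sketched in Python as follows. This is only an illustration of the logic, not part of the stylesheet; the function mirrors the template's parameters, and the sample input is the attribute value from the example.

```python
def split(items, separator="\n"):
    """Mirror of the XSLT 'split' template: emit each piece followed by <br/>."""
    if items == "" or separator == "":
        return ""
    # substring-before(concat($list, $separator), $separator)
    head = (items + separator).partition(separator)[0]
    # substring-after($list, $separator); empty when the separator is absent
    tail = items.partition(separator)[2]
    return head + "<br/>\n" + split(tail, separator)

result = split("guy\nthreep")
print(result)
```

Appending the separator before taking the head is what lets the last piece, which has no trailing separator, be emitted like all the others.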

Answer by sumit raju

We can't use a CDATA section as an attribute value, but we can embed HTML using HTML character codes. Here is one example:

To achieve this: <span class="abc">Your Text</span>

use XML code like this:

<xmlNode attibuteName="&lt;span class=&quot;abc&quot;&gt;Your Text&lt;&#47;span&gt;"></xmlNode>

Answer by Martin P. Hellwig

Yes, you can, if you encode the content within the XML tags, i.e. use &amp; &lt; &gt; &quot; &apos;. That way it will not be seen as markup inside your markup.
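
If the XML is generated by a program rather than typed by hand, the encoding can usually be left to the serializer. Below is a minimal sketch with Python's standard xml.etree.ElementTree, which is my choice for illustration; the element and attribute names are copied from the earlier xmlNode example.

```python
import xml.etree.ElementTree as ET

html = '<span class="abc">Your Text</span>'

# Setting the attribute through the API lets the serializer do the escaping.
node = ET.Element("xmlNode")
node.set("attibuteName", html)  # attribute name as spelled in the example
serialized = ET.tostring(node, encoding="unicode")
print(serialized)

# Parsing the serialized form gives the original markup back as plain text.
round_trip = ET.fromstring(serialized).get("attibuteName")
print(round_trip == html)
```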
