HTML: what exactly are PCDATA and CDATA?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/857876/

Date: 2020-08-28 23:44:40  Source: igfitidea

what actually is PCDATA and CDATA?

Tags: html, xml, xhtml, cdata, pcdata

Asked by nonopolarity

It seems that a loose definition of PCDATA and CDATA is that

  1. PCDATA is character data, but is to be parsed.
  2. CDATA is character data, and is not to be parsed.

But then someone told me that CDATA is actually parsed, or that PCDATA is actually not parsed... so it is a bit confusing. Does anyone know what the real deal is?

Update: I actually added the PCDATA definition on Wikipedia... so don't take that answer too seriously as that's only my rough understanding of it.

Accepted answer by ólafur Waage

From WIKI:

PCDATA

Simply speaking, PCDATA stands for Parsed Character Data. That means the characters are to be parsed by the XML, XHTML, or HTML parser (&lt; will be changed to <, <p> will be taken to mean a paragraph tag, etc.). Compare that with CDATA, where the characters are not to be parsed by the XML, XHTML, or HTML parser.

CDATA

The term CDATA, meaning character data, is used for distinct, but related purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

Answer by mirod

Both PCDATA and CDATA are parsed. They are both character data.

They both must only include valid characters. For example if your document encoding is UTF-8, the content of CDATA sections must still be valid UTF-8 characters. So random binary data will probably prevent the document from being well-formed. Also CDATA sections are still parsed, if only to find the end section tag. But other markup-like characters, like <, > and & are ignored and passed as-is by the parser.

On the other hand, in PCDATA a literal < or & (and ' or " in attribute values) must be escaped, or it will be interpreted as markup. Entities will also be expanded.
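
As a rough illustration of the difference described above, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element name note and the sample text are made up for the example. It writes the same text once as escaped PCDATA and once as a CDATA section, and checks what the parser hands back.

```python
import xml.etree.ElementTree as ET

# The same text written two ways: escaped PCDATA vs. a CDATA section.
# The <note> element and its content are hypothetical sample data.
pcdata_doc = "<note>1 &lt; 2 &amp;&amp; 3 &gt; 2</note>"
cdata_doc = "<note><![CDATA[1 < 2 && 3 > 2]]></note>"

pcdata_text = ET.fromstring(pcdata_doc).text
cdata_text = ET.fromstring(cdata_doc).text
print(pcdata_text)
print(cdata_text)

# Inside a CDATA section, something that looks like an entity reference
# is passed through as-is instead of being expanded.
cdata_entity = ET.fromstring("<note><![CDATA[&amp;]]></note>").text
print(cdata_entity)

# Unescaped < and & in ordinary (PCDATA) content are a well-formedness error.
try:
    ET.fromstring("<note>1 < 2 && 3 > 2</note>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)
```

Both forms go through the parser. The CDATA section only changes which characters the parser treats as markup on the way.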

So yes, CDATA sections are indeed parsed. I am not sure why you were told that PCDATA is not parsed though.

Answer by AndrewS

PCDATA - Parsed Character Data

CDATA - (Unparsed) Character Data

http://www.w3schools.com/XML/xml_cdata.asp

Answer by Rose Perrone

  • PCDATA is text that will be parsed by a parser. Tags inside the text will be treated as markup and entities will be expanded.
  • CDATA is text that will not be parsed by a parser. Tags inside the text will not be treated as markup and entities will not be expanded.

By default, everything is PCDATA. In the following example, ignoring the root, <bar> will be parsed, and it'll have no content, but one child.

<?xml version="1.0"?>
<foo>
<bar><test>content!</test></bar>
</foo>
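
To check the claim about this document, one could load it with Python's standard xml.etree.ElementTree (a sketch, not the only way to inspect it) and look at what bar ends up containing:

```python
import xml.etree.ElementTree as ET

# The document from the example above, unchanged.
doc = """<?xml version="1.0"?>
<foo>
<bar><test>content!</test></bar>
</foo>"""

root = ET.fromstring(doc)
bar = root.find("bar")

# <bar> has no text of its own, only one child element.
print(bar.text)       # text directly inside <bar>, before the first child
print(len(bar))       # number of child elements
print(bar[0].tag, bar[0].text)
```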

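When we want to specify that an element will only contain text, and no child elements, we use the keyword PCDATA in its declaration. This keyword specifies that the element must contain parsable character data: any text in which less-than (<) and ampersand (&) are written as entity references (&lt; and &amp;) rather than literally. Greater-than (>), single quote (') and double quote (") may also be escaped, but in element content a literal > is only forbidden as part of the sequence ]]>, and the quotes only need escaping inside attribute values.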
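
In practice the escaping is usually left to a library. A small sketch with Python's standard xml.sax.saxutils follows; the sample string and the title attribute are invented for the example. escape replaces &, < and > for element content, and quoteattr additionally takes care of quotes for attribute values.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical raw text containing markup-significant characters.
raw = 'if a < b & c > d: say "hi"'

# escape() handles &, < and >; quotes are left alone in element content.
escaped = escape(raw)
print(escaped)

# quoteattr() returns the value already wrapped in suitable quotes.
doc = "<bar title=" + quoteattr(raw) + ">" + escaped + "</bar>"
elem = ET.fromstring(doc)

# After parsing, both the text and the attribute give back the raw string.
print(elem.text == raw, elem.get("title") == raw)
```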

In the next example, bar is CDATA, and isn't parsed, and has the content "<test>content!</test>".

<?xml version="1.0"?>
<foo>
<bar><![CDATA[<test>content!</test>]]></bar>
</foo>

There are several content models in SGML. The #PCDATA content model says that an element may contain plain text. The "parsed" part of it means that markup (including PIs, comments and SGML directives) in it is parsed instead of displayed as raw text. It also means that entity references are replaced.

Another type of content model allowing plain text contents is CDATA. In XML, the element content model may not implicitly be set to CDATA, but in SGML, it means that markup and entity references are ignored in the contents of the element. In attributes of CDATA type however, entity references are replaced.
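
The last point, that entity references in attribute values are replaced, can be seen with a short sketch using Python's standard xml.etree.ElementTree (the link element and its title value are made up here):

```python
import xml.etree.ElementTree as ET

# Hypothetical element with entity references inside an attribute value.
elem = ET.fromstring('<link title="Tom &amp; Jerry &lt;3"/>')

# The parser hands back the attribute with the references already replaced.
title = elem.get("title")
print(title)
```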

In XML, #PCDATA is the only plain text content model. You use it if you want to allow text content in the element at all. CDATA may be used explicitly through CDATA section markup inside #PCDATA content, but element content cannot be declared as CDATA by default.

In a DTD, the type of an attribute that contains text must be CDATA. The CDATA keyword in an attribute declaration has a different meaning than the CDATA section in an XML document. In a CDATA section all characters are legal (including the <, >, &, ' and " characters) except the "]]>" end marker.
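
Because "]]>" cannot appear inside a CDATA section, text that contains it has to be split across two sections. A minimal sketch of that common workaround in Python follows; the helper name to_cdata and the sample payload are hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every "]]>" across two sections."""
    # "]]>" becomes "]]" + end of section + new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Hypothetical payload that happens to contain the end marker.
payload = "a[b[0]]>c && <d>"
xml_doc = "<bar>" + to_cdata(payload) + "</bar>"
print(xml_doc)

# The parser joins the adjacent sections back into the original text.
round_tripped = ET.fromstring(xml_doc).text
print(round_tripped == payload)
```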

#PCDATA is not appropriate for the type of an attribute. It is used for the type of "leaf" text.

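#PCDATA is prefixed with a hash (also known as a "hashtag" or octothorpe) for historical reasons: in SGML the # is the reserved name indicator, which distinguishes the keyword from an element that happens to be named PCDATA.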

Answer by Ronald Wildenberg

Your first definition is correct.

PCDATA is parsed which means that entities are expanded and that text is treated as markup. CDATA is not parsed by an XML parser.

Answer by trojjer

If only script elements were set to CDATA by default in the XHTML DTDs, it would save a lot of ugly manual overrides... Why would script blocks contain other elements? If there are such elements, they are handled by the JS interpreter in DOM manipulation actions, in which case they should still be completely ignored by the XML parser before document insertion and rendering. I suppose it may have been designed to force the use of external script resource files, which is ultimately a good thing.
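
The "manual override" in question is the familiar CDATA wrapper around inline scripts. As a rough sketch of why it is needed when XHTML is processed as XML, the snippet below feeds a made-up script body to Python's standard XML parser with and without the wrapper. The helper parses is hypothetical, and an HTML parser (as opposed to an XML one) treats script content differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline script containing < and &&.
script_body = "if (a < b && b < c) { run(); }"

def parses(xml_text: str) -> bool:
    """Return True if the text is well-formed XML according to the parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

plain = "<script>" + script_body + "</script>"
# The // comments hide the CDATA markers from the JavaScript interpreter.
wrapped = "<script>//<![CDATA[\n" + script_body + "\n//]]></script>"

print(parses(plain))
print(parses(wrapped))
print(ET.fromstring(wrapped).text)
```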
