What characters do I need to escape in XML documents?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1091945/

Time: 2020-09-06 12:36:35  Source: igfitidea

xml, escaping, character

Asked by Julius A

What characters must be escaped in XML documents, or where could I find such a list?

Answered by Welbog

If you use an appropriate class or library, it will do the escaping for you. Many XML issues are caused by building documents through string concatenation.
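
As an illustration of the point about libraries, here is a minimal Python sketch that assumes the standard-library xml.etree.ElementTree module. The sample value in user_text and the helper is_well_formed are made up for this example; the sketch only contrasts concatenation with letting the serializer escape the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical user-supplied value containing markup-significant characters.
user_text = 'Tom & Jerry <"quoted">'

# Fragile: string concatenation copies the raw characters into the markup.
broken = "<note>" + user_text + "</note>"

# Safer: build the element and let the serializer escape the text.
note = ET.Element("note")
note.text = user_text
safe = ET.tostring(note, encoding="unicode")


def is_well_formed(xml_string):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_string)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(broken))  # the concatenated string
print(is_well_formed(safe))    # the serialized element
print(safe)
```

Parsing the serialized string again gives back the original text, which is the round trip a hand-built string often fails.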

XML escape characters

There are only five:

"   &quot;
'   &apos;
<   &lt;
>   &gt;
&   &amp;
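
A minimal Python sketch of these five replacements, assuming the standard-library xml.sax.saxutils helpers. By default escape only handles &, < and >, so the two quote entities are passed in explicitly. The function names escape_all_five and unescape_all_five are illustrative, not part of any library.

```python
from xml.sax.saxutils import escape, unescape

# Extra replacements for the two quote characters.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARACTERS = {"&quot;": '"', "&apos;": "'"}


def escape_all_five(text):
    """Escape &, <, > and both quote characters."""
    return escape(text, QUOTE_ENTITIES)


def unescape_all_five(text):
    """Reverse escape_all_five."""
    return unescape(text, QUOTE_CHARACTERS)


sample = "\"'<>&"
print(escape_all_five(sample))
print(unescape_all_five(escape_all_five(sample)) == sample)
```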

Escaping characters depends on where the special character is used.

The examples can be validated at the W3C Markup Validation Service.

Text

The safe way is to escape all five characters in text. However, the three characters ", ' and > needn't be escaped in text:

<?xml version="1.0"?>
<valid>"'></valid>
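
The claim can be checked with a short Python sketch, assuming the standard-library xml.etree.ElementTree parser. The helper name parses_as_text is made up for this example; it tries each of the five characters unescaped in element text.

```python
import xml.etree.ElementTree as ET


def parses_as_text(ch):
    """Return True if ch can appear unescaped as element text."""
    try:
        return ET.fromstring("<valid>" + ch + "</valid>").text == ch
    except ET.ParseError:
        return False


# Try each of the five special characters in turn.
results = {ch: parses_as_text(ch) for ch in "\"'<>&"}
for ch, ok in results.items():
    print(repr(ch), "may stay unescaped" if ok else "must be escaped")
```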

Attributes

The safe way is to escape all five characters in attributes. However, the > character needn't be escaped in attributes:

<?xml version="1.0"?>
<valid attribute=">"/>

The ' character needn't be escaped in attributes if the quotes are ":

<?xml version="1.0"?>
<valid attribute="'"/>

Likewise, the " character needn't be escaped in attributes if the quotes are ':

<?xml version="1.0"?>
<valid attribute='"'/>
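
The quote-selection rules for attributes can be sketched in Python with xml.sax.saxutils.quoteattr from the standard library, which picks the delimiter based on which quote characters the value contains. The helper make_element and the sample values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def make_element(value):
    """Return a <valid/> element string with value as its attribute."""
    return "<valid attribute=%s/>" % quoteattr(value)


samples = ["it's", 'say "hi"', "a\"b'c", "a > b", "x < y & z"]
for value in samples:
    markup = make_element(value)
    # Parsing the markup should give back the original attribute value.
    round_trip = ET.fromstring(markup).get("attribute")
    print(markup, round_trip == value)
```

When a value contains both kinds of quotes, quoteattr wraps it in double quotes and writes the inner double quotes as &quot;.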

Comments

All five special characters must not be escaped in comments:

<?xml version="1.0"?>
<valid>
<!-- "'<>& -->
</valid>

CDATA

All five special characters must not be escaped in CDATA sections:

<?xml version="1.0"?>
<valid>
<![CDATA["'<>&]]>
</valid>
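
To see why escaping inside CDATA is wrong rather than merely unnecessary, this small Python sketch (assuming the standard-library xml.etree.ElementTree parser) compares the same entity reference inside and outside a CDATA section. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# Inside CDATA the parser does not decode entity references,
# so "&amp;" stays as five literal characters.
in_cdata = ET.fromstring(
    "<valid><![CDATA[&amp; is not decoded here: \"'<>&]]></valid>"
).text

# In ordinary text the same reference is decoded to a single "&".
in_text = ET.fromstring("<valid>&amp; is decoded here</valid>").text

print(in_cdata)
print(in_text)
```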

Processing instructions

All five special characters must not be escaped in XML processing instructions:

<?xml version="1.0"?>
<?process <"'&> ?>
<valid/>

XML vs. HTML

HTML has its own set of escape codes which cover a lot more characters.

Answered by Andrew Hare

Perhaps this will help:

List of XML and HTML character entity references:

In SGML, HTML and XML documents, the logical constructs known as character data and attribute values consist of sequences of characters, in which each character can manifest directly (representing itself), or can be represented by a series of characters called a character reference, of which there are two types: a numeric character reference and a character entity reference. This article lists the character entity references that are valid in HTML and XML documents.

That article lists the following five predefined XML entities:

quot  "
amp   &
apos  '
lt    <
gt    >

Answered by Albz

According to the specifications of the World Wide Web Consortium (W3C), there are 5 characters that must not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. In all other cases, these characters must be replaced using either the corresponding entity or the numeric reference, according to the following table:

Original character    XML entity replacement    XML numeric replacement
<                     &lt;                      &#60;
>                     &gt;                      &#62;
"                     &quot;                    &#34;
&                     &amp;                     &#38;
'                     &apos;                    &#39;

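As a quick check of the table, the following Python sketch (assuming the standard-library xml.etree.ElementTree parser) decodes both the entity and the numeric form of each character and confirms that the numeric form is just the decimal code point. The names REPLACEMENTS and decode are made up for this example.

```python
import xml.etree.ElementTree as ET

# character -> (entity reference, numeric reference), as in the table
REPLACEMENTS = {
    "<": ("&lt;", "&#60;"),
    ">": ("&gt;", "&#62;"),
    '"': ("&quot;", "&#34;"),
    "&": ("&amp;", "&#38;"),
    "'": ("&apos;", "&#39;"),
}


def decode(reference):
    """Let the XML parser decode a single reference placed in element text."""
    return ET.fromstring("<t>" + reference + "</t>").text


for ch, (entity, numeric) in REPLACEMENTS.items():
    same = decode(entity) == ch and decode(numeric) == ch
    # The numeric form is "&#" + decimal code point + ";".
    print(repr(ch), entity, numeric, same, numeric == "&#%d;" % ord(ch))
```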

Notice that the aforementioned entities can also be used in HTML, with the exception of &apos;, which was introduced with XHTML 1.0 and is not declared in HTML 4. For this reason, and to ensure backward compatibility, the XHTML specification recommends the use of &#39; instead.

Answered by Peter Bartels

Escaping characters is different for tags and attributes.

For tags:

 < &lt;
 > &gt; (only for compatibility, read below)
 & &amp;

For attributes:

" &quot;
' &apos;

From Character Data and Markup:

The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

Answered by kjhughes

New, simplified answer to an old, commonly asked question...

Simplified XML Escaping (prioritized, 100% complete)

  1. Always (90% important to remember)

    • Escape < as &lt; unless < is starting a <tag/>.
    • Escape & as &amp; unless & is starting an &entity;.
  2. Attribute Values (9% important to remember)

    • attr="'Single quotes' are ok within double quotes."
    • attr='"Double quotes" are ok within single quotes.'
    • Escape " as &quot; and ' as &apos; otherwise.
  3. Comments, CDATA, and Processing Instructions (0.9% important to remember)

    • <!-- Within comments --> nothing has to be escaped, but no -- strings are allowed.
    • <![CDATA[Within CDATA]]> nothing has to be escaped, but no ]]> strings are allowed.
    • <?PITarget Within PIs?> nothing has to be escaped, but no ?> strings are allowed.
  4. Esoterica (0.1% important to remember)

    • Escape ]]> as ]]&gt; unless ]]> is ending a CDATA section.
      (This rule applies to character data in general, even outside a CDATA section.)
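
Rules 1, 3 and 4 of the list above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, attribute values (rule 2) are left out, and the round trip is checked with the standard-library xml.etree.ElementTree parser.

```python
import xml.etree.ElementTree as ET


def escape_text(text):
    """Rules 1 and 4: escape & and <, plus the > of any ]]> sequence."""
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    return text.replace("]]>", "]]&gt;")


def fits_in_comment(text):
    # Rule 3: no "--" inside a comment. The XML grammar also rules out
    # a comment body that ends with "-".
    return "--" not in text and not text.endswith("-")


def fits_in_cdata(text):
    # Rule 3: a CDATA section cannot contain its own terminator.
    return "]]>" not in text


def fits_in_pi(text):
    # Rule 3: a processing instruction cannot contain "?>".
    return "?>" not in text


sample = "if (a[b[0]]>c && d<e)"
escaped = escape_text(sample)
print(escaped)
print(ET.fromstring("<t>" + escaped + "</t>").text == sample)
```

The ampersand is replaced first so that the & characters introduced by the later replacements are not escaped a second time.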

Answered by Charon ME

In addition to the commonly known five characters [<, >, &, ", and '], I would also escape the vertical tab character (0x0B). It is valid UTF-8, but not valid XML 1.0, and even many libraries (including the highly portable (ANSI C) library libxml2) miss it and silently output invalid XML.

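A minimal Python sketch of a guard against such characters, assuming the Char production of the XML 1.0 specification (tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF). Because XML 1.0 character references must also refer to a legal character, this sketch drops the offending characters rather than escaping them; the function names are illustrative.

```python
def is_xml10_char(ch):
    """Return True if ch is allowed in an XML 1.0 document."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def drop_invalid_xml10_chars(text):
    """Remove characters that XML 1.0 does not allow at all."""
    return "".join(ch for ch in text if is_xml10_char(ch))


print(is_xml10_char("\x0b"))                    # vertical tab
print(repr(drop_invalid_xml10_chars("a\x0bb")))
```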

Answered by Tim Cooper

Abridged from: XML, Escaping

There are five predefined entities:

&lt; represents "<"
&gt; represents ">"
&amp; represents "&"
&apos; represents '
&quot; represents "

"All permitted Unicode characters may be represented with a numeric character reference." For example:

&#20013;
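
For illustration, Python's built-in "xmlcharrefreplace" error handler produces exactly this kind of reference when encoding to ASCII. The helper name to_ascii_xml is made up, and the sketch assumes markup characters in the input were already escaped; decoding is checked with the standard-library xml.etree.ElementTree parser.

```python
import xml.etree.ElementTree as ET


def to_ascii_xml(text):
    """Rewrite non-ASCII characters as decimal character references.

    Assumes &, < and friends in text were escaped beforehand.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


reference = to_ascii_xml("\u4e2d")  # U+4E2D, the character shown above
print(reference)
# The parser decodes the reference back to the original character.
print(ET.fromstring("<t>" + reference + "</t>").text == "\u4e2d")
```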

Most of the control characters and other Unicode ranges are specifically excluded, meaning (I think) they can't occur either escaped or direct:

Valid characters in XML

Answered by u991073

It depends on the context. For content, it is < and &, and ]]> (though that is a string of three characters rather than a single character).

For attribute values, it is <, &, ", and '.

For CDATA, it is ]]>.

Answered by Questionless

Only < and & are required to be escaped if they are to be treated as character data and not markup:

2.4 Character Data and Markup
