Meaning of - <?xml version="1.0" encoding="utf-8"?>

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/13743250/

Date: 2020-09-06 13:49:03  Source: igfitidea

Tags: xml, character-encoding, xml-declaration, xml-encoding

Asked by XML Boy

I am new to XML and I am trying to understand the basics. I read the line below in "Learning XML", but it is still not clear to me. Can someone point me to a book or website that explains these basics clearly?

From Learning XML:

The XML declaration describes some of the most general properties of the document, telling the XML processor that it needs an XML parser to interpret this document.

What does this mean?

I understand the xml version part: both the document and the user of the document should "talk" in the same version of XML. But what about the encoding part? Why is that necessary?

Answered by rghome

To understand the "encoding" attribute, you have to understand the difference between bytes and characters.

Think of bytes as numbers between 0 and 255, whereas characters are things like "a", "1" and "?". The set of all characters that are available is called a character set.

Each character has a sequence of one or more bytes that are used to represent it; however, the exact number and value of the bytes depend on the encoding used, and there are many different encodings.

Most encodings are based on an old character set and encoding called ASCII, which uses a single byte per character (actually, only 7 bits) and contains 128 characters, including a lot of the common characters used in US English.

For example, here are 6 characters in the ASCII character set that are represented by the values 60 to 65.

Extract of ASCII Table 60-65
╔══════╦══════════════╗
║ Byte ║  Character   ║
╠══════╬══════════════╣
║  60  ║      <       ║
║  61  ║      =       ║
║  62  ║      >       ║
║  63  ║      ?       ║
║  64  ║      @       ║
║  65  ║      A       ║
╚══════╩══════════════╝

In the full ASCII set, the lowest value used is zero and the highest is 127 (both of these are hidden control characters).
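
As a small illustration of the table above, the following Python sketch decodes the byte values 60 to 65 as ASCII and shows that a value above 127 is rejected. The variable names are made up for this example.

```python
# Bytes 60..65 from the table, decoded as ASCII.
table_bytes = bytes(range(60, 66))
table_text = table_bytes.decode("ascii")
print(table_text)  # <=>?@A

# Going the other way: each character maps back to its byte value.
byte_values = list("A<".encode("ascii"))
print(byte_values)  # [65, 60]

# Values above 127 are outside ASCII, so decoding them fails.
try:
    bytes([226]).decode("ascii")
    outside_ascii_rejected = False
except UnicodeDecodeError:
    outside_ascii_rejected = True
print(outside_ascii_rejected)  # True
```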

However, once you start needing more characters than basic ASCII provides (for example, letters with accents, currency symbols, graphic symbols, etc.), ASCII is not suitable and you need something more extensive. You need more characters (a different character set) and you need a different encoding, as 128 values are not enough to fit all the characters in. Some encodings use one byte per character (256 possible characters), while others use up to six bytes.

Over time a lot of encodings have been created. In the Windows world, there is CP1252, or ISO-8859-1, whereas Linux users tend to favour UTF-8. Java uses UTF-16 natively.

One sequence of byte values for a character in one encoding might stand for a completely different character in another encoding, or might even be invalid.

For example, in ISO 8859-1, â is represented by one byte of value 226, whereas in UTF-8 it is two bytes: 195, 162. However, in ISO 8859-1, 195, 162 would be two characters: Ã, ¢.
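
The byte values in this example can be checked with a few lines of Python. This is only a sketch using the standard str.encode and bytes.decode methods; the variable names are illustrative.

```python
char = "\u00e2"  # the character â (a with circumflex)

latin1_bytes = list(char.encode("iso-8859-1"))
utf8_bytes = list(char.encode("utf-8"))
print(latin1_bytes)  # [226]
print(utf8_bytes)    # [195, 162]

# The same two bytes read with the wrong encoding give two other characters.
misread = bytes([195, 162]).decode("iso-8859-1")
print(misread)  # Ã¢
```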

Think of XML as not a sequence of characters but a sequence of bytes.

Imagine the system receiving the XML sees the bytes 195, 162. How does it know what characters these are?

In order for the system to interpret those bytes as actual characters (and so display them or convert them to another encoding), it needs to know the encoding used in the XML.
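
To see the declared encoding change how the same bytes are read, here is a minimal sketch using Python's standard xml.etree.ElementTree parser. The element name p and the variable names are arbitrary choices for the example.

```python
import xml.etree.ElementTree as ET

payload = bytes([195, 162])  # the two bytes 195, 162 discussed in the text

# Identical content bytes, but a different encoding in the declaration.
as_utf8 = b'<?xml version="1.0" encoding="UTF-8"?><p>' + payload + b"</p>"
as_latin1 = b'<?xml version="1.0" encoding="ISO-8859-1"?><p>' + payload + b"</p>"

utf8_text = ET.fromstring(as_utf8).text
latin1_text = ET.fromstring(as_latin1).text
print(utf8_text)    # one character: â
print(latin1_text)  # two characters: Ã¢
```

The parser has no other way to choose between the two readings here, which is why the declaration matters.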

Since most common encodings are compatible with ASCII as far as basic alphabetic characters and symbols go, the declaration itself can usually get away with using only ASCII characters to say what the encoding is. In other cases, the parser must try to figure out the encoding of the declaration. Since it knows the declaration begins with <?xml, this is a lot easier to do.
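
The ASCII-compatible case can be sketched roughly as follows. This is not how any particular parser is implemented; the function name sniff_declared_encoding and the fallback default are assumptions made for illustration, and the sketch deliberately ignores byte order marks and non-ASCII-compatible encodings such as UTF-16.

```python
import re

# Matches an XML declaration at the very start of the data and captures
# the value of its encoding pseudo-attribute (ASCII-compatible input only).
DECL_RE = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)

def sniff_declared_encoding(data: bytes, default: str = "utf-8") -> str:
    """Return the encoding named in the declaration, or the default."""
    match = DECL_RE.match(data)
    if match is None:
        return default
    return match.group(1).decode("ascii")

print(sniff_declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
print(sniff_declared_encoding(b"<a/>"))
```

The first call prints the declared name and the second falls back to the default, because the input has no declaration.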

Finally, the version attribute specifies the XML version, of which there are two at the moment (see Wikipedia XML versions). There are slight differences between the versions, so an XML parser needs to know what it is dealing with. In most cases (for English speakers anyway), version 1.0 is sufficient.

Answered by Pavan

An XML declaration is not required in all XML documents; however, XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol. Here is an example of an XHTML document that includes the XML declaration.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Virtual Library</title>
  </head>
  <body>
    <p>Moved to <a href="http://example.org/">example.org</a>.</p>
  </body>
</html>

Please refer to the W3 standards for XML.

Answered by Oded

This is the optional XML preamble.

  • version="1.0" means that this is the version of the XML standard this file conforms to
  • encoding="utf-8" means that the file is encoded using the UTF-8 Unicode encoding
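
As a rough illustration, both values can be read back from a document with Python's standard xml.parsers.expat module, which reports the declaration through its XmlDeclHandler callback. The helper name read_declaration and the returned dictionary shape are assumptions made for this sketch.

```python
from xml.parsers import expat

def read_declaration(data: bytes) -> dict:
    """Collect the version and encoding reported for the XML declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        found["version"] = version
        found["encoding"] = encoding

    parser = expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(data, True)  # True marks this as the final chunk of input
    return found

declaration = read_declaration(b'<?xml version="1.0" encoding="utf-8"?><root/>')
print(declaration)  # {'version': '1.0', 'encoding': 'utf-8'}
```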

Answered by robasta

The encoding declaration identifies which encoding is used to represent the characters in the document.

More on the XML Declaration here: http://msdn.microsoft.com/en-us/library/ms256048.aspx

Answered by O.Badr

Can someone point me to a book or website which explains these basics clearly?

You can check this XML Tutorial with examples.

But what about the encoding part? Why is that necessary?

W3C provides an explanation about encoding:

"The document character set for XML and HTML 4.0 is Unicode (aka ISO 10646). This means that HTML browsers and XML processors should behave as if they used Unicode internally. But it doesn't mean that documents have to be transmitted in Unicode. As long as client and server agree on the encoding, they can use any encoding that can be converted to Unicode..."

Answered by kshama singh

The XML declaration in the document map consists of the following:

The version number, <?xml version="1.0"?>.

This is mandatory. Although the number might change for future versions of XML, 1.0 is the current version.

The encoding declaration, encoding="UTF-8".

This is optional. If used, the encoding declaration must appear immediately after the version information in the XML declaration, and must contain a value representing an existing character encoding.
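
These ordering rules can be tried out with Python's standard xml.etree.ElementTree parser, as in the sketch below. The helper name parses is hypothetical, and the results reflect how that particular parser behaves on these inputs.

```python
import xml.etree.ElementTree as ET

def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# The version on its own is fine; the encoding is optional.
version_only = parses(b'<?xml version="1.0"?><root/>')
# The encoding directly after the version is the expected order.
version_then_encoding = parses(b'<?xml version="1.0" encoding="UTF-8"?><root/>')
# The encoding before the version, or without a version, is rejected.
encoding_first = parses(b'<?xml encoding="UTF-8" version="1.0"?><root/>')
encoding_only = parses(b'<?xml encoding="UTF-8"?><root/>')

print(version_only, version_then_encoding, encoding_first, encoding_only)
```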