HTML <meta charset="utf-8"> vs <meta http-equiv="Content-Type">
Source: http://stackoverflow.com/questions/4696499/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by CuriousMind
In order to define charset for HTML5 Doctype, which notation should I use?
Short:
<meta charset="utf-8" />
Long:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Accepted answer by Quentin
In HTML5, they are equivalent. Use the shorter one; it is easier to remember and type. Browser support is fine since it was designed for backwards compatibility.
Answer by CodeBoy
Both forms of the meta charset declaration are equivalent and should work the same across browsers. But there are a few things you need to remember when declaring your web files' character set as UTF-8:
- Save your file(s) in UTF-8 encoding without the byte-order mark (BOM).
- Declare the encoding in your HTML files using meta charset (like above).
- Your web server must serve your files, declaring the UTF-8 encoding in the Content-Type HTTP header.
Apache servers are configured to serve files in ISO-8859-1 by default, so you need to add the following line to your .htaccess file:
AddDefaultCharset UTF-8
This will configure Apache to serve your files declaring UTF-8 encoding in the Content-Type response header, but your files must be saved in UTF-8 (without BOM) to begin with.
Notepad cannot save your files in UTF-8 without the BOM. A free editor that can is Notepad++. On the program menu bar, select "Encoding > Encode in UTF-8 without BOM". You can also open files and re-save them in UTF-8 using "Encoding > Convert to UTF-8 without BOM".
More on the Byte Order Mark (BOM) at Wikipedia.
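If you would rather check for a BOM from a script than from an editor menu, something along these lines might do. This is only a minimal sketch in Python; the function names strip_utf8_bom and resave_without_bom are made up for illustration, and it assumes the file is already UTF-8 apart from the BOM.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 byte-order mark, if one is present."""
    # codecs.BOM_UTF8 is the three bytes b"\xef\xbb\xbf"
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def resave_without_bom(path: str) -> bool:
    """Rewrite the file at path without a BOM; return True if a BOM was removed."""
    with open(path, "rb") as f:
        original = f.read()
    cleaned = strip_utf8_bom(original)
    if cleaned == original:
        return False  # nothing to do, the file had no BOM
    with open(path, "wb") as f:
        f.write(cleaned)
    return True
```

Calling resave_without_bom("index.html") (a hypothetical file name) would rewrite the file only when it actually starts with the BOM bytes.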
Answer by Simon White
Another reason to go with the short one is that it matches other instances where you might specify a character set in markup. For example:
<script type="text/javascript" charset="UTF-8" src="/script.js"></script>
<p><a charset="UTF-8" href="http://example.com/">Example Site</a></p>
Consistency helps to reduce errors and make code more readable.
Note that the charset attribute is case-insensitive. You can use UTF-8 or utf-8, however UTF-8 is clearer, more readable, more accurate.
Also, there is absolutely no reason at all to use any value other than UTF-8 in the meta charset attribute or page header. UTF-8 is the default encoding for Web documents since HTML4 in 1999 and the only practical way to make modern Web pages.
Also, you should not use HTML entities in UTF-8. Characters like the copyright symbol should be typed directly. The only entities you should use are for the 5 reserved markup characters: less than, greater than, ampersand, single quote (apostrophe), and double quote. Entities need an HTML parser, which you may not always want to use going forward; they also introduce errors, make your code less readable, increase your file sizes, and sometimes decode incorrectly in various browsers depending on which entities you used.
Learn how to type/insert copyright, trademark, open quote, close quote, apostrophe, em dash, en dash, bullet, Euro, and any other characters you encounter in your content, and use those actual characters in your code. The Mac has a Character Viewer that you can turn on in the Keyboard System Preference, and you can find and then drag and drop the characters you need, or use the matching Keyboard Viewer to see which keys to type. For example, trademark is Option+2.
UTF-8 contains all of the characters and symbols from every written human language. So there is no excuse for using -- instead of an em dash. It is not a bad idea to learn the rules of punctuation and typography also ... for example, knowing that a period goes inside a close quote, not outside.
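As a rough illustration of the "escape only the reserved characters" advice, here is a minimal Python sketch. It relies on the standard library's html.escape; the helper name escape_for_html and the sample string are made up for this example.

```python
import html


def escape_for_html(text: str) -> str:
    """Escape only the five reserved markup characters (& < > " ').

    Everything else, such as the copyright sign or accented letters,
    is left exactly as typed.
    """
    return html.escape(text, quote=True)


sample = 'Tom & Jerry\'s © "Café" <2024>'
print(escape_for_html(sample))
# prints: Tom &amp; Jerry&#x27;s © &quot;Café&quot; &lt;2024&gt;
```

Note that the copyright sign and the accented letter pass through untouched; they only need the file to be saved and served as UTF-8.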
Using a tag for something like content-type and encoding is highly ironic, since without knowing those things, you couldn't parse the file to get the value of the meta tag.
No, that is not true. The browser starts out parsing the file as the browser's default encoding, either UTF-8 or ISO-8859-1. Since US-ASCII is a subset of both ISO-8859-1 and UTF-8, the browser can read just fine either way ... it is the same. When the browser encounters the meta charset tag, if the encoding is different from what the browser is already using, the browser reloads the page in the specified encoding. That is why we put the meta charset tag at the top, right after the head tag, before anything else, even the title. That way you can use UTF-8 characters in your title.
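To make the idea concrete, here is a simplified Python sketch of reading the declaration as plain ASCII-compatible bytes before the real encoding is known. It is not how any particular browser implements it: the regular expression, the function names, the 1024-byte limit, and the windows-1252 fallback are all assumptions chosen for illustration, and it decodes once instead of reloading the page.

```python
import re
from typing import Optional

# Matches <meta charset="utf-8"> as well as the charset=... part of the
# long http-equiv form. The bytes are scanned as-is, which works because
# the declaration itself only uses ASCII characters.
_META_CHARSET = re.compile(
    rb"""<meta[^>]*?charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""",
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, limit: int = 1024) -> Optional[str]:
    """Return the first charset declared in a <meta> tag near the top, or None."""
    match = _META_CHARSET.search(raw[:limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_html(raw: bytes, fallback: str = "windows-1252") -> str:
    """Decode raw HTML using the declared charset, or the fallback if none is usable."""
    encoding = sniff_meta_charset(raw) or fallback
    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:  # declared name is not a codec Python knows
        return raw.decode(fallback, errors="replace")
```

Because the scan only looks at the first bytes of the document, a declaration placed late in the head could be missed, which is one more reason to put it first.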
You must save your file(s) in UTF-8 encoding without BOM
That is not strictly true. If you only have US-ASCII characters in your document, you can Save it as US-ASCII and serve it as UTF-8, because it is a subset. But if there are Unicode characters, you are correct, you must Save as UTF-8 without BOM.
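The subset relationship is easy to check for yourself. A small Python sketch (the helper name is_ascii_only is made up here) shows that a pure US-ASCII document has the same bytes in both encodings, while anything beyond ASCII does not:

```python
def is_ascii_only(data: bytes) -> bool:
    """Return True if every byte in data is a US-ASCII character."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


ascii_doc = "<p>Hello</p>"
# Same bytes either way, so serving this file as UTF-8 is safe.
print(ascii_doc.encode("ascii") == ascii_doc.encode("utf-8"))

# A document with a copyright sign is no longer pure ASCII.
print(is_ascii_only("<p>© 2024</p>".encode("utf-8")))
```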
If you want a good text editor that will save your files in UTF-8, I recommend Notepad++.
On the Mac, use Bare Bones TextWrangler (free) from Mac App Store, or Bare Bones BBEdit which is at Mac App Store for $39.99 ... very cheap for such a great tool. In either app, there is a menu at the bottom of the document window where you specify the document encoding and you can easily choose "UTF-8 no BOM". And of course you can set that as the default for new documents in Preferences.
But if your Webserver serves the encoding in the HTTP header, which is recommended, both [meta tags] are needless.
That is incorrect. You should of course set the encoding in the HTTP header, but you should also set it in the meta charset attribute so that the page can be Saved by the user, out of the browser onto local storage and then Opened again later, in which case the only indication of the encoding that will be present is the meta charset attribute. You should also set a base tag for the same reason ... on the server, the base tag is unnecessary, but when opened from local storage, the base tag enables the page to work as if it is on the server, with all the assets in place and so on, no broken links.
AddDefaultCharset UTF-8
Or you can just change the encoding of particular file types like so:
AddType text/html;charset=utf-8 html
A tip for serving both UTF-8 and Latin-1 (ISO-8859-1) files is to give the UTF-8 files a "text" extension and Latin-1 files "txt."
AddType text/plain;charset=iso-8859-1 txt
AddType text/plain;charset=utf-8 text
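If you are not on Apache, the same extension-to-charset idea can be sketched in a few lines of Python. This is only an illustration of the mapping the AddType lines above express; the table, the function name content_type_for, and the default value are assumptions, not part of any server's API.

```python
import os

# Mirrors the AddType lines: the file extension decides the charset
# that gets announced in the Content-Type header.
CONTENT_TYPES = {
    ".html": "text/html; charset=utf-8",
    ".txt": "text/plain; charset=iso-8859-1",
    ".text": "text/plain; charset=utf-8",
}


def content_type_for(path: str, default: str = "application/octet-stream") -> str:
    """Return the Content-Type header value to send for the given file path."""
    extension = os.path.splitext(path)[1].lower()
    return CONTENT_TYPES.get(extension, default)


print(content_type_for("notes.text"))
print(content_type_for("legacy.txt"))
```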
Finally, consider Saving your documents with Unix line endings, not legacy DOS or (classic) Mac line endings, which don't help and may hurt, especially down the line as we get further and further from those legacy systems. An HTML document with valid HTML5, UTF-8 encoding, and Unix line endings is a job well done. You can share and edit and store and read and recover and rely on that document in many contexts. It's lingua franca. It's digital paper.
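Converting line endings is simple enough to script. A minimal Python sketch, with a made-up function name, working on raw bytes so the text encoding is left alone:

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert DOS (CRLF) and classic Mac (CR) line endings to Unix (LF)."""
    # Replace CRLF first so a lone CR pass does not turn it into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


print(to_unix_line_endings(b"one\r\ntwo\rthree\n"))
# prints: b'one\ntwo\nthree\n'
```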
Answer by Omar
<meta charset="utf-8"> was introduced with/for HTML5.
As mentioned in the documentation, both are valid. However, <meta charset="utf-8"> is only for HTML5 (and easier to type/remember).
The old style is bound to become deprecated in the near future. I'd stick to the new <meta charset="utf-8">.
There's only one way to go, and that's up. In tech's case, that means phasing out the old (really, REALLY fast).
Documentation: HTML meta charset Attribute (W3Schools)
Answer by squirrel
While not contesting the other answers, I think the following is worthy of mentioning.
- The “long” (http-equiv) notation and the “short” one are equal; whichever comes first wins;
- Web server headers will override all the <meta> tags;
- BOM (Byte order mark) will override everything, and in many cases it will affect html 4 (and probably other stuff, too);
- If you don't declare any encoding, you will probably get your text in the “fallback text encoding” that is defined by your browser. In neither Firefox nor Chrome is it utf-8;
- In the absence of other clues the browser will attempt to read your document as if it was in ASCII to get the encoding, so you can't use any weird encodings (utf-16 with BOM should do, though);
- While the specs say that the encoding declaration must be within the first 512 bytes of the document, most browsers will try reading more than that.
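The precedence described in the list above can be written down as a small Python sketch. It only encodes the order given here (BOM, then HTTP header, then the first meta declaration, then a fallback); the function name pick_encoding, the BOM table, and the windows-1252 fallback are assumptions for illustration, and real browsers have more rules than this.

```python
import codecs
from typing import Optional

# Byte-order marks checked at the very start of the response body.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def pick_encoding(
    body: bytes,
    header_charset: Optional[str],
    meta_charset: Optional[str],
    fallback: str = "windows-1252",
) -> str:
    """Pick the encoding: BOM first, then HTTP header, then first <meta>, then fallback."""
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name
    if header_charset:
        return header_charset.lower()
    if meta_charset:
        return meta_charset.lower()
    return fallback
```

For a response like the one in the test command in this answer (windows-1251 in the header, a UTF-8 BOM, and two conflicting meta tags), this sketch would return utf-8 because of the BOM.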
You can test by running
echo 'HTTP/1.1 200 OK\r\nContent-type: text/html; charset=windows-1251\r\n\r\n\xef\xbb\xbf<!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta charset="windows-1251"><title>привет</title></head><body>привет</body></html>' | nc -lp 4500
and pointing your browser at localhost:4500. (Of course you will want to change or remove parts. The BOM part is \xef\xbb\xbf. Be wary of the encoding of your shell.)
Please mind that it's very important that you explicitly declare the encoding. Letting browsers guess can lead to security issues.
Answer by Timo Huovinen
Use <meta charset="utf-8" /> for web browsers when using HTML5.
Use <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> when using HTML4 or XHTML, or for outdated DOM parsers, like DOMDocument in PHP 5.3.
Answer by user10089632
There is some news, based on the Mozilla Foundation and SitePoint:
Do not use this value (http-equiv=content-type) as it is obsolete. Prefer the charset attribute on the <meta> element.
Answer by chelder
To embed a signature on an email, I would use the long version:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The reason is that not many email readers use HTML5, so it's always better to use old HTML styles. Actually, it's better to use tables than divs + CSS as well.