Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/436615/

Time: 2020-08-28 22:59:02  Source: igfitidea

When should one use HTML entities?

Tags: html, xhtml, html-entities

Question by allesklar

This has been confusing me for some time. With the advent of UTF-8 as the de facto standard in web development, I'm not sure in which situations I'm supposed to use HTML entities and in which I should just use the UTF-8 character. For example:

  • em dash (—, &mdash;)
  • ampersand (&, &amp;)
  • 3/4 fraction (¾, &frac34;)
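
The entity names for the three characters in the question can be looked up in Python's standard `html.entities` module, whose `codepoint2name` table covers the HTML 4 named entities. A minimal sketch; `entity_forms` is a hypothetical helper, not part of any library:

```python
from html.entities import codepoint2name, name2codepoint


def entity_forms(ch: str) -> tuple[str, str]:
    """Return (named entity, decimal numeric reference) for one character.

    Hypothetical helper; raises KeyError if the character has no HTML 4 name.
    """
    cp = ord(ch)
    return f"&{codepoint2name[cp]};", f"&#{cp};"


for ch in ["\u2014", "&", "\u00be"]:  # em dash, ampersand, 3/4 fraction
    named, numeric = entity_forms(ch)
    print(f"U+{ord(ch):04X} {ch} -> {named} or {numeric}")

# The em dash entity is spelled "mdash"; "emdash" is not a defined name.
print("mdash" in name2codepoint, "emdash" in name2codepoint)  # True False
```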

Please do shed light on this issue. It will be appreciated.

Accepted answer by JacquesB

You don't generally need to use HTML character entities if your editor supports Unicode. Entities can be useful when:

  • Your keyboard does not support the character you need to type. For example, many keyboards do not have em-dash or the copyright symbol.
  • Your editor does not support Unicode (very common some years ago, but probably not today).
  • You want to make it explicit in the source what is happening. For example, the &nbsp; code is clearer than the corresponding white space character.
  • You need to escape HTML special characters like <, &, or ".
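
For the last point, most languages ship an escaping function, so you rarely have to hand-write `&lt;` and friends. A minimal Python sketch using the standard `html.escape`; the `escape_text` wrapper is a hypothetical name used only for illustration. It leaves non-ASCII characters such as `é` untouched, which fits the advice above to type those directly.

```python
import html


def escape_text(s: str) -> str:
    """Escape &, <, > and both quote characters.

    With quote=True (the default), html.escape also escapes " and ', so the
    result is safe both as element content and inside a quoted attribute.
    Other characters, including non-ASCII ones, are returned unchanged.
    """
    return html.escape(s, quote=True)


snippet = 'Tom & Jerry say "x < y" at the café'
print(f'<p title="{escape_text(snippet)}">{escape_text(snippet)}</p>')
```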

Answer by William Brendel

Based on the comments I have received, I looked into this a little further. It seems that currently the best practice is to forgo using HTML entities and use the actual UTF-8 character instead. The reasons listed are as follows:

  1. UTF-8 encodings are easier to read and edit for those who understand what the character means and know how to type it.
  2. UTF-8 encodings are just as unintelligible as HTML entity encodings for those who don't understand them, but they have the advantage of rendering as special characters rather than as hard-to-understand decimal or hex encodings.

As long as your page's encoding is properly set to UTF-8, you should use the actual character instead of an HTML entity. I read several documents about this topic, but the most helpful were:

From the "UTF-8: The Secret of Character Encoding" article:

Wikipedia is a great case study for an application that originally used ISO-8859-1 but switched to UTF-8 when it became far too cumbersome to support foreign languages. Bots will now actually go through articles and convert character entities to their corresponding real characters for the sake of user-friendliness and searchability.

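A bot of that kind could, in simplified form, decode every character reference and then re-escape only the characters HTML requires. This is a sketch of the idea, not Wikipedia's actual code; `entities_to_characters` is a hypothetical name, and it assumes the input is text content without tags (real markup such as `<b>` would itself be escaped).

```python
import html


def entities_to_characters(text: str) -> str:
    """Turn character references into literal characters, keeping &, < and > escaped.

    Assumes `text` is element content (no tags). Quotes are left as-is because
    they need no escaping outside attribute values.
    """
    decoded = html.unescape(text)             # &theta; -> θ, &#36889; -> 這, &lt; -> <
    return html.escape(decoded, quote=False)  # re-escape only &, < and >


print(entities_to_characters("&theta; &amp; &#36889; &lt;b&gt;"))  # θ &amp; 這 &lt;b&gt;
```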

That article also gives a nice example involving Chinese encoding. Here is the abbreviated example for the sake of laziness:

UTF-8:

這兩個字是甚麼意思

HTML Entities:

&#36889;&#20841;&#20491;&#23383;&#26159;&#29978;&#40636;&#24847;&#24605;

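The entity version above is simply each character's Unicode code point written as a decimal numeric character reference (`&#`, the code point, then `;`). A short Python sketch that reproduces it and checks that the standard `xmlcharrefreplace` codec error handler and `html.unescape` agree with it:

```python
import html

utf8_text = "這兩個字是甚麼意思"

# One decimal numeric character reference per character: &#<code point>;
as_entities = "".join(f"&#{ord(ch)};" for ch in utf8_text)
print(as_entities)

# Encoding to ASCII with the "xmlcharrefreplace" error handler yields the same
# references for every character ASCII cannot represent.
assert utf8_text.encode("ascii", "xmlcharrefreplace").decode("ascii") == as_entities

# Decoding the references gives the original text back.
assert html.unescape(as_entities) == utf8_text

# 9 characters versus 72 characters of entity text.
print(len(utf8_text), len(as_entities))
```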

The UTF-8 and HTML entity encodings are both meaningless to me, but at least the UTF-8 encoding is recognizable as a foreign language, and it will render properly in an edit box. The article goes on to say the following about the HTML entity-encoded version:

Extremely inconvenient for those of us who actually know what character entities are, totally unintelligible to poor users who don't! Even the slightly more user-friendly, "intelligible" character entities like &theta; will leave users who are uninterested in learning HTML scratching their heads. On the other hand, if they see θ in an edit box, they'll know that it's a special character, and treat it accordingly, even if they don't know how to write that character themselves.

As others have noted, you still have to use HTML entities for reserved XML characters (ampersand, less-than, greater-than).

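The distinction matters in practice: an XML parser knows only the five predefined entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so an HTML name such as `&mdash;` is an error unless the document declares it. A small Python sketch with the standard `xml.etree.ElementTree`; `parse_text` is a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def parse_text(doc: str) -> str:
    """Parse a one-element XML document and return its text content."""
    return ET.fromstring(doc).text


# The five predefined XML entities need no declaration.
print(parse_text("<p>&amp; &lt; &gt; &quot; &apos;</p>"))  # & < > " '

# An HTML-only name is unknown to a plain XML parser...
try:
    parse_text("<p>a&mdash;b</p>")
except ET.ParseError as err:
    print("ParseError:", err)

# ...unless the document declares it, here in an internal DTD subset.
print(parse_text('<!DOCTYPE p [<!ENTITY mdash "&#8212;">]><p>a&mdash;b</p>'))
```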

Answer by Ned Batchelder

I would not use literal UTF-8 characters for characters that are easily confused visually. For example, it is difficult to distinguish an em dash from a minus, and especially a non-breaking space from a space. For these characters, definitely use entities.

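One way to follow this advice when generating HTML is to leave ordinary text as literal UTF-8 and write only a chosen set of easily confused characters as named entities. A Python sketch; the `CONFUSABLE` set and the helper name are illustrative assumptions, not a standard list:

```python
from html.entities import codepoint2name

# Illustrative choice: no-break space, en dash, em dash, minus sign, soft hyphen.
CONFUSABLE = {"\u00a0", "\u2013", "\u2014", "\u2212", "\u00ad"}


def entity_encode_confusables(text: str) -> str:
    """Write only the confusable characters as named entities.

    Everything else, including other non-ASCII text, stays literal.
    """
    out = []
    for ch in text:
        if ch in CONFUSABLE:
            out.append(f"&{codepoint2name[ord(ch)]};")
        else:
            out.append(ch)
    return "".join(out)


print(entity_encode_confusables("5\u22123\u00a0km \u2014 café"))  # 5&minus;3&nbsp;km &mdash; café
```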

For characters that are easily understood visually (such as the Chinese examples above), go ahead and use UTF-8 if you like.

Answer by Marco Luglio

Personally, I have done everything in UTF-8 for a long time. However, in an HTML page you always need to convert ampersand (&), greater-than (>) and less-than (<) characters to their equivalent entities: &amp;, &gt; and &lt;.

Also, if you intend to do some programming with UTF-8 text, there are a few things to watch for.

  • XML needs some extra lines (entity declarations, for example in a DTD) to validate when using entities other than the five it predefines.
  • Some libraries do not play nicely with UTF-8. For instance, PHP in some Linux distributions dropped full support for UTF-8 in its regular expression libraries.
  • It is harder to limit the number of characters in a text that uses HTML entities, because a single entity uses many characters. There is also always the risk of cutting an entity in half.
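
The last point can be handled by counting characters on the decoded text rather than on the escaped string. A Python sketch; `truncate_escaped` is a hypothetical helper that assumes its input is entity-escaped text content without tags. One side effect: when it does shorten, named references other than `&amp;`, `&lt;` and `&gt;` come back as literal characters.

```python
import html


def truncate_escaped(escaped: str, max_chars: int) -> str:
    """Shorten entity-escaped text to at most max_chars visible characters.

    Counting happens on the decoded text, so an entity such as &amp; or
    &#8212; is never cut in half. Assumes the input contains no tags.
    """
    plain = html.unescape(escaped)
    if len(plain) <= max_chars:
        return escaped  # already short enough; keep the original spelling
    return html.escape(plain[:max_chars], quote=False)


print("Tom &amp; Jerry"[:7])                   # naive slice: Tom &am (broken entity)
print(truncate_escaped("Tom &amp; Jerry", 5))  # Tom &amp;
```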

Answer by mjy

HTML entities are useful when you want to generate content that is going to be included (dynamically) into pages with (several) different encodings. For example, we have white label content that is included both into ISO-8859-1 and UTF-8 encoded web pages...

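A sketch of that approach in Python: encoding to ASCII with the standard `xmlcharrefreplace` error handler turns every non-ASCII character into a numeric reference, and pure ASCII bytes read the same under either declared encoding. `to_portable_markup` is a hypothetical name, and the input is assumed to be markup whose `&`, `<` and `>` are already escaped.

```python
def to_portable_markup(markup: str) -> bytes:
    """Encode markup so the bytes mean the same in ISO-8859-1 and UTF-8 pages.

    Every non-ASCII character becomes a decimal reference such as &#233;.
    Assumes &, < and > in `markup` are already escaped.
    """
    return markup.encode("ascii", "xmlcharrefreplace")


fragment = to_portable_markup("Café \u2014 ¾ cup")
print(fragment.decode("ascii"))  # Caf&#233; &#8212; &#190; cup

# Pure ASCII decodes to the same string under both page encodings.
assert fragment.decode("iso-8859-1") == fragment.decode("utf-8")
```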

If character set conversion from/to UTF-8 wasn't such a big unreliable mess (you always stumble over some characters and some tools that don't convert properly), standardizing on UTF-8 would be the way to go.

Answer by Jim Puls

Entities may buy you some compatibility with brain-dead clients that don't understand encodings correctly. I don't believe that includes any current browsers, but you never know what other kinds of programs might be hitting you up.

More useful, though, is that HTML entities protect you from your own errors: if you misconfigure something on the server and you end up serving a page with an HTTP header that says it's ISO-8859-1 and a META tag that says it's UTF-8, at least your &mdash;es will always work.

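The failure mode is easy to reproduce. Browsers decode a page labelled ISO-8859-1 as windows-1252 (per the WHATWG Encoding Standard), so a literal UTF-8 em dash turns into three unrelated characters, while the ASCII text `&mdash;` reads the same under any of these encodings. A Python sketch:

```python
literal = "wait\u2014what".encode("utf-8")  # em dash as UTF-8 bytes E2 80 94
entity = "wait&mdash;what".encode("ascii")  # em dash as an ASCII entity

# What a browser shows when the header wrongly claims ISO-8859-1.
misread = literal.decode("cp1252")
print(misread)  # waitâ€”what

# The entity version is unaffected by the wrong declaration.
print(entity.decode("cp1252") == entity.decode("utf-8"))  # True
```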

Answer by Otávio Décio

If your pages are correctly encoded in UTF-8, you should have no need for HTML entities; just use the characters you want directly.

Answer by blabla999

All of the previous answers make sense to me.

In addition: it mostly depends on the editor you intend to use and the document language. The minimum requirement for the editor is that it supports the document language. That means that if your text is in Japanese, beware of using an editor that does not display it (i.e. don't fall back to entities for the document text itself). If it's English, you can even use an old vim-like editor and use entities only for the relatively rare &copy; and friends. Of course, &gt; for > and the other HTML special characters still need escaping. But even with the other Latin-1 languages (German, French, etc.), writing &auml; is a pain in you know where...

In addition, I personally write entities for invisible characters and for those that look similar to standard ASCII characters and are therefore easily confused. For example, there is U+1173 (which looks like a dash in some charsets) or U+1175, which looks like the vertical bar. I'd use entities for those in any case.

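To find such characters in an existing file, a script can flag invisible format or space characters plus a list of known lookalikes. A Python sketch using the standard `unicodedata` module; the `LOOKALIKES` set (seeded with the two code points mentioned above) and the helper name are illustrative assumptions:

```python
import unicodedata

# Known ASCII lookalikes; extend as needed.
LOOKALIKES = {"\u1173", "\u1175"}


def suspicious_characters(text: str) -> list[tuple[int, str, str]]:
    """Return (index, "U+XXXX", Unicode name) for characters worth writing as entities.

    Flags invisible format characters (Cf), line/paragraph separators,
    space separators other than the ordinary space, and LOOKALIKES.
    """
    found = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        invisible = cat in ("Cf", "Zl", "Zp") or (cat == "Zs" and ch != " ")
        if invisible or ch in LOOKALIKES:
            found.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return found


for hit in suspicious_characters("a\u00a0b\u200bc\u1175d -"):
    print(hit)
```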