Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25916943/

Time: 2020-08-29 02:43:39  Source: igfitidea

Uses for the '&quot;' entity in HTML

Tags: html, xhtml, escaping, linq-to-xml, html-entities

Asked by DavidRR

I am revising some XHTML files authored by another party. As part of this effort, I am doing some bulk editing via Linq to XML.

I've just noticed that some of the original source XHTML files contain the &quot; HTML entity in text nodes within those files. For instance:

<p>Greeting: &quot;Hello, World!&quot;</p>

And when recovering the XHTML text via XElement.ToString(), the &quot; entities are being replaced by plain double-quotes:

<p>Greeting: "Hello, World!"</p>
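
The same round trip can be reproduced outside .NET. The following is only a minimal sketch that uses Python's standard xml.etree.ElementTree as a stand-in for Linq to XML (an assumption made for illustration; it is not the API used in this question). It suggests that the parser decodes &quot; into a plain quote character, and the serializer then sees no reason to escape it again in element content:

```python
import xml.etree.ElementTree as ET

source = "<p>Greeting: &quot;Hello, World!&quot;</p>"

# Parsing decodes the entity: the text node holds plain quote characters.
element = ET.fromstring(source)
print(element.text)

# Serializing escapes only what element content requires (&, <, >),
# so the quotes come back unescaped.
round_tripped = ET.tostring(element, encoding="unicode")
print(round_tripped)
```

Once parsed, the document no longer records whether a quote was originally written as an entity or as a literal character, so the serializer cannot preserve that choice.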

Question: Can anyone tell me what the motivation might have been for the original author to use the &quot; entities instead of plain double-quotes? Did those entities serve a purpose which I don't fully appreciate? Or, were they truly unnecessary as I suspect?

I do understand that &quot; would be necessary in certain contexts, such as when there is a need to place a double-quote within an HTML attribute. For instance:

<a href="/images/hello_world.jpg" alt="Greeting: &quot;Hello, World!&quot;">
  Greeting</a>

Accepted answer by Jukka K. Korpela

It is impossible, and unnecessary, to know the motivation for using &quot; in element content, but possible motives include: misunderstanding of HTML rules; use of software that generates such code (probably because its author thought it was “safer”); and misunderstanding of the meaning of &quot;: many people seem to think it produces “smart quotes” (they apparently never looked at the actual results).

Anyway, there is never any need to use &quot; in element content in HTML (XHTML or any other HTML version). There is nothing in any HTML specification that would assign any special meaning to the plain character " there.

As the question says, it has its role in attribute values, but even in them, it is mostly simpler to just use single quotes as delimiters if the value contains a double quote, e.g. alt='Greeting: "Hello, World!"' or, if you are allowed to correct errors in natural language texts, to use proper quotation marks, e.g. alt="Greeting: “Hello, World!”"
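
To illustrate that the two ways of writing the attribute are equivalent to a parser, here is a small sketch using Python's standard html.parser module. The AltCollector class and the alt_values helper are hypothetical names introduced only for this example:

```python
from html.parser import HTMLParser


class AltCollector(HTMLParser):
    """Collects the decoded value of every alt attribute it sees."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs with entities already decoded
        # and the delimiting quotes removed.
        for name, value in attrs:
            if name == "alt":
                self.alts.append(value)


def alt_values(markup):
    parser = AltCollector()
    parser.feed(markup)
    parser.close()
    return parser.alts


escaped = '<img alt="Greeting: &quot;Hello, World!&quot;">'
single_quoted = """<img alt='Greeting: "Hello, World!"'>"""

print(alt_values(escaped))
print(alt_values(single_quoted))
```

Both calls should yield the same attribute value, which is why the choice between &quot; and single-quote delimiters is purely a matter of authoring convenience.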

Answer by Lee

Reason #1

There was a point where buggy/lazy implementations of HTML/XHTML renderers were more common than those that got it right. Many years ago, I regularly encountered rendering problems in mainstream browsers resulting from the use of unencoded quote chars in regular text content of HTML/XHTML documents. Though the HTML spec has never disallowed use of these chars in text content, it became fairly standard practice to encode them anyway, so that non-spec-compliant browsers and other processors would handle them more gracefully. As a result, many "old-timers" may still do this reflexively. It is not incorrect, though it is now probably unnecessary, unless you're targeting some very archaic platforms.

Reason #2

When HTML content is generated dynamically, for example, by populating an HTML template with simple string values from a database, it's necessary to encode each value before embedding it in the generated content. Some common server-side languages provided a single function for this purpose, which simply encoded all chars that might be invalid in some context within an HTML document. Notably, PHP's htmlspecialchars() function is one such example. Though there are optional arguments to htmlspecialchars() that will cause it to ignore quotes, those arguments were (and are) rarely used by authors of basic template-driven systems. The result is that all "special chars" are encoded everywhere they occur in the generated HTML, without regard for the context in which they occur. Again, this is not incorrect, it's simply unnecessary.
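
Python's standard library has a function in the same spirit, html.escape(), whose quote parameter defaults to True. The sketch below uses it as an analogue of the behavior described above (it is not PHP's function, and the sample value is made up for illustration):

```python
import html

value = 'Greeting: "Hello, World!" & <welcome>'

# Default behavior: escape everything that is special in any context,
# including double and single quotes.
everywhere = html.escape(value)

# Opting out of quote escaping: enough for element content,
# but not safe inside a quoted attribute value.
text_only = html.escape(value, quote=False)

print(everywhere)
print(text_only)

# Both forms decode back to the same original string.
print(html.unescape(everywhere) == html.unescape(text_only) == value)
```

Because the default is the "escape everything" mode, a template author who never looks at the optional argument will emit &quot; in text nodes without ever having decided to.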

Answer by comdiv

In my experience it may be the result of auto-generation by string-based tools, where the author did not understand the rules of HTML.

When some developers generate HTML without the use of special XML-oriented tools, they may try to be sure the resulting HTML is valid by taking the approach that everything must be escaped.

Referring to your example, the reason why every occurrence of " is represented by &quot; could be that, using that approach, you can safely use such "special" characters in both attributes and values.

Another motivation I've seen is where people believe, "We must explicitly show that our symbols are not part of the syntax." However, valid HTML can be created by using the proper string-manipulation tools; see the previous paragraph again.

Here is some pseudo-code loosely based on C#, although it is preferred to use valid methods and tools:

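public class HtmlAndXmlWriter
{
    // Escapes every character that is special somewhere in HTML/XML,
    // regardless of the context the string ends up in.
    // "&" is replaced first so that the entities produced by the later
    // replacements are not escaped a second time.
    private static string Escape(string badString)
    {
        return badString
            .Replace("&", "&amp;")
            .Replace("\"", "&quot;")
            .Replace("'", "&apos;")
            .Replace(">", "&gt;")
            .Replace("<", "&lt;");
    }

    // type and value stand in for obj.Type and obj.Value of the original
    // sketch, since System.Object has no such members.
    public string GetHtmlFromOutObject(string type, string value)
    {
        // The same Escape() call is used for the attribute and for the
        // element content, which is how &quot; ends up in text nodes.
        return "<div class='type_" + Escape(type) + "'>" + Escape(value) + "</div>";
    }
}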

It's really very common to see such approaches taken to generate HTML.
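
For contrast, here is a minimal sketch of the tool-based alternative, using Python's standard xml.etree.ElementTree as one possible XML-oriented tool. The div_for function and its parameters are hypothetical names chosen for this example:

```python
import xml.etree.ElementTree as ET


def div_for(type_name, value):
    """Builds the div as a tree and lets the serializer do the escaping."""
    div = ET.Element("div", {"class": "type_" + type_name})
    div.text = value
    return ET.tostring(div, encoding="unicode")


markup = div_for('a"b', 'Say "hi" & <wave>')
print(markup)
```

The serializer escapes the double quote inside the attribute value, where it would otherwise end the value, and leaves it alone in the element content, where it is harmless. No hand-written escape function is involved, so there is nothing to apply in the wrong context.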

Answer by Foumpie

As other answers pointed out, it is most likely generated by some tool.

But if I were the original author of the file, my answer would be: Consistency.

If I am not allowed to put double quotes in my attributes, why put them in the element's content? Why do these specs always have these exceptional cases? If I had to write the HTML spec, I would say: "All double quotes need to be encoded." Done.

Today it is like: "In attribute values we need to encode double quotes, except when the attribute value itself is defined by single quotes. In the content of elements, double quotes can be, but are not required to be, encoded." (And I am surely forgetting some cases here.)

Double quotes are a keyword of the spec, encode them. Less-than/greater-than signs are keywords of the spec, encode them. Etc.

Answer by Timmmm

It is likely because they used a single function for escaping attributes and text nodes. &amp; doesn't do any harm so why complicate your code and make it more error-prone by having two escaping functions and having to pick between them?
