Do I really need to encode '&' as '&amp;'?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/3493405/
Asked by Haroldo
I'm using an '&' symbol with HTML5 and UTF-8 in my site's <title>. Google shows the ampersand fine on its SERPs, as do all the browsers in their titles.
http://validator.w3.org is giving me this:
& did not start a character reference. (& probably should have been escaped as &amp;.)
Do I really need to do &amp;?
I'm not fussed about my pages validating for the sake of validating, but I'm curious to hear people's opinions on this and if it's important and why.
Answer by Delan Azabani
Yes. Just as the error said, in HTML, attributes are #PCDATA, meaning they're parsed. This means you can use character entities in the attributes. Using & by itself is wrong and, if not for lenient browsers and the fact that this is HTML and not XHTML, would break the parsing. Just escape it as &amp; and everything will be fine.
HTML5 allows you to leave it unescaped, but only when the data that follows does not look like a valid character reference. However, it's better just to escape all instances of this symbol than worry about which ones should be and which ones don't need to be.
Keep this point in mind: if you're not escaping & to &amp;, that's bad enough for data that you create (where the code could very well be invalid), but you might also not be escaping tag delimiters. That is a huge problem for user-submitted data, and could very well lead to HTML and script injection, cookie stealing and other exploits.
Please just escape your code. It will save you a lot of trouble in the future.
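As a rough illustration of escaping everything instead of deciding case by case, here is a minimal Python sketch built on the standard library's html.escape. The render_comment helper and the sample strings are hypothetical, invented for this example; they are not part of the original answer.

```python
from html import escape


def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a <p>, escaping &, <, > and quotes first."""
    return "<p>" + escape(user_text, quote=True) + "</p>"


# A harmless ampersand and a script-injection attempt get the same treatment.
print(render_comment("Dolce & Gabbana"))
print(render_comment('<script>alert("x")</script>'))
```

Because every value goes through the same function, there is no per-character decision to get wrong.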
Answer by Richard JP Le Guen
Validation aside, the fact remains that encoding certain characters is important to an HTML document so that it can render properly and safely as a web page.
Encoding & as &amp; under all circumstances is, for me, an easier rule to live by, reducing the likelihood of errors and failures.
Compare the following: which is easier? which is easier to bugger up?
Methodology 1
- Write some content which includes ampersand characters.
- Encode them all.
Methodology 2
(with a grain of salt, please ;) )
- Write some content which includes ampersand characters.
- On a case-by-case basis, look at each ampersand. Determine if:
  - It is isolated, and as such unambiguously an ampersand, e.g. volt & amp
    > In that case don't bother encoding it.
  - It is not isolated, but you feel it is nonetheless unambiguous, as the resulting entity does not exist and will never exist since the entity list could never evolve, e.g. amp&volt
    > In that case don't bother encoding it.
  - It is not isolated, and ambiguous, e.g. volt&amp
    > Encode it.
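To make the three cases concrete, here is a small Python sketch that runs each sample string through html.unescape from the standard library. This function decodes character references using HTML5-style rules, so it serves as a rough model of what a lenient parser does with unescaped ampersands. It is not a browser emulation; the sample strings are the ones from the list above.

```python
from html import unescape

# Isolated ampersand, non-entity name after "&", and a real entity name.
cases = ["volt & amp", "amp&volt", "volt&amp"]

decoded = {text: unescape(text) for text in cases}
for source, result in decoded.items():
    print(repr(source), "->", repr(result))
```

Only the third string changes when decoded, which is exactly the case the list says must be encoded.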
Answer by Matthew Wilson
HTML5 rules are different from HTML4. Escaping is not required in HTML5, unless the ampersand looks like it starts a parameter name. "&copy=2" is still a problem, for example, since &copy; is the copyright symbol.
However it seems to me that it's harder work to decide to encode or not to encode depending on the following text. So the easiest path is probably to encode all the time.
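The copyright-symbol pitfall can be reproduced with Python's html.unescape, which decodes named references even when the trailing semicolon is missing. This is only a sketch: the page.php query string is a made-up example, and real browsers apply extra context-dependent rules (attribute values are treated somewhat differently from text), so take the output as a model of the risk, not a prediction for every browser.

```python
from html import escape, unescape

raw_query = "page.php?a=1&copy=2"   # hypothetical URL with an unescaped "&"

# Decoded as-is, "&copy" is taken for the copyright sign.
print(unescape(raw_query))

# Escaped first, the string survives a decode round trip unchanged.
escaped_query = escape(raw_query)
print(escaped_query)
print(unescape(escaped_query) == raw_query)
```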
Answer by Ryan Kinal
I think this has turned into more of a question of "why follow the spec when browsers don't care?" Here is my generalized answer:
Standards are not a "present" thing. They are a "future" thing. If we, as developers, follow web standards, then browser vendors are more likely to correctly implement those standards, and we move closer to a completely interoperable web, where CSS hacks, feature detection, and browser detection are not necessary. Where we don't have to figure out why our layouts break in a particular browser, or how to work around that.
Specifically, if HTML5 does not require using &amp; in your specific situation, and you're using an HTML5 doctype (and also expecting your users to be using HTML5-compliant browsers), then there is no reason to do it.
Answer by Thomas Bonini
Well, if it comes from user input then absolutely yes, for obvious reasons. Think if this very website didn't do it: the title of this question would show up as "Do I really need to encode '&' as '&'?"
If it's just something like echo '<title>Dolce & Gabbana</title>'; then strictly speaking you don't have to. It would be better, but if you don't, no user will notice the difference.
Answer by AakashM
Could you show us what your title actually is? When I submit
<!DOCTYPE html>
<html>
<title>Dolce & Gabbana</title>
<body>
<p>am i allowed loose & mpersands?</p>
</body>
</html>
to http://validator.w3.org/ (explicitly asking it to use the experimental HTML 5 mode), it has no complaints about the &s...
Answer by Gumbo
In HTML, a & marks the beginning of a reference, either a character reference or an entity reference. From that point on, the parser expects either a # denoting a character reference, or an entity name denoting an entity reference, both followed by a ;. That's the normal behavior.
But if the reference name, or just the reference opening &, is followed by white space or another delimiter like ", ', <, >, or &, the ending ; and even a reference to represent a plain & can be omitted:
<p title="&amp;">foo &amp; bar</p>
<p title="&amp">foo &amp bar</p>
<p title="&">foo & bar</p>
Only in these cases the ending ; or even the reference itself can be omitted (at least in HTML 4). I think HTML 5 requires the ending ;.
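A quick way to see that the fully written reference, the reference without its semicolon, and the bare ampersand are all read the same way by a lenient decoder is to feed them to Python's html.unescape. This is a sketch under the assumption that html.unescape is an acceptable stand-in for a forgiving HTML parser; the three strings are the text content "foo & bar" written in the three styles described above.

```python
from html import unescape

variants = [
    "foo &amp; bar",  # full entity reference
    "foo &amp bar",   # trailing semicolon omitted
    "foo & bar",      # reference omitted entirely
]

decoded = [unescape(v) for v in variants]
print(decoded)
```

All three decode to the same text only because the ampersand is followed by a delimiter; the fully written form is the one that stays safe when the following text changes.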
But the specification recommends always using a reference, like the character reference &#38; or the entity reference &amp;, to avoid confusion:
Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.
Answer by Nishant
Update (March 2020): The W3C validator no longer complains about escaping URLs.
I was checking why image URLs need escaping, so I tried it at https://validator.w3.org. The explanation is pretty nice. It highlights that even URLs need to be escaped. [PS: I guess it will be unescaped when it's consumed, since URLs need &. Can anyone clarify?]
<img alt="" src="foo?bar=qut&qux=fop" />
An entity reference was found in the document, but there is no reference by that name defined. Often this is caused by misspelling the reference name, unencoded ampersands, or by leaving off the trailing semicolon (;). The most common cause of this error is unencoded ampersands in URLs as described by the WDG in "Ampersands in URLs". Entity references start with an ampersand (&) and end with a semicolon (;). If you want to use a literal ampersand in your document you must encode it as "&amp;" (even inside URLs!). Be careful to end entity references with a semicolon or your entity reference may get interpreted in connection with the following text. Also keep in mind that named entity references are case-sensitive; &Aelig; and &aelig; are different characters. If this error appears in some markup generated by PHP's session handling code, this article has explanations and solutions to your problem.
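On the question of when the escaped URL gets unescaped: the HTML parser decodes character references in attribute values before the URL is used, so the URL that is actually requested contains a plain &. The Python sketch below imitates that round trip with standard-library calls only. The parameter names come from the img example above, and html.unescape stands in for the parser's decoding step, which is an assumption of this sketch.

```python
from html import escape, unescape
from urllib.parse import urlencode

# Build the query string first; "&" here is the URL parameter separator.
query = urlencode({"bar": "qut", "qux": "fop"})
url = "foo?" + query

# Escape the whole URL when placing it inside an HTML attribute.
attr_value = escape(url, quote=True)
img_tag = '<img alt="" src="' + attr_value + '" />'
print(img_tag)

# Decoding the attribute value, as a parser would, gives the URL back.
print(unescape(attr_value) == url)
```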
Answer by Dean J
If the user passes it to you, or it will wind up in a URL, you need to escape it.
If it appears in static text on a page? All browsers will get this one right either way, so you don't need to worry much about it, since it will work.
Answer by Guffa
Yes, you should try to serve valid code if possible.
Most browsers will silently correct this error, but there is a problem with relying on the error handling in the browsers. There is no standard for how to handle incorrect code, so it's up to each browser vendor to try to figure out what to do with each error, and the results may vary.
Some examples where browsers are likely to react differently are if you put elements inside a table but outside the table cells, or if you nest links inside each other.
For your specific example it's not likely to cause any problems, but error correction in the browser might for example cause the browser to change from standards compliant mode into quirks mode, which could make your layout break down completely.
So, you should correct errors like this in the code, if for nothing else than to keep the error list in the validator short, so that you can spot more serious problems.