Do I encode ampersands in <a href...>?
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/3705591/
Asked by JW.
I'm writing code that automatically generates HTML, and I want it to encode things properly.
Say I'm generating a link to the following URL:
http://www.google.com/search?rls=en&q=stack+overflow
I'm assuming that all attribute values should be HTML-encoded. (Please correct me if I'm wrong.) So that means if I'm putting the above URL into an anchor tag, I should encode the ampersand as &amp;, like this:
<a href="http://www.google.com/search?rls=en&amp;q=stack+overflow">
Is that correct?
Answer by zneak
Yes, it is. HTML entities are parsed inside HTML attributes, and a stray & would create an ambiguity. That's why you should always write &amp; instead of just & inside all HTML attributes.
That said, only & and quotes need to be encoded. If you have special characters like é in your attribute, you don't need to encode those to satisfy the HTML parser.
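As a rough illustration of the rule above, here is a minimal Python sketch that uses the standard library's html.escape to encode an attribute value. The variable names and the link text are assumptions made for the example, not part of the original answer.

```python
import html

url = "http://www.google.com/search?rls=en&q=stack+overflow"

# html.escape replaces &, < and >. With quote=True (the default) it also
# replaces double and single quotes, so the result can be placed inside
# a quoted attribute value.
href = html.escape(url, quote=True)
anchor = '<a href="{}">search</a>'.format(href)
print(anchor)

# Non-ASCII characters such as é are passed through unchanged.
sample = html.escape('caf\u00e9 & "bar"')
print(sample)
```

Only the ampersand in the URL changes; the rest of the string is already safe for an attribute value.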
It used to be the case that URLs needed special treatment for non-ASCII characters like é. You had to encode those using percent-escapes, which in this case gives %C3%A9, because URLs were defined by RFC 1738. However, RFC 1738 has been superseded by RFC 3986 (URIs, Uniform Resource Identifiers) and RFC 3987 (IRIs, Internationalized Resource Identifiers), on which the WHATWG has based its definition, since HTML5, of how browsers should behave when they see a URL containing non-ASCII characters. It's therefore now safe to include non-ASCII characters in URLs, percent-encoded or not.
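For reference, the percent-escape mentioned above can be reproduced with Python's urllib.parse; this is only a small sketch, and the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# é is encoded as UTF-8 (bytes C3 A9), and each byte is percent-escaped.
encoded = quote("\u00e9")
print(encoded)

# unquote reverses the process, decoding the bytes as UTF-8 again.
decoded = unquote(encoded)
print(decoded)
```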
Answer by Jukka K. Korpela
By current official HTML recommendations, the ampersand must be escaped, e.g. as &amp;, in contexts like this. However, browsers do not require it, and the HTML5 CR proposes to make this a rule, so that special rules apply in attribute values. Current HTML5 validators are outdated in this respect (see bug report with comments).
It will remain possible to escape ampersands in attribute values, but apart from validation with current tools, there is no practical need to escape them in href values (and there is a small risk of making mistakes if you start escaping them).
Answer by Daniel W.
I am posting a new answer because I find zneak's answer does not have enough examples, does not show HTML and URI handling as different aspects and standards and has some minor things missing.
You have two standards concerning URLs in links (<a href).
The first standard is RFC 1866 (HTML 2.0) where in "3.2.1. Data Characters" you can read the characters which need to be escaped when used as the value for an HTML attribute. (Attributes themselves do not allow special characters at all, e.g. <a hr&ef="http://... is not allowed, nor is <a hr&amp;ef="http://....)
Later this went into the HTML 4 standard; the characters you need to escape are:
< to &lt;
> to &gt;
& to &amp;
" to &quot;
' to &apos;
The other standard is RFC 3986 "Generic URI standard", where URLs are handled (this happens when the browser is about to follow a link because the user clicked on the HTML element).
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
It is important to escape those characters so the client knows whether they represent data or a delimiter.
Example unescaped:
https://example.com/?user=test&password&te&st&goto=https://google.com
Example, fully legit URL
https://example.com/?user=test&password&te%26st&goto=https%3A%2F%2Fgoogle.com
Example fully legit URL in value of HTML attribute:
https://example.com/?user=test&amp;password&amp;te%26st&amp;goto=https%3A%2F%2Fgoogle.com
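The two layers described in this answer (URI escaping first, then HTML attribute escaping) can be sketched in Python as follows. The helper name encode_component is hypothetical and only serves the example. The sketch percent-encodes each data component, assembles the URL, and then HTML-escapes the whole URL for use in an attribute.

```python
import html
from urllib.parse import quote


def encode_component(value):
    """Percent-encode one piece of data for use in a query string.

    safe="" makes quote() escape reserved characters such as
    ":", "/" and "&" so they are read as data, not as delimiters.
    """
    return quote(value, safe="")


# Layer 1 (RFC 3986): escape the data, keep the real delimiters.
query = "&".join([
    "user=test",
    "password",
    encode_component("te&st"),
    "goto=" + encode_component("https://google.com"),
])
url = "https://example.com/?" + query
print(url)

# Layer 2 (HTML): escape the finished URL as an attribute value.
attr = html.escape(url, quote=True)
print('<a href="{}">...</a>'.format(attr))
```

The order matters in this sketch: the percent-encoding is applied to individual components, while the HTML escaping is applied once to the complete URL.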
Also important scenarios:
JavaScript as a value:
<img src="..." onclick="window.location.href = &quot;https://example.com/?user=test&amp;password&amp;te%26st&amp;goto=https%3A%2F%2Fgoogle.com&quot;;">...</a>
(Yes, ;; is correct.)
JSON as a value:
<a href="..." data-analytics="{&quot;event&quot;: &quot;click&quot;}">...</a>
Escaped things inside escaped things, double encoding, URL inside URL inside parameter, etc.:
http://x.com/?passwordUrl=http%3A%2F%2Fy.com%2F%3Fuser%3Dtest&amp;password=&quot;&quot;123
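For the "JSON as a value" case, a minimal Python sketch of the round trip might look like this. The AttrCollector class is a hypothetical helper written for the example. It relies on the standard library's HTMLParser, which decodes character references in attribute values before handing them to handle_starttag.

```python
import html
import json
from html.parser import HTMLParser

payload = {"event": "click"}

# Serialize to JSON first, then escape the result for the attribute.
attr = html.escape(json.dumps(payload), quote=True)
tag = '<a href="..." data-analytics="{}">...</a>'.format(attr)
print(tag)


class AttrCollector(HTMLParser):
    """Collects the attributes of every start tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        # Attribute values arrive with character references already decoded.
        self.attrs.update(dict(attrs))


parser = AttrCollector()
parser.feed(tag)
parser.close()

# The parser hands back the original JSON text, which parses cleanly.
recovered = json.loads(parser.attrs["data-analytics"])
print(recovered)
```

Each layer is undone in the reverse order it was applied: the HTML parser removes the entity escaping, and only then is the JSON decoded.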
Answer by Randy Greencorn
Yes, you should convert & to &amp;.
This HTML validator tool by W3C is helpful for questions like this. It will tell you the errors and warnings for a particular page.