HTML: What characters must be escaped in an HTTP query string?

Notice: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2322764/

Time: 2020-08-29 02:18:00  Source: igfitidea

What characters must be escaped in an HTTP query string?

Tags: html, http, url, query-string

Asked by Jason Kresowaty

This question concerns the characters in the query string portion of the URL, which appear after the ? character.

Per Wikipedia, certain characters are left as is and others are encoded (usually with a % escape sequence).

I've been trying to track this down to actual specifications, so that I understand the justification behind every bullet point in that Wikipedia page.

Contradiction Example 1:

The HTML specification says to encode space as + and defers the rest to RFC1738. However, this RFC says that ~ is unsafe and furthermore that "[a]ll unsafe characters must always be encoded within the URL". This seems to contradict Wikipedia.

In practice, IE8 encodes ~ in the query strings it generates, while FF3 leaves it as is.

Contradiction Example 2:

Wikipedia states that all characters that it does not mention must be encoded. ! is not mentioned in Wikipedia. But RFC1738 states that ! is a "special" character and "may be used unencoded". This seems to contradict Wikipedia, which says that it must be encoded.

In practice, IE8 encodes ! in the query strings it generates, while FF3 leaves it as is.

I understand that the moral of this is probably to encode any characters on which Wikipedia and the specifications disagree, perhaps even going as far as encoding everything that is not [A-Za-z0-9]. I would just like to know the actual standards on this.

Conclusions

The algorithm described on Wikipedia encodes precisely those characters which are not RFC3986 unreserved characters. That is, it encodes all characters other than alphanumerics and -._~. As a special case, space is encoded as + instead of the %20 that RFC3986 percent-encoding would produce.

Some applications use an older RFC. For comparison, the RFC2396 unreserved characters are alphanumerics and !'()*-._~.

For comparison, the HTML5 working draft algorithm encodes all characters other than alphanumerics and *-._. The special case encoding for space remains +. Notable differences are that * is not encoded and ~ is encoded. (Technically, this handling of * is compatible with RFC3986 even though * is in reserved, because it is in the sub-delims, which are allowed in the query production.)
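
To make the three character sets above easier to compare, here is a minimal Python sketch. The helper name encode and the sample string are made up for illustration; the sets themselves are copied from the conclusions above, and the code is not taken from any of the specifications.

```python
import string

ALNUM = string.ascii_letters + string.digits

# Characters each scheme leaves unencoded, as summarized in the conclusions.
RFC3986_UNRESERVED = ALNUM + "-._~"
RFC2396_UNRESERVED = ALNUM + "!'()*-._~"
HTML5_FORM_SAFE = ALNUM + "*-._"


def encode(text: str, keep: str, space_as_plus: bool = False) -> str:
    """Percent-encode the UTF-8 bytes of every character not in `keep`.

    Hypothetical helper: `space_as_plus` mimics the form-style special case.
    """
    out = []
    for ch in text:
        if ch in keep:
            out.append(ch)
        elif ch == " " and space_as_plus:
            out.append("+")
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


sample = "a b~c*d!e"
print(encode(sample, RFC3986_UNRESERVED, space_as_plus=True))  # a+b~c%2Ad%21e
print(encode(sample, RFC2396_UNRESERVED))                      # a%20b~c*d!e
print(encode(sample, HTML5_FORM_SAFE, space_as_plus=True))     # a+b%7Ec*d%21e
```

The only characters that come out differently across the three runs are space, ~, * and !, which are the ones the question was puzzled about.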

Answered by PJ King

The answer lies in the RFC 3986 document, specifically Section 3.4.

The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.

...

The characters slash ("/") and question mark ("?") may represent data within the query component.
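
As a quick illustration of the quoted rule, Python's standard urllib.parse.urlsplit can be used to see where the query starts and ends. The URL below is a made-up example, not one from the RFC.

```python
from urllib.parse import urlsplit

# Hypothetical URL: the query contains an unencoded "/" and a second "?".
url = "http://example.com/search?q=a/b?c&lang=en#results"
parts = urlsplit(url)

print(parts.path)      # /search
print(parts.query)     # q=a/b?c&lang=en
print(parts.fragment)  # results
```

Only the first ? starts the query, and the # ends it; the later / and ? are carried along as ordinary query data.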

Technically, RFC 3986, Section 3.4 defines the query component as:

query       = *( pchar / "/" / "?" )

This syntax means that a query can include all characters from pchar as well as / and ?. pchar refers to another specification of path characters. Helpfully, Appendix A of RFC 3986 lists the relevant ABNF definitions, most notably:

query         = *( pchar / "/" / "?" )
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
pct-encoded   = "%" HEXDIG HEXDIG
sub-delims    = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Thus, in addition to all alphanumerics and percent-encoded characters, a query can legally include the following unencoded characters:

/ ? : @ - . _ ~ ! $ & ' ( ) * + , ; =
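
The ABNF above can be turned into a rough validity check. The following Python sketch transcribes the query production into a regular expression; the names QUERY_RE and is_valid_query are invented here, and the check covers only the syntax of the query component, not what any particular server will accept.

```python
import re

# Character classes transcribed from the RFC 3986 ABNF quoted above.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="

# query = *( pchar / "/" / "?" ), where pchar also allows ":" and "@"
# and pct-encoded is "%" followed by two hex digits.
QUERY_RE = re.compile(
    rf"(?:[{UNRESERVED}{SUB_DELIMS}:@/?]|%[0-9A-Fa-f]{{2}})*"
)


def is_valid_query(query: str) -> bool:
    """Return True if `query` matches the RFC 3986 query production."""
    return QUERY_RE.fullmatch(query) is not None


print(is_valid_query("q=a/b?c&x=~!*'()"))  # True
print(is_valid_query("q=a b"))             # False: raw space
print(is_valid_query("q=%7E"))             # True: percent-encoded
print(is_valid_query("q=a#b"))             # False: "#" ends the query
```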

Of course, you may want to keep in mind that '=' and '&' usually have special significance within a query.
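
To show why that matters in practice, here is a small Python sketch using the standard urllib.parse module. The parameter names and values are made up; the point is only that & and = inside a value need encoding when the query follows the common key=value&key=value convention, even though RFC 3986 allows them unencoded.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical parameters whose values contain "&" and "=".
params = {"q": "rock&roll", "expr": "a=b"}

# urlencode percent-encodes the delimiters that appear inside values.
query = urlencode(params)
print(query)            # q=rock%26roll&expr=a%3Db
print(parse_qs(query))  # {'q': ['rock&roll'], 'expr': ['a=b']}

# Joining by hand without encoding changes the meaning of the query.
naive = "q=rock&roll&expr=a=b"
print(parse_qs(naive)["q"])  # ['rock']
```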
