Are square brackets permitted in URLs?
I noticed that Apache Commons HttpClient (3.0.1) throws an IOException for such URLs, while wget and Firefox accept square brackets.
Example URL:
http://example.com/path/to/file[3].html
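My HTTP client encounters URLs like this, but I am not sure whether I should patch the code to accept them or let it throw an exception (as it arguably should).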
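Solutions
Answer
It is best to URL-encode the brackets, because apparently not all web servers support them unencoded. Sometimes, even when a standard exists, not everyone follows it.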
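Answer
Pretty much the only characters not allowed in pathnames are # and ?, because they signify the end of the path.
The URI RFC has the definitive answer: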
http://www.ietf.org/rfc/rfc1738.txt
Unsafe: Characters can be unsafe for a number of reasons. The space character is unsafe because significant spaces may disappear and insignificant spaces may be introduced when URLs are transcribed or typeset or subjected to the treatment of word-processing programs. The characters "<" and ">" are unsafe because they are used as the delimiters around URLs in free text; the quote mark (""") is used to delimit URLs in some systems. The character "#" is unsafe and should always be encoded because it is used in World Wide Web and in other systems to delimit a URL from a fragment/anchor identifier that might follow it. The character "%" is unsafe because it is used for encodings of other characters. Other characters are unsafe because gateways and other transport agents are known to sometimes modify such characters. These characters are "{", "}", "|", "\", "^", "~", "[", "]", and "`". All unsafe characters must always be encoded within a URL. For example, the character "#" must be encoded within URLs even in systems that do not normally deal with fragment or anchor identifiers, so that if the URL is copied into another system that does use them, it will not be necessary to change the URL encoding.
So the answer is that they should be hex-encoded (percent-encoded), but given Postel's law, most software will accept them verbatim.
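As a rough illustration of the hex encoding described above, here is a minimal Java sketch. The class and method names (UnsafeCharEncoder, encodeUnsafe) are made up for this example. It encodes the ASCII characters that the RFC 1738 passage lists as unsafe, except "#" and "%", which are deliberately left alone on the assumption that in the input they already serve as a fragment delimiter or as part of an existing escape.

```java
public class UnsafeCharEncoder {
    // Characters listed as unsafe in the RFC 1738 passage quoted above,
    // minus '#' and '%' (assumption: those already have their special
    // meaning in the input and must not be re-encoded).
    private static final String UNSAFE = " <>\"{}|\\^~[]`";

    // Replaces each unsafe ASCII character with its %XX hex form.
    public static String encodeUnsafe(String url) {
        StringBuilder out = new StringBuilder(url.length() + 8);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (UNSAFE.indexOf(c) >= 0) {
                out.append('%').append(String.format("%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The URL from the question; '[' becomes %5B and ']' becomes %5D.
        System.out.println(encodeUnsafe("http://example.com/path/to/file[3].html"));
    }
}
```

This only handles the listed ASCII characters; non-ASCII input would need UTF-8 percent-encoding as well, which this sketch does not attempt.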
Answer
According to the URL specification, square brackets are not valid URL characters.
Here is the relevant snippet:
The "national" and "punctuation" characters do not appear in any productions and therefore may not appear in URLs.
national      { | } | vline | [ | ] | \ | ^ | ~
punctuation   < | >
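Answer
Any browser or web-enabled software that accepts URLs and does not throw an exception when it meets special characters is almost guaranteed to be encoding those characters behind the scenes. Curly brackets, square brackets, spaces and so on all have encoded forms that represent them without causing conflicts. As the previous answers say, the safest way to handle them is to URL-encode them before handing the URL to whatever will try to resolve it.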
Answer
If you are using the Commons HttpClient classes, look at the org.apache.commons.httpclient.util.URIUtil class, specifically its encode() method. Use it to URI-encode the URL before trying to fetch it.
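For readers who want to try the idea without Commons HttpClient on the classpath, here is a sketch of the same "encode before fetching" step using the JDK's java.net.URI instead of URIUtil. This is a stand-in technique, not the URIUtil API. The class and method names (PathQuoting, quotePath) are made up for this example. It relies on the multi-argument URI constructor, which quotes characters that are illegal in a path.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathQuoting {
    // Builds a URI from unencoded components; the multi-argument
    // constructor percent-encodes characters that are illegal in the path.
    // Assumption: rawPath is NOT already encoded, because this constructor
    // always quotes '%' as well.
    public static String quotePath(String scheme, String host, String rawPath) {
        try {
            return new URI(scheme, host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        // Path from the question's example URL.
        System.out.println(quotePath("http", "example.com", "/path/to/file[3].html"));
    }
}
```

The resulting string can then be passed to the HTTP client in place of the raw URL.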
Answer
RFC 3986 states:
A host identified by an Internet Protocol literal address, version 6 [RFC3513] or later, is distinguished by enclosing the IP literal within square brackets ("[" and "]"). This is the only place where square bracket characters are allowed in the URI syntax.
So, in theory, you should never see such URIs, because the brackets should have been encoded.
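The rule quoted above can be observed with a strict parser. The sketch below uses the JDK's java.net.URI, which follows the older RFC 2396 as amended by RFC 2732 rather than RFC 3986 itself, but treats these particular cases the same way: brackets around an IPv6 host are accepted, while raw brackets in the path are rejected. The class and method names (BracketRule, parsesStrictly) are made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class BracketRule {
    // Returns true if java.net.URI accepts the string as written.
    public static boolean parsesStrictly(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Brackets enclosing an IPv6 literal in the host position.
        System.out.println(parsesStrictly("http://[::1]:8080/index.html"));
        // Raw brackets in the path, as in the question.
        System.out.println(parsesStrictly("http://example.com/path/to/file[3].html"));
        // The same path with the brackets percent-encoded.
        System.out.println(parsesStrictly("http://example.com/path/to/file%5B3%5D.html"));
    }
}
```

A client built on a strict parser like this fails on the question's URL, which is consistent with the exception reported for HttpClient.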
Answer
I know this question is a bit old, but I just wanted to point out that PHP uses brackets to pass arrays in a URL.
http://www.example.com/foo.php?bar[]=1&bar[]=2&bar[]=3
In this case, $_GET['bar'] will contain array(1, 2, 3).
Answer
Stack Overflow does not seem to encode them:
https://stackoverflow.com/search?q=square+brackets+[url]