Warning: this content is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow, original question: http://stackoverflow.com/questions/304806/

Date: 2020-10-29 11:51:22  Source: igfitidea

Encode and Decode rfc2396 URLs

Tags: java, rfc2396

Question by Martin OConnor

What is the best way to encode URL strings so that they are RFC 2396 compliant, and to decode an RFC 2396 compliant string so that, for example, %20 is replaced with a space character?

Edit: the URLEncoder and URLDecoder classes do not encode/decode RFC 2396 compliant URLs; they encode to the MIME type application/x-www-form-urlencoded, which is used to encode HTML form parameter data.
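
A small sketch illustrating the edit above: applied to a whole URL, URLEncoder follows form-encoding rules, so it escapes the delimiters ':' and '/' as well and turns spaces into '+'. The class name and the `formEncode` helper are hypothetical; `URLEncoder.encode(String, String)` is the standard JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class FormEncodingExample {
    // Applies application/x-www-form-urlencoded rules: a space becomes '+',
    // and reserved characters such as ':' and '/' are %-escaped too.
    static String formEncode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(s, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Prints http%3A%2F%2Fwww.someurl.com%2Fhas+spaces: a form value, not a usable URL
        System.out.println(formEncode("http://www.someurl.com/has spaces"));
    }
}
```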

Answer by larf311

Use the URI class as follows:

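// The three-argument constructor (scheme, scheme-specific part, fragment) %-escapes illegal characters such as spaces
URI uri = new URI("http", "//www.someurl.com/has spaces in url", null);
URL url = uri.toURL();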

Or, if you want a String:

String urlString = uri.toASCIIString();
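
A self-contained version of the snippet above, wrapped in a class with the imports and checked exceptions it needs. The class name and the `build` helper are hypothetical. `toASCIIString()` returns the escaped form, while `getPath()` returns the decoded path.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UriEncodeExample {
    // Three-argument constructor: scheme, scheme-specific part, fragment.
    // It %-escapes characters that are illegal in a URI, such as spaces.
    static URI build() throws URISyntaxException {
        return new URI("http", "//www.someurl.com/has spaces in url", null);
    }

    public static void main(String[] args) throws URISyntaxException, MalformedURLException {
        URI uri = build();
        URL url = uri.toURL();                  // java.net.URL view of the same address
        String urlString = uri.toASCIIString(); // escaped, US-ASCII-only string form
        System.out.println(url);
        System.out.println(urlString);          // http://www.someurl.com/has%20spaces%20in%20url
        System.out.println(uri.getPath());      // decoded path: /has spaces in url
    }
}
```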

回答by bobince

Your component parts, potentially containing characters that must be escaped, should already have been escaped using URLEncoder before being concatenated into a URI.

If you have a URI with out-of-band characters in it (such as space, "<>[]{}\|^`, and non-ASCII bytes), it's not really a URI. You can try to fix them up by manually %-escaping them, but this is a last-ditch fix-up operation, not a standard form of encoding. It is usually necessary when you accept potentially malformed URIs from user input, but it's not a standardised operation and I don't know of any built-in Java library function that will do it for you; you may have to hack something up yourself with a RegExp.
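
A minimal sketch of such a last-ditch fix-up, assuming non-ASCII characters should be escaped as their UTF-8 bytes. It uses a byte loop rather than a RegExp, and the class and `escapeOutOfBand` helper are hypothetical, not a JDK API. Reserved delimiters and existing `%` escapes are left untouched, so a stray `%` that is not followed by two hex digits will still produce an invalid URI.

```java
import java.nio.charset.StandardCharsets;

public class FixUpUri {
    // Hypothetical helper: %-escapes every byte that may not appear literally in an
    // RFC 2396 URI (controls, space, DEL, the "unwise" characters and non-ASCII bytes),
    // leaving reserved delimiters, unreserved characters and '%' as they are.
    static String escapeOutOfBand(String input) {
        StringBuilder out = new StringBuilder();
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF; // treat the byte as unsigned
            boolean allowed = c > 0x20 && c < 0x7F && "\"<>[]{}\\|^`".indexOf(c) < 0;
            if (allowed) {
                out.append((char) c);
            } else {
                out.append('%').append(String.format("%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints http://example.com/a%20b?q=%3Cx%3E&name=caf%C3%A9
        System.out.println(escapeOutOfBand("http://example.com/a b?q=<x>&name=caf\u00e9"));
    }
}
```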

In the other direction, you must take your URI apart into its component parts (each separate path part, query parameter name and value, and so on) before you can unescape each part (using a URLDecoder). There is no sensible way to %-decode a whole URI in one go; you could try to 'decode %-escapes that do not decode to delimiters' (like /?=&;%), but you would be left with a strange, inconsistent string that doesn't conform to any URI-processing standard.
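
A minimal sketch of the component-by-component approach for a query string, assuming UTF-8 and `name=value` pairs separated by `&` (other separators such as `;` are not handled). The class and `decodeQuery` helper are hypothetical; `URI.getRawQuery()` and `URLDecoder.decode(String, String)` are standard JDK calls. Splitting happens on the raw string first, so an escaped `%26` inside a value cannot be mistaken for a separator.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class DecodeQuery {
    // Splits the still-escaped query on '&' and '=', then decodes each name and value.
    static Map<String, String> decodeQuery(String rawQuery) throws UnsupportedEncodingException {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(name, "UTF-8"), URLDecoder.decode(value, "UTF-8"));
        }
        return params;
    }

    public static void main(String[] args) throws Exception {
        URI uri = new URI("http://example.com/search?q=fish+%26+chips&lang=en");
        System.out.println(decodeQuery(uri.getRawQuery())); // {q=fish & chips, lang=en}
    }
}
```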

URLEncoder/URLDecoder are fine for handling URI query components, both names and values. However, they are not quite right for handling URI path components. The difference is that the '+' character does not mean a space in a path part. You can fix this up with a simple string replace: after URL-encoding, replace '+' with '%20'; before URL-decoding, replace '+' with '%2B'. You can ignore the difference if you do not plan to include segments containing spaces or pluses in your path.
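
The string-replace fix-up described above can be sketched as a pair of helpers for a single path segment, assuming UTF-8. The class and method names are hypothetical. Encoding is safe to post-process because URLEncoder has already turned any literal '+' into %2B, so every remaining '+' stands for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class PathSegmentCodec {
    // Form-encode one path segment, then turn the '+' emitted for spaces into %20.
    static String encodePathSegment(String segment) throws UnsupportedEncodingException {
        return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
    }

    // In a path a literal '+' is a plus sign, so protect it as %2B before form-decoding.
    static String decodePathSegment(String segment) throws UnsupportedEncodingException {
        return URLDecoder.decode(segment.replace("+", "%2B"), "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(encodePathSegment("a b+c")); // a%20b%2Bc
        System.out.println(decodePathSegment("a%20b+c")); // a b+c
    }
}
```

Note that, being form encoding, this still escapes '/' as %2F, which is what you want inside a single segment but means whole paths must be encoded segment by segment.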

Answer by Martin OConnor

The javadocs recommend using the java.net.URI class to accomplish the encoding. To ensure that the URI class properly encodes the URL, you must use one of the multi-argument constructors. These constructors perform the required encoding, but they require you to parse the URL string into its separate components (scheme, host, path, and so on) yourself.
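
A sketch using the four-argument constructor `URI(String scheme, String host, String path, String fragment)`, with the components passed separately. The class name, `build` helper and example address are hypothetical. `toString()` keeps non-ASCII characters as-is, whereas `toASCIIString()` also escapes them as UTF-8 octets.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class MultiArgUriExample {
    // Each component is quoted according to its own rules: the space in the path
    // and in the fragment becomes %20, while '/' in the path stays a delimiter.
    static URI build() throws URISyntaxException {
        return new URI("http", "www.example.com", "/caf\u00e9 menu/index.html", "section 2");
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = build();
        System.out.println(uri.toString());      // non-ASCII 'é' kept literally
        System.out.println(uri.toASCIIString()); // http://www.example.com/caf%C3%A9%20menu/index.html#section%202
    }
}
```

Be aware that these constructors always escape '%' itself, so pass them unescaped components, not strings that are already %-encoded.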

If you want to decode, you must construct the URI with the single argument constructor, which does not do any encoding. You can then call methods such as getPath() etc. to retrieve and build the decoded URL.
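
A sketch of decoding with the single-argument constructor. The class name, `parse` helper and example URL are hypothetical; `getRawPath()`, `getPath()` and `getQuery()` are standard `java.net.URI` methods. The last call shows the caveat from the previous answer: once decoded, the escaped `%26` in the query value becomes a bare `&`, so query strings are better split via `getRawQuery()` before decoding.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriDecodeExample {
    // The single-argument constructor parses the string as-is and escapes nothing;
    // it throws URISyntaxException if the string contains illegal characters such as spaces.
    static URI parse() throws URISyntaxException {
        return new URI("http://www.example.com/has%20spaces/file.html?q=fish%20%26%20chips");
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = parse();
        System.out.println(uri.getRawPath()); // /has%20spaces/file.html
        System.out.println(uri.getPath());    // /has spaces/file.html
        System.out.println(uri.getQuery());   // q=fish & chips (the decoded '&' is now ambiguous)
    }
}
```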

如果要解码,则必须使用不进行任何编码的单参数构造函数构造 URI。然后,您可以调用 getPath() 等方法来检索和构建解码后的 URL。