Java library for URL encoding if necessary (like a browser)

Question

Asked by palacsint

If I put the http://localhost:9000/space test URL into the address bar of a web browser, it calls the server with http://localhost:9000/space%20test. http://localhost:9000/specÁÉÍtest will also be encoded to http://localhost:9000/spec%C3%81%C3%89%C3%8Dtest.

If I put the encoded URLs into the address bar (i.e. http://localhost:9000/space%20test and http://localhost:9000/spec%C3%81%C3%89%C3%8Dtest), they remain the same (they won't be double-encoded).

Is there any Java API or library which does this encoding? The URLs come from the user, so I don't know whether they are already encoded.

(If there isn't, would it be enough to search for % in the input string and encode only if it's not found, or is there any special case where this would not work?)

Edit:

URLEncoder.encode("space%20test", "UTF-8") returns space%2520test, which is not what I want since it is double-encoded.

Edit 2:

Furthermore, browsers handle partially encoded URLs, like http://localhost:9000/specÁÉ%C3%8Dtest, well, without double-encoding them. In this case the server receives the following URL: http://localhost:9000/spec%C3%81%C3%89%C3%8Dtest. It is the same as the encoded form of ...specÁÉÍtest.

Answer 1

Accepted answer by Veniamin

What every web developer must know about URL encoding

URL Encoding Explained

Why do I need URL encoding?

The URL specification RFC 1738 specifies that only a small set of characters 
can be used in a URL. Those characters are:

A to Z (ABCDEFGHIJKLMNOPQRSTUVWXYZ)
a to z (abcdefghijklmnopqrstuvwxyz)
0 to 9 (0123456789)
$ (Dollar Sign)
- (Hyphen / Dash)
_ (Underscore)
. (Period)
+ (Plus sign)
! (Exclamation / Bang)
* (Asterisk / Star)
' (Single Quote)
( (Open Bracket)
) (Closing Bracket)

How does URL encoding work?

All offending characters are replaced by a % and a two digit hexadecimal value 
that represents the character in the proper ISO character set. Here are a 
couple of examples:

$ (Dollar Sign) becomes %24
& (Ampersand) becomes %26
+ (Plus) becomes %2B
, (Comma) becomes %2C
: (Colon) becomes %3A
; (Semi-Colon) becomes %3B
= (Equals) becomes %3D
? (Question Mark) becomes %3F
@ (Commercial A / At) becomes %40
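
The replacement rule above can be sketched in a few lines of Java. This is only an illustration: the class and method names are made up for this example, and it assumes UTF-8 rather than an ISO character set, because that is what the browsers in the question send (for example %C3%8D for Í). Unlike a real encoder, it escapes every byte, including characters that are allowed unescaped.

```java
import java.nio.charset.StandardCharsets;

class PercentEncodeDemo {
    // Hypothetical helper: percent-encodes every UTF-8 byte of the input.
    // Real encoders skip characters that are allowed to appear unescaped.
    public static String percentEncodeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so bytes above 0x7F print as two hex digits.
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncodeAll("$"));      // %24
        System.out.println(percentEncodeAll("@"));      // %40
        System.out.println(percentEncodeAll("\u00CD")); // %C3%8D (two bytes in UTF-8)
    }
}
```

A non-ASCII character produces one %XX group per UTF-8 byte, which is why Í shows up as two groups in the question.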

Simple Example:

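import java.util.logging.Level;
import java.util.logging.Logger;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class TextHelper {
    // Note: the bundled JavaScript engine (Nashorn) was removed in Java 15;
    // on newer JDKs getEngineByName returns null unless an engine is added.
    private static final ScriptEngine engine = new ScriptEngineManager()
        .getEngineByName("JavaScript");

    /**
     * Escapes the string with JavaScript's escape() and then restores the
     * characters %$&+,/:;=?@<># so that existing escapes and URL delimiters
     * survive. Note that escape() emits Latin-1 %XX or %uXXXX sequences,
     * not UTF-8 escapes.
     *
     * @param str the string to encode
     * @return the encoded result, or null if the script fails
     */
    public static String escapeJavascript(String str) {
        try {
            // Pass the input as a binding instead of splicing it into the
            // script, so quotes or backslashes in str cannot break it.
            engine.put("input", str.replace("%20", " "));
            return engine.eval("escape(input)").toString()
                // replace() is literal; replaceAll("%24", "$") would throw
                // because "$" is a group reference in a regex replacement.
                .replace("%3A", ":")
                .replace("%2F", "/")
                .replace("%3B", ";")
                .replace("%40", "@")
                .replace("%3C", "<")
                .replace("%3E", ">")
                .replace("%3D", "=")
                .replace("%26", "&")
                .replace("%25", "%")
                .replace("%24", "$")
                .replace("%23", "#")
                .replace("%2B", "+")
                .replace("%2C", ",")
                .replace("%3F", "?");
        } catch (ScriptException ex) {
            Logger.getLogger(TextHelper.class.getName())
                .log(Level.SEVERE, null, ex);
            return null;
        }
    }
}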

Answer 2

Answer by Veger

Use Java's java.net.URLEncoder#encode():

String page = "space test";
// URLEncoder applies form encoding, so the space becomes "+" (space+test), not %20.
String encodedURL = "http://localhost:9000/" + URLEncoder.encode(page, "UTF-8");

Note: encoding the complete URL would give an undesired result; for example, http:// is encoded as http%3A%2F%2F!
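
One way to encode only the path and leave the scheme and host alone is the multi-argument java.net.URI constructor, which quotes illegal characters per component. The sketch below is an illustration rather than part of the original answer; the class and method names are made up. Be aware that these constructors always quote the % character, so an already encoded path is still double-encoded.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriPathEncodeDemo {
    // Hypothetical helper: builds the URI from components so that only the
    // path is quoted, then converts non-ASCII characters to UTF-8 escapes.
    public static String encodePath(String host, int port, String rawPath)
            throws URISyntaxException {
        URI uri = new URI("http", null, host, port, rawPath, null, null);
        return uri.toASCIIString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // http://localhost:9000/space%20test
        System.out.println(encodePath("localhost", 9000, "/space test"));
        // http://localhost:9000/spec%C3%81%C3%89%C3%8Dtest
        System.out.println(encodePath("localhost", 9000, "/spec\u00C1\u00C9\u00CDtest"));
        // http://localhost:9000/space%2520test ('%' is always quoted)
        System.out.println(encodePath("localhost", 9000, "/space%20test"));
    }
}
```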

Edit: to prevent encoding a URL twice, you could check whether the URL contains a %, as it is only valid as part of an encoding. But if a user messes up the encoding (for example, encodes the URL only partially, or uses a % that does not introduce an escape sequence), then there is not much this method can do...
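
The check described above, and the two cases where it falls short, can be sketched as follows. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

class PercentCheckDemo {
    // Matches a '%' that is NOT followed by two hexadecimal digits.
    private static final Pattern STRAY_PERCENT =
        Pattern.compile("%(?![0-9A-Fa-f]{2})");

    // The simple heuristic: treat any '%' as a sign of prior encoding.
    public static boolean hasPercent(String s) {
        return s.indexOf('%') >= 0;
    }

    // Detects a '%' that cannot be part of a valid escape sequence.
    public static boolean hasStrayPercent(String s) {
        return STRAY_PERCENT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasPercent("space%20test"));             // true
        System.out.println(hasStrayPercent("space%20test"));        // false
        System.out.println(hasStrayPercent("100%"));                // true
        // Partially encoded: has a '%', yet still contains raw non-ASCII text.
        System.out.println(hasPercent("spec\u00C1\u00C9%C3%8Dtest")); // true
    }
}
```

The last case shows why the presence of % alone is not enough: the string would be left untouched even though part of it still needs encoding.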

Answer 3

Answer by palacsint

Finally, I checked what Firefox and Chrome do. I used the following URL with both browsers and captured the HTTP request with netcat (nc -l -p 9000):

http://localhost:9000/!"$%&'()*+,-./:;<=>?@[\]^_`{|}~

This URL contains every character from ASCII 32 to 127 except [0-9A-Za-z#].

The captured request is the following with Firefox 18.0.1:

GET /!%22$%&%27()*+,-./:;%3C=%3E?@[\]^_%60{|}~%7F HTTP/1.1

With Chrome:

GET /!%22$%&'()*+,-./:;%3C=%3E?@[\]^_`{|}~%7F HTTP/1.1

Firefox encodes more characters than Chrome. Here it is in a table:

Char | Hex    | Dec     | Encoded by
-----------------------------------------
"    | %22    | 34      | Firefox, Chrome
'    | %27    | 39      | Firefox
<    | %3C    | 60      | Firefox, Chrome
>    | %3E    | 62      | Firefox, Chrome
`    | %60    | 96      | Firefox
DEL  | %7F    | 127     | Firefox, Chrome

I found some code in their source trees which does something similar, but I'm not sure whether these are the algorithms actually used:

Chrome: http://src.chromium.org/viewvc/chrome/trunk/src/net/base/escape.cc?revision=HEAD&view=markup
Firefox: toolkit/components/url-classifier/nsUrlClassifierUtils.cpp

Anyway, here is some proof-of-concept code in Java:

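// Needs java.net.URLEncoder and java.io.UnsupportedEncodingException.
// Does not handle "#", and does not leave existing %XX escapes alone,
// because '%' itself is never encoded here.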
public static String encode(final String input) {
    final StringBuilder result = new StringBuilder();
    for (final char c: input.toCharArray()) {
        if (shouldEncode(c)) {
            result.append(encodeChar(c));
        } else {
            result.append(c);
        }
    }
    return result.toString();
}

private static String encodeChar(final char c) {
    if (c == ' ') {
        return "%20"; // URLEncoder.encode returns "+"
    }
    try {
        return URLEncoder.encode(String.valueOf(c), "UTF-8");
    } catch (final UnsupportedEncodingException e) {
        throw new IllegalStateException(e);
    }
}

private static boolean shouldEncode(final char c) {
    if (c <= 32 || c >= 127) {
        return true;
    }
    if (c == '"' || c == '<' || c == '>') {
        return true;
    }
    return false;
}

Since it uses URLEncoder.encode, it handles characters such as áéí as well as ASCII characters.

Answer 4

Answer by suin

This is a Scala code snippet. The encoder encodes non-ASCII characters and reserved characters in the URL. Also, since the operation is idempotent, the URL won't be double-encoded.

import java.net.URL
import scala.util.parsing.combinator.RegexParsers

object IdempotentURLEncoder extends RegexParsers {
  override def skipWhitespace = false
  private def segment = rep(char)
  private def char = unreserved | escape | any ^^ { java.net.URLEncoder.encode(_, "UTF-8") }
  private def unreserved = """[A-Za-z0-9._~!$&'()*+,;=:@-]""".r
  private def escape = """%[A-Fa-f0-9]{2}""".r
  private def any = """.""".r
  private def encodeSegment(input: String): String = parseAll(segment, input).get.mkString
  private def encodeSearch(input: String): String = encodeSegment(input)
  def encode(url: String): String = {
    val u = new URL(url)
    val path = u.getPath.split("/").map(encodeSegment).mkString("/")
    val query = u.getQuery match {
      case null      => ""
      case q: String => "?" + encodeSearch(q)
    }
    val hash = u.getRef match {
      case null      => ""
      case h: String => "#" + encodeSegment(h)
    }
    s"${u.getProtocol}://${u.getAuthority}$path$query$hash"
  }
}

Example usage (test code)

import org.scalatest.{ FunSuite, Matchers }

class IdempotentURLEncoderSpec extends FunSuite with Matchers {
  import IdempotentURLEncoder._

  test("Idempotent operation") {
    val url = "http://ja.wikipedia.org/wiki/文字"
    assert(encode(url) == encode(encode(url)))
    assert(encode(url) == encode(encode(encode(url))))
  }

  test("Segment encoding") {
    encode("http://ja.wikipedia.org/wiki/文字")
      .shouldBe("http://ja.wikipedia.org/wiki/%E6%96%87%E5%AD%97")
  }

  test("Query string encoding") {
    encode("http://qiita.com/search?utf8=✓&sort=rel&q=開発&sort=rel")
      .shouldBe("http://qiita.com/search?utf8=%E2%9C%93&sort=rel&q=%E9%96%8B%E7%99%BA&sort=rel")
  }

  test("Hash encoding") {
    encode("https://www.google.co.jp/#q=文字")
      .shouldBe("https://www.google.co.jp/#q=%E6%96%87%E5%AD%97")
  }

  test("Partial encoding") {
    encode("http://en.wiktionary.org/wiki/français")
      .shouldBe("http://en.wiktionary.org/wiki/fran%C3%A7ais")
  }

  test("Space is encoded as +") {
    encode("http://example.com/foo bar buz")
      .shouldBe("http://example.com/foo+bar+buz")
  }

  test("Multibyte domain names are not supported yet :(") {
    encode("http://日本語.jp")
      .shouldBe("http://日本語.jp")
  }
}

This code is from Qiita.
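
For readers who want the same idea in Java, here is a minimal sketch for a single path segment. It is not a port of the Scala code: the class and method names are made up for this example, it handles one segment rather than a whole URL, and it writes a space as %20 where the Scala version produces + through URLEncoder. The idea it shares is that valid %XX escapes pass through unchanged, which is what makes the operation idempotent.

```java
import java.nio.charset.StandardCharsets;

class IdempotentSegmentEncoder {
    // Punctuation left as-is; mirrors the character class in the Scala snippet.
    private static final String ALLOWED_PUNCTUATION = "._~!$&'()*+,;=:@-";

    private static boolean isAllowed(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || ALLOWED_PUNCTUATION.indexOf(c) >= 0;
    }

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F')
            || (c >= 'a' && c <= 'f');
    }

    public static String encodeSegment(String input) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '%' && i + 2 < input.length()
                    && isHex(input.charAt(i + 1)) && isHex(input.charAt(i + 2))) {
                // Already a valid escape: copy it through unchanged.
                out.append(input, i, i + 3);
                i += 3;
            } else if (isAllowed(c)) {
                out.append(c);
                i++;
            } else {
                // Encode the whole code point so surrogate pairs stay together.
                int codePoint = input.codePointAt(i);
                String ch = new String(Character.toChars(codePoint));
                for (byte b : ch.getBytes(StandardCharsets.UTF_8)) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
                i += Character.charCount(codePoint);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("space test"));    // space%20test
        System.out.println(encodeSegment("space%20test"));  // space%20test
        System.out.println(encodeSegment("spec\u00C1\u00C9%C3%8Dtest")); // spec%C3%81%C3%89%C3%8Dtest
        System.out.println(encodeSegment("100%"));          // 100%25
    }
}
```

A % that does not start a valid escape is itself encoded as %25, so a second pass sees only valid escapes and changes nothing.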

Answer 5

Answer by TheWhiteRabbit

The standard Java API itself can do URL encoding and decoding.

java.net.URI

Try the classes URLDecoder and URLEncoder.

To encode text for safe passage through the internets:

import java.io.UnsupportedEncodingException;
import java.net.*;
...
try {
    encodedValue = URLEncoder.encode(rawValue, "UTF-8");
} catch (UnsupportedEncodingException uee) {
    // Not expected: every Java platform supports UTF-8.
}

And to decode:

try {
    decodedValue = URLDecoder.decode(rawValue, "UTF-8");
} catch (UnsupportedEncodingException uee) { }
