Java: urlencode() the 'asterisk' (star?) character

Notice: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/6533561/

Date: 2020-10-30 16:14:27 · Source: igfitidea

urlencode() the 'asterisk' (star?) character

Tags: java, php, urlencode

Asked by etienne

I'm testing PHP urlencode() vs. Java java.net.URLEncoder.encode().

Java

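import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodeTest {
    public static void main(String[] args) {
        // Build a string holding every character from U+0020 to U+00FF.
        StringBuilder sb = new StringBuilder();
        for (int i = 32; i < 256; ++i) {
            sb.append((char) i);
        }
        String all = sb.toString();

        System.out.println("All characters:         -||" + all + "||-");
        try {
            System.out.println("Encoded characters:     -||" + URLEncoder.encode(all, "UTF-8") + "||-");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}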

PHP

$all = "";
for($i = 32; $i < 256; ++$i)
{
    $all = $all.chr($i);
}

echo($all.PHP_EOL);
echo(urlencode(utf8_encode($all)).PHP_EOL);

All characters seem to be encoded in the same way with both functions, except for the 'asterisk' character that is not encoded by Java, and translated to %2A by PHP. Which behaviour is supposed to be the 'right' one, if any?

Note: I tried with rawurlencode(), too - no luck.

Accepted answer by aioobe

It is okay to have a * in a URL (but it is also okay to have it in its encoded form).

RFC1738: Uniform Resource Locators (URL) states the following:

Reserved:

[...]

Usually a URL has the same interpretation when an octet is represented by a character and when it encoded. However, this is not true for reserved characters: encoding a character reserved for a particular scheme may change the semantics of a URL.

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

On the other hand, characters that are not required to be encoded (including alphanumerics) may be encoded within the scheme-specific part of a URL, as long as they are not being used for a reserved purpose.
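
A quick way to see that both forms are interchangeable for a receiver is to decode them. The following is only a sketch: the class name StarDecodeDemo, the helper decode, and the sample strings are made up for illustration. Both inputs should decode to the same string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class StarDecodeDemo {
    // Hypothetical helper: decodes a form-urlencoded string as UTF-8.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String literal = decode("a*b");   // asterisk left unencoded
        String escaped = decode("a%2Ab"); // asterisk percent-encoded
        System.out.println(literal);
        System.out.println(escaped);
        System.out.println(literal.equals(escaped));
    }
}
```

So the choice between * and %2A matters mainly when two encoders must produce byte-identical output, for example when the encoded string is compared or signed.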

Answer by You

Wikipedia suggests that * is a reserved character when it comes to URIs, and that it must be encoded if not used for the reserved purpose. According to RFC3986, pages 12-13:

URIs include components and subcomponents that are delimited by characters in the "reserved" set. These characters are called "reserved" because they may (or may not) be defined as delimiters by the generic syntax, by each scheme-specific syntax, or by the implementation-specific syntax of a URI's dereferencing algorithm. If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed.

  reserved    = gen-delims / sub-delims

  gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

  sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

(The reason the URL RFC still allows the * character to go unencoded is that it doesn't have a reserved purpose in URLs, and as such doesn't have to be encoded. So whether you have to encode it or not depends on what sort of URI you're creating.)
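
If the Java output has to match PHP's urlencode() exactly, one option is to post-process the URLEncoder result. This is a minimal sketch that assumes the asterisk is the only difference between the two (which is what the question observed); the class PhpStyleEncoder and the method phpStyleEncode are hypothetical names, not part of any library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class PhpStyleEncoder {
    // Hypothetical helper: URLEncoder output with '*' additionally escaped,
    // mirroring the %2A the question reports for PHP's urlencode().
    static String phpStyleEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("*", "%2A");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The space becomes '+', the asterisk becomes %2A.
        System.out.println(phpStyleEncode("a*b c"));
    }
}
```

The replace is safe to apply after encoding because URLEncoder never emits a * for any other input character.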

Answer by axtavt

The Javadoc of URLEncoder refers to the HTML specification:

This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format. For more information about HTML form encoding, consult the HTML specification.

HTML4 is quite unclear regarding this question and refers to RFC1738, which is quoted by aioobe:

Control names and values are escaped. Space characters are replaced by '+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A').

However, HTML5 directly states that * should not be encoded:

  • If the character isn't in the range U+0020, U+002A, U+002D, U+002E, U+0030 to U+0039, U+0041 to U+005A, U+005F, U+0061 to U+007A
    Replace the character with a string formed as follows:
    ...
  • Otherwise
    Leave the character as is.
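
The rule quoted above can be sketched in Java to check it against URLEncoder. This is an illustration under stated assumptions, not the specification's algorithm: the elided replacement step is assumed to be the '%HH' form from the HTML4 quote, spaces are assumed to end up as '+', the input is assumed to be encoded as UTF-8, and the names FormUrlEncodeSketch, isLeftAsIs, and encode are made up.

```java
import java.nio.charset.StandardCharsets;

class FormUrlEncodeSketch {
    private static final String HEX = "0123456789ABCDEF";

    // True for the byte values the quoted rule leaves untouched.
    static boolean isLeftAsIs(int b) {
        return b == 0x20 || b == 0x2A || b == 0x2D || b == 0x2E
            || (b >= 0x30 && b <= 0x39)   // digits
            || (b >= 0x41 && b <= 0x5A)   // A-Z
            || b == 0x5F
            || (b >= 0x61 && b <= 0x7A);  // a-z
    }

    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte raw : s.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF;
            if (b == 0x20) {
                out.append('+');          // assumption: space is written as '+'
            } else if (isLeftAsIs(b)) {
                out.append((char) b);
            } else {
                // assumption: the elided step produces '%' plus two hex digits
                out.append('%').append(HEX.charAt(b >> 4)).append(HEX.charAt(b & 0x0F));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The asterisk (U+002A) is in the "leave as is" set, so it stays literal.
        System.out.println(encode("a*b c~"));
    }
}
```

Under these assumptions the sketch leaves * alone, which matches the behaviour the question observed for java.net.URLEncoder.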