php urlencode vs rawurlencode?
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors on Stack Overflow (not to this page).
Original question: http://stackoverflow.com/questions/996139/
urlencode vs rawurlencode?
Asked by Gary Willoughby
If I want to create a URL using a variable, I have two choices for encoding the string: urlencode() and rawurlencode().
What exactly are the differences and which is preferred?
Accepted answer by Jonathan Fingland
It will depend on your purpose. If interoperability with other systems is important, then rawurlencode seems to be the way to go. The one exception is legacy systems that expect the query string to follow the form-encoding style, with spaces encoded as + instead of %20 (in which case you need urlencode).
rawurlencode follows RFC 1738 prior to PHP 5.3.0 and RFC 3986 afterwards (see http://us2.php.net/manual/en/function.rawurlencode.php):
Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 3986 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).
Note on RFC 3986 vs 1738: rawurlencode prior to PHP 5.3 encoded the tilde character (~) according to RFC 1738. As of PHP 5.3, however, rawurlencode follows RFC 3986, which does not require encoding tilde characters.
urlencode encodes spaces as plus signs (not as %20, as rawurlencode does) (see http://us2.php.net/manual/en/function.urlencode.php):
Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
This corresponds to the definition for application/x-www-form-urlencoded in RFC 1866.
Additional Reading:
You may also want to see the discussion at http://bytes.com/groups/php/5624-urlencode-vs-rawurlencode.
Also, RFC 2396 is worth a look. RFC 2396 defines valid URI syntax. The main part we're interested in is from 3.4 Query Component:
Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.
As you can see, the + is a reserved character in the query string and thus would need to be encoded as per RFC 3986 (as in rawurlencode).
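To make the difference concrete, here is a minimal sketch that runs both functions over one string containing a space, a tilde, and a plus sign. It assumes PHP 5.3 or later (so that rawurlencode leaves ~ alone); the variable names are only for illustration.

```php
<?php
// One input that hits every case discussed above: space, "~" and "+".
$input = 'a b~c+d';

$form = urlencode($input);    // form style: space becomes "+", "~" is escaped
$raw  = rawurlencode($input); // RFC 3986 style: space becomes "%20", "~" is kept

echo $form, PHP_EOL; // a+b%7Ec%2Bd
echo $raw, PHP_EOL;  // a%20b~c%2Bd
```

Note that a literal + is escaped to %2B by both functions, which is why a + in urlencode output can safely stand for a space.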
Answer by Incognito
Proof is in the source code of PHP.
I'll take you through a quick process of how to find out this sort of thing on your own in the future any time you want. Bear with me, there'll be a lot of C source code you can skim over (I explain it). If you want to brush up on some C, a good place to start is our SO wiki.
Download the source (or use http://lxr.php.net/ to browse it online), grep all the files for the function name, and you'll find something such as this:
PHP 5.3.6 (most recent at time of writing) describes the two functions in their native C code in the file url.c.
RawUrlEncode()
PHP_FUNCTION(rawurlencode)
{
    char *in_str, *out_str;
    int in_str_len, out_str_len;
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &in_str,
                              &in_str_len) == FAILURE) {
        return;
    }
    out_str = php_raw_url_encode(in_str, in_str_len, &out_str_len);
    RETURN_STRINGL(out_str, out_str_len, 0);
}
UrlEncode()
PHP_FUNCTION(urlencode)
{
    char *in_str, *out_str;
    int in_str_len, out_str_len;
    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &in_str,
                              &in_str_len) == FAILURE) {
        return;
    }
    out_str = php_url_encode(in_str, in_str_len, &out_str_len);
    RETURN_STRINGL(out_str, out_str_len, 0);
}
Okay, so what's different here?
They are both, in essence, calling two different internal functions: php_raw_url_encode and php_url_encode, respectively.
So go look for those functions!
Let's look at php_raw_url_encode:
PHPAPI char *php_raw_url_encode(char const *s, int len, int *new_length)
{
    register int x, y;
    unsigned char *str;
    str = (unsigned char *) safe_emalloc(3, len, 1);
    for (x = 0, y = 0; len--; x++, y++) {
        str[y] = (unsigned char) s[x];
#ifndef CHARSET_EBCDIC
        if ((str[y] < '0' && str[y] != '-' && str[y] != '.') ||
            (str[y] < 'A' && str[y] > '9') ||
            (str[y] > 'Z' && str[y] < 'a' && str[y] != '_') ||
            (str[y] > 'z' && str[y] != '~')) {
            str[y++] = '%';
            str[y++] = hexchars[(unsigned char) s[x] >> 4];
            str[y] = hexchars[(unsigned char) s[x] & 15];
#else /*CHARSET_EBCDIC*/
        if (!isalnum(str[y]) && strchr("_-.~", str[y]) != NULL) {
            str[y++] = '%';
            str[y++] = hexchars[os_toascii[(unsigned char) s[x]] >> 4];
            str[y] = hexchars[os_toascii[(unsigned char) s[x]] & 15];
#endif /*CHARSET_EBCDIC*/
        }
    }
    str[y] = '\0';
    if (new_length) {
        *new_length = y;
    }
    return ((char *) str);
}
And of course, php_url_encode:
PHPAPI char *php_url_encode(char const *s, int len, int *new_length)
{
    register unsigned char c;
    unsigned char *to, *start;
    unsigned char const *from, *end;
    from = (unsigned char *)s;
    end = (unsigned char *)s + len;
    start = to = (unsigned char *) safe_emalloc(3, len, 1);
    while (from < end) {
        c = *from++;
        if (c == ' ') {
            *to++ = '+';
#ifndef CHARSET_EBCDIC
        } else if ((c < '0' && c != '-' && c != '.') ||
                   (c < 'A' && c > '9') ||
                   (c > 'Z' && c < 'a' && c != '_') ||
                   (c > 'z')) {
            to[0] = '%';
            to[1] = hexchars[c >> 4];
            to[2] = hexchars[c & 15];
            to += 3;
#else /*CHARSET_EBCDIC*/
        } else if (!isalnum(c) && strchr("_-.", c) == NULL) {
            /* Allow only alphanumeric chars and '_', '-', '.'; escape the rest */
            to[0] = '%';
            to[1] = hexchars[os_toascii[c] >> 4];
            to[2] = hexchars[os_toascii[c] & 15];
            to += 3;
#endif /*CHARSET_EBCDIC*/
        } else {
            *to++ = c;
        }
    }
    *to = 0;
    if (new_length) {
        *new_length = to - start;
    }
    return (char *) start;
}
One quick bit of knowledge before I move forward: EBCDIC is another character set, similar to ASCII, but a total competitor. PHP attempts to deal with both. But basically, this means that the byte 0x4c in EBCDIC isn't the L it is in ASCII; it's actually a <. I'm sure you see the confusion here.
Both of these functions manage EBCDIC if the web server has defined it.
Also, they both use an array of chars (think string type), hexchars, as a look-up to get some values. The array is described as such:
/* rfc1738:
   ...The characters ";",
   "/", "?", ":", "@", "=" and "&" are the characters which may be
   reserved for special meaning within a scheme...
   ...Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL...
   For added safety, we only leave -_. unencoded.
 */
static unsigned char hexchars[] = "0123456789ABCDEF";
Beyond that, the functions are really different, and I'm going to explain them in ASCII and EBCDIC.
Differences in ASCII:
URLENCODE:
- Calculates the start and end of the input string, allocates memory.
- Walks through a while-loop, incrementing until we reach the end of the string.
- Grabs the present character.
- If the character is equal to ASCII char 0x20 (i.e., a "space"), adds a + sign to the output string.
- If it's not a space, and it's also not alphanumeric, and also isn't a _, -, or . character, then we output a % sign to array position 0. We then bitwise shift the character right by 4 to get its high four bits, look that value up in the hexchars array, and assign the resulting hex digit to position 1. To position 2 we assign the same kind of lookup, except we perform a bitwise AND with 15 (0xF) to keep only the low four bits. At the end, you'll end up with the character encoded as % plus two hex digits. (The EBCDIC build tests with isalnum(c) and first translates c through os_toascii, an array from Apache that maps the character to its ASCII code.)
- If it ends up that it's not a space and it's alphanumeric or one of the _-. chars, it outputs exactly what it is.
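The steps above can be sketched in userland PHP. This is only an illustration of the ASCII branch, not the real implementation (which is the C code shown earlier); sketch_url_encode() is a hypothetical name, and the type hints assume PHP 7 or later.

```php
<?php
// Userland sketch of the ASCII branch of php_url_encode().
// sketch_url_encode() is a hypothetical name; the real work happens in C.
function sketch_url_encode(string $s): string
{
    $hexchars = '0123456789ABCDEF';
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($s[$i]); // byte value, 0..255
        $isAlnum = ($c >= 0x30 && $c <= 0x39)   // 0-9
                || ($c >= 0x41 && $c <= 0x5A)   // A-Z
                || ($c >= 0x61 && $c <= 0x7A);  // a-z
        if ($c === 0x20) {
            $out .= '+'; // space becomes a plus sign
        } elseif ($isAlnum || $c === 0x2D || $c === 0x2E || $c === 0x5F) {
            $out .= $s[$i]; // "-", "." and "_" plus alphanumerics pass through
        } else {
            // high four bits, then low four bits, each mapped to a hex digit
            $out .= '%' . $hexchars[$c >> 4] . $hexchars[$c & 15];
        }
    }
    return $out;
}

echo sketch_url_encode('a b/c?d'), PHP_EOL; // a+b%2Fc%3Fd
```

Dropping the space branch and letting ~ (0x7E) pass through would give the matching sketch for rawurlencode.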
RAWURLENCODE:
- Allocates memory for the string.
- Iterates over it based on the length provided in the function call (not calculated in the function as with URLENCODE).
Note: Many programmers have probably never seen a for loop iterate this way. It's somewhat hackish and not the standard convention used with most for-loops. Pay attention: it assigns x and y, checks for exit on len reaching 0, and increments both x and y. I know, it's not what you'd expect, but it's valid code.
- Assigns the present character to a matching character position in str.
- It checks if the present character is alphanumeric or one of the _-.~ chars, and if it isn't, we do almost the same assignment as with URLENCODE where it performs lookups. However, we increment differently, using y++ rather than to[1]; this is because the strings are being built in different ways, but they reach the same goal at the end anyway.
- When the loop's done and the length's gone, it actually terminates the string, assigning the \0 byte.
- It returns the encoded string.
Differences:
- UrlEncode checks for space and assigns a + sign; RawUrlEncode does not.
- UrlEncode writes the terminating byte as *to = 0, while RawUrlEncode writes it as '\0'. Both are the same NUL terminator, so this is a moot point.
- They iterate differently; one may be prone to overflow with malformed strings. I'm merely suggesting this and I haven't actually investigated.
They basically iterate differently, and one assigns a + sign in the event of ASCII 0x20.
Differences in EBCDIC:
URLENCODE:
- Same iteration setup as with ASCII.
- Still translating the "space" character to a + sign. Note: I think this needs to be compiled in EBCDIC or you'll end up with a bug? Can someone edit and confirm this?
- It checks whether the present char is non-alphanumeric (isalnum(c)) and not one of _-. (strchr). The ASCII build makes the same decision with range comparisons instead: a char before 0, with the exception of being a . or -, OR less than A but greater than char 9, OR greater than Z and less than a but not a _, OR greater than z. Those comparisons rely on ASCII's contiguous letter ranges, which EBCDIC doesn't have (yeah, EBCDIC is kinda messed up to work with). If the char matches, it does the same hexchars lookup as found in the ASCII version, after first translating the char through os_toascii.
RAWURLENCODE:
- Same iteration setup as with ASCII.
- Same check as described in the EBCDIC version of URL Encode, with the exception that ~ is also excluded from the URL encode (in the ASCII build, that's the extra test on chars greater than z).
- Same assignment as the ASCII RawUrlEncode.
- Still appending the \0 byte to the string before return.
Grand Summary
- Both use the same hexchars lookup table.
- UrlEncode terminates the string with *to = 0 and RawUrlEncode with '\0'; both are the same NUL byte.
- If you're working in EBCDIC I'd suggest using RawUrlEncode, as it manages the ~ that UrlEncode does not (this is a reported issue). It's worth noting that the space check compares against the character literal ' ', which is 0x20 in ASCII but 0x40 in EBCDIC, so it depends on the character set PHP was compiled for.
- They iterate differently; one may be faster, one may be prone to memory or string based exploits.
- UrlEncode makes a space into +, RawUrlEncode makes a space into %20 via array lookups.
Disclaimer: I haven't touched C in years, and I haven't looked at EBCDIC in a really really long time. If I'm wrong somewhere, let me know.
Suggested implementations
Based on all of this, rawurlencode is the way to go most of the time. As you see in Jonathan Fingland's answer, stick with it in most cases. It deals with the modern scheme for URI components, whereas urlencode does things the old school way, where + meant "space."
If you're trying to convert between the old and new formats, be sure that your code doesn't goof up and turn something that's a decoded + sign into a space by accidentally double-encoding, or hit similar "oops" scenarios around this space/%20/+ issue.
If you're working on an older system with older software that doesn't prefer the new format, stick with urlencode. However, I believe %20 will actually be backwards compatible, as under the old standard %20 worked, it just wasn't preferred. Give it a shot if you're up for playing around, and let us know how it worked out for you.
Basically, you should stick with raw, unless your EBCDIC system really hates you. Most programmers will never run into EBCDIC on any system made after the year 2000, maybe even 1990 (that's pushing it, but still likely in my opinion).
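As a rough illustration of the conversion pitfall mentioned above, the following sketch converts a form-encoded value to the %20 style and shows what accidental double-encoding looks like. The sample value and variable names are made up for the example.

```php
<?php
$value = 'rock & roll 100%';

// Old (form) style, as produced by urlencode().
$old = urlencode($value);             // rock+%26+roll+100%25

// Convert by decoding with the matching decoder, then re-encoding.
$new = rawurlencode(urldecode($old)); // rock%20%26%20roll%20100%25

// Pitfall: encoding the already-encoded string escapes "+" and "%" again.
$double = rawurlencode($old);         // rock%2B%2526%2Broll%2B100%2525

echo $old, PHP_EOL, $new, PHP_EOL, $double, PHP_EOL;
```

Decoding $double once gives back the encoded string rather than the original value, which is the kind of "oops" to watch for.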
Answer by jitter
echo rawurlencode('http://www.google.com/index.html?id=asd asd');
yields
http%3A%2F%2Fwww.google.com%2Findex.html%3Fid%3Dasd%20asd
while
echo urlencode('http://www.google.com/index.html?id=asd asd');
yields
http%3A%2F%2Fwww.google.com%2Findex.html%3Fid%3Dasd+asd
The difference being the asd%20asd vs asd+asd.
urlencode differs from RFC 1738 by encoding spaces as + instead of %20.
Answer by Neven Boyanov
One practical reason to choose one over the other is if you're going to use the result in another environment, for example JavaScript.
In PHP, urlencode('test 1') returns 'test+1' while rawurlencode('test 1') returns 'test%201'.
But if you need to "decode" this in JavaScript using the decodeURI() function, then decodeURI("test+1") will give you "test+1" while decodeURI("test%201") will give you "test 1".
In other words, the space (" ") encoded by urlencode to plus ("+") in PHP will not be properly decoded by decodeURI in JavaScript.
In such cases the rawurlencode PHP function should be used.
Answer by Salman A
I believe spaces must be encoded as:
- %20 when used inside URL path component
- + when used inside URL query string component or form data (see 17.13.4 Form content types)
The following example shows the correct use of rawurlencode and urlencode:
echo "http://example.com"
    . "/category/" . rawurlencode("latest songs")
    . "/search?q=" . urlencode("lady gaga");
Output:
http://example.com/category/latest%20songs/search?q=lady+gaga
What happens if you encode path and query string components the other way round? For the following example:
http://example.com/category/latest+songs/search?q=lady%20gaga
- The webserver will look for the directory latest+songs instead of latest songs
- The query string parameter q will contain lady gaga
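The path/query split described in this answer could be wrapped in a small helper along these lines. build_url() is a hypothetical function invented for this sketch, not something PHP provides; it relies on http_build_query with the PHP_QUERY_RFC1738 encoding type, which encodes spaces as + like urlencode (the encoding type argument assumes PHP 5.4 or later).

```php
<?php
// build_url() is a hypothetical helper for illustration, not part of PHP.
function build_url(string $base, array $segments, array $query): string
{
    // Path segments: spaces must become %20, so use rawurlencode().
    $path = implode('/', array_map('rawurlencode', $segments));
    $url = rtrim($base, '/') . '/' . $path;
    if ($query) {
        // Query string: PHP_QUERY_RFC1738 encodes spaces as "+".
        $url .= '?' . http_build_query($query, '', '&', PHP_QUERY_RFC1738);
    }
    return $url;
}

echo build_url(
    'http://example.com',
    ['category', 'latest songs', 'search'],
    ['q' => 'lady gaga']
), PHP_EOL;
// http://example.com/category/latest%20songs/search?q=lady+gaga
```

Encoding each segment separately also keeps a literal / inside a segment from being mistaken for a path separator.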
Answer by nickl-
1. What exactly are the differences and
The only difference is in the way spaces are treated:
urlencode - based on legacy implementation, converts spaces to +
rawurlencode - based on RFC 1738, translates spaces to %20
The reason for the difference is that + is reserved and valid (unencoded) in URLs.
2. which is preferred?
I'd really like to see some reasons for choosing one over the other ... I want to be able to just pick one and use it forever with the least fuss.
Fair enough. I have a simple strategy that I follow when making these decisions, which I will share with you in the hope that it may help.
I think it was the HTTP/1.1 specification, RFC 2616, which called for "Tolerant applications":
Clients SHOULD be tolerant in parsing the Status-Line and servers tolerant when parsing the Request-Line.
When faced with questions like these, the best strategy is always to consume as much as possible and produce what is standards compliant.
So my advice is to use rawurlencode to produce standards compliant RFC 1738 encoded strings and use urldecode to be backward compatible and accommodate anything you may come across to consume.
Now you could just take my word for it, but let's prove it, shall we...
php > $url = <<<'EOD'
<<< > "Which, % of Alice's tasks saw $s @ earnings?"
<<< > EOD;
php > echo $url, PHP_EOL;
"Which, % of Alice's tasks saw $s @ earnings?"
php > echo urlencode($url), PHP_EOL;
%22Which%2C+%25+of+Alice%27s+tasks+saw+%24s+%40+earnings%3F%22
php > echo rawurlencode($url), PHP_EOL;
%22Which%2C%20%25%20of%20Alice%27s%20tasks%20saw%20%24s%20%40%20earnings%3F%22
php > echo rawurldecode(urlencode($url)), PHP_EOL;
"Which,+%+of+Alice's+tasks+saw+$s+@+earnings?"
php > // oops that's not right???
php > echo urldecode(rawurlencode($url)), PHP_EOL;
"Which, % of Alice's tasks saw $s @ earnings?"
php > // now that's more like it
It would appear that PHP had exactly this in mind. Even though I've never come across anyone refusing either of the two formats, I can't think of a better strategy to adopt as your de facto strategy, can you?
nJoy!
Answer by karim79
The difference is in the return values, i.e.:
urlencode:
Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 1738 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.
rawurlencode:
Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 1738 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).
The two are very similar, but the latter (rawurlencode) will replace spaces with a '%' and two hex digits, which is suitable for encoding passwords or such, where a '+' is not, e.g.:
echo '<a href="ftp://user:', rawurlencode('foo @+%/'),
     '@ftp.example.com/x.txt">';
//Outputs <a href="ftp://user:foo%20%40%2B%25%2F@ftp.example.com/x.txt">
Answer by Remus Rusanu
Answer by Jake Wilson
Spaces encoded as %20 vs. +
The biggest reason I've seen to use rawurlencode() in most cases is that urlencode encodes text spaces as + (plus signs), whereas rawurlencode encodes them as the commonly seen %20:
echo urlencode("red shirt");
// red+shirt
echo rawurlencode("red shirt");
// red%20shirt
I have specifically seen certain API endpoints that accept encoded text queries expect to see %20 for a space and, as a result, fail if a plus sign is used instead. Obviously this is going to differ between API implementations, and your mileage may vary.
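For an endpoint that insists on %20, one option is to let http_build_query do the encoding with the PHP_QUERY_RFC3986 encoding type (available since PHP 5.4), which uses rawurlencode-style escaping. This is a sketch with made-up parameter names; whether a given API needs it is something to check against that API.

```php
<?php
// Example parameters; the names are made up for illustration.
$params = ['q' => 'red shirt', 'size' => 'M'];

// Default encoding type (PHP_QUERY_RFC1738): spaces become "+".
$formStyle = http_build_query($params, '', '&');

// PHP_QUERY_RFC3986: spaces become "%20", as with rawurlencode().
$rfc3986 = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $formStyle, PHP_EOL; // q=red+shirt&size=M
echo $rfc3986, PHP_EOL;   // q=red%20shirt&size=M
```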
Answer by CMCDragonkai
I believe urlencode is for query parameters, whereas rawurlencode is for the path segments. This is mainly due to %20 for path segments vs + for query parameters. See this answer, which talks about the spaces: When to encode space to plus (+) or %20?
However, %20 now works in query parameters as well, which is why rawurlencode is always safer. The plus sign tends to be used where the user experience of editing and the readability of query parameters matter.
Note that this means rawurldecode does not decode + into spaces (http://au2.php.net/manual/en/function.rawurldecode.php). This is why $_GET is always automatically passed through urldecode, which means that + and %20 are both decoded into spaces.
If you want the encoding and decoding to be consistent between inputs and outputs, and you have chosen to always use + and not %20 for query parameters, then urlencode is fine for query parameters (key and value).
The conclusion is:
Path Segments - always use rawurlencode/rawurldecode
Query Parameters - for decoding always use urldecode (done automatically); for encoding, either rawurlencode or urlencode is fine, just choose one to be consistent, especially when comparing URLs.
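The decoding side of this conclusion can be checked with parse_str, which decodes a query string the same way PHP fills $_GET. The query string below is a made-up example.

```php
<?php
// A query string mixing both space encodings, plus an escaped literal "+".
$queryString = 'a=lady+gaga&b=lady%20gaga&c=1%2B1';

// parse_str() applies urldecode-style decoding, like $_GET.
parse_str($queryString, $params);

var_dump($params['a'] === 'lady gaga'); // "+" decoded to a space
var_dump($params['b'] === 'lady gaga'); // "%20" decoded to a space
var_dump($params['c'] === '1+1');       // "%2B" decoded to a literal plus

// rawurldecode() leaves "+" untouched, so it suits path segments only.
var_dump(rawurldecode('lady+gaga') === 'lady+gaga');
```

All four checks are expected to print bool(true).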

