Original question: http://stackoverflow.com/questions/459667/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
How to encode special characters using mod_rewrite & Apache?
Asked by Aldie
I would like to have pretty URLs for my tagging system that can include all the special characters: +, &, #, %, and =. Is there a way to do this with mod_rewrite without having to double encode the links?
I notice that delicious.com and stackoverflow seem to be able to handle singly encoded special characters. What's the magic formula?
Here's an example of what I want to happen:
http://www.foo.com/tag/c%2b%2b
Would trigger the following RewriteRule:
RewriteRule ^tag/(.*) script.php?tag=$1
and the value of tag would be "c++"
The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces. If I double encode the plus sign to '%252B' then I get the desired result - however it makes for messy URLs and seems pretty hacky to me.
Accepted answer by bobince
The normal operation of apache/mod_rewrite doesn't work like this, as it seems to turn the plus signs into spaces.
I don't think that's quite what's happening. Apache is decoding the %2Bs to +s in the path part since + is a valid character there. It does this before letting mod_rewrite look at the request.
So then mod_rewrite changes your request '/tag/c++' to 'script.php?tag=c++'. But in a query string component in the application/x-www-form-urlencoded format, the escaping rules are very slightly different from those that apply in path parts. In particular, '+' is a shorthand for space (which could just as well be encoded as '%20', but this is an old behaviour we'll never be able to change now).
So PHP's form-reading code receives the 'c++' and dumps it in your $_GET as c-space-space.
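To make the two decoding steps concrete, here is a minimal Python sketch. It is only an analogy: `urllib.parse.unquote` stands in for the path decoding Apache performs, and `parse_qs` stands in for PHP's form-reading code. The variable names are illustrative and not part of the original answer.

```python
from urllib.parse import unquote, parse_qs

# The raw path as sent by the browser.
raw_path = "/tag/c%2b%2b"

# Path decoding: %2B becomes '+', and '+' itself is left alone.
decoded_path = unquote(raw_path)

# The rewrite copies the decoded capture into the query string unchanged.
tag = decoded_path[len("/tag/"):]
query = "tag=" + tag

# Form decoding: '+' is shorthand for a space in a query string.
form_value = parse_qs(query)["tag"][0]

print(repr(decoded_path))
print(repr(form_value))
```

The plus signs survive path decoding but are turned into spaces once the same text is read back as form data.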
Looks like the way around this is to use the rewrite flag 'B'. See http://httpd.apache.org/docs/2.2/mod/mod_rewrite.html#rewriteflags. Curiously, it uses more or less the same example!
RewriteRule ^tag/(.*)$ /script.php?tag=$1 [B]
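The effect of escaping the backreference before it enters the query string can be sketched in Python. This is an assumption-laden analogy rather than Apache's implementation: `escape_backreference` is a hypothetical helper, and `quote(..., safe="")` may escape a different set of characters than the B flag does.

```python
from urllib.parse import quote, parse_qs

def escape_backreference(value: str) -> str:
    """Hypothetical stand-in for the B flag: percent-encode the captured text."""
    return quote(value, safe="")

# What the rule captures after Apache has decoded the path.
captured = "c++"

# Re-escape the capture before appending it to the query string.
query = "tag=" + escape_backreference(captured)

# The form decoder now recovers the plus signs instead of spaces.
tag = parse_qs(query)["tag"][0]

print(query)
print(tag)
```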
Answer by David Z
I'm not sure I understand what you're asking, but the NE (noescape) flag to Apache's RewriteRule directive might be of some interest to you. Basically, it prevents mod_rewrite from automatically escaping special characters in the substitution pattern you provide. The example given in the Apache 2.2 documentation is
RewriteRule /foo/(.*) /bar/arg=P1\%3d$1 [R,NE]
which will turn, for example, /foo/zed into a redirect to /bar/arg=P1%3dzed, so that the script /bar will then see a query parameter named arg with a value P1=zed, if it looks in its PATH_INFO (okay, that's not a real query parameter, so sue me ;-P).
At least, I think that's how it works . . . I've never used that particular flag myself.
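A rough Python illustration of why skipping the automatic escaping matters for a redirect target that already contains a percent escape. The `safe="/="` choice is an assumption made for this sketch; the characters mod_rewrite actually escapes may differ.

```python
from urllib.parse import quote, unquote

# Redirect target as written in the substitution, already containing %3d.
target = "/bar/arg=P1%3dzed"

# Without NE (sketch): the '%' is escaped again, giving a double encoding.
reescaped = quote(target, safe="/=")

# With NE (sketch): the target is sent as written.
untouched = target

# One round of decoding on the receiving side.
print(unquote(reescaped))   # still contains a literal %3d
print(unquote(untouched))   # the intended P1=zed
```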
Answer by Nitin
I finally made it work with the help of RewriteMap.
I added the escape map in the httpd.conf file:
RewriteMap es int:escape
and used it in the rewrite rule:
RewriteRule ([^?.]*) /abc?arg1=${es:$1}&country_sniff=true [L]
Answer by danorton
The underlying problem is that you are moving from a request that has one encoding (specifically, a plus sign is a plus sign) into a request that has a different encoding (a plus sign represents a space). The solution is to bypass the decoding that mod_rewrite does and convert your path directly from the raw request to the query string.
To bypass the normal flow of the rewrite rules, we'll load the raw request string directly into an environment variable and modify the environment variable instead of the normal rewrite path. It will already be encoded, so we don't generally need to worry about encoding it when we move it to the query string. What we do want, however, is to percent-encode the plus signs so that they are properly relayed as plus signs and not spaces.
The rules are incredibly simple:
RewriteEngine On
RewriteRule ^script.php$ - [L]
# Move the path from the raw request into _rq
RewriteCond %{ENV:_rq} =""
RewriteCond %{THE_REQUEST} "^[^ ]+ (/path/[^/]+/[^? ]+)"
RewriteRule .* - [E=_rq:%1]
# encode the plus signs (%2B) (Loop with [N])
RewriteCond %{ENV:_rq} "/path/([^/]+)/(.*)\+(.*)$"
RewriteRule .* - [E=_rq:/path/%1/%2\%2B%3,N]
# finally, move it from the path to the query string
# ([NE] says to not re-code it)
RewriteCond %{ENV:_rq} "/path/([^/]+)/(.*)$"
RewriteRule .* /path/script.php?%1=%2 [NE]
This trivial script.php confirms that it works:
<input readonly type="text" value="<?php echo htmlspecialchars($_GET['tag'], ENT_QUOTES); ?>" />
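The three rule groups above can be traced with a small Python sketch. It reuses the regular expressions from the rules, but `rewrite_from_raw_request` is a hypothetical function written for illustration, and a single `str.replace` stands in for the [N] loop that encodes one plus sign per pass.

```python
import re
from urllib.parse import parse_qs

def rewrite_from_raw_request(request_line: str):
    """Sketch of the rules: raw path -> encode '+' -> move to query string."""
    # Step 1: pull the undecoded path out of the raw request line.
    match = re.match(r"^[^ ]+ (/path/[^/]+/[^? ]+)", request_line)
    if match is None:
        return None
    rq = match.group(1)

    # Step 2: split into parameter name and still-encoded value.
    name, value = re.match(r"/path/([^/]+)/(.*)$", rq).groups()

    # Step 3: percent-encode literal plus signs so they are not read as spaces.
    value = value.replace("+", "%2B")

    # Step 4: move it to the query string without re-encoding (the NE flag).
    return f"/path/script.php?{name}={value}"

rewritten = rewrite_from_raw_request("GET /path/tag/c++ HTTP/1.1")
tag = parse_qs(rewritten.split("?", 1)[1])["tag"][0]

print(rewritten)
print(tag)
```

A request that already uses c%2b%2b in the path passes through step 3 unchanged and decodes to the same value.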
Answer by yren
I ran into a similar problem with mod_rewrite and a + sign in the URL. The scenario is as follows:
We have a URL containing + signs that needs to be rewritten, such as http://deskdomain/2013/08/09/a+b+c.html
RewriteRule ^/(.*) http://mobiledomain/do/urlRedirect?url=http://%{HTTP_HOST}/$1
The Struts action urlRedirect reads the url parameter, makes some changes, and uses the URL for another redirect. But in req.getParameter("url") each + sign becomes a space, so the parameter value is
http://deskdomain/2013/08/09/a b c.html, which causes the redirect to fail with 404 Not Found. To resolve it (with help from the earlier answers) we use the rewrite flags B (escape backreferences) and NE (noescape):
RewriteRule ^/(.*) http://mobiledomain/do/urlRedirect?url=http://%{HTTP_HOST}/$1 [B,NE]
B escapes + to %2B, and NE prevents mod_rewrite from escaping %2B again to %252B (double-escaping the + sign), so req.getParameter("url") returns http://deskdomain/2013/08/09/a+b+c.html
I think the reason is that req.getParameter("url") unescapes the value for us, and a + sign unescapes to a space. You can try it: unescape %2B once and you get +, then unescape + again and you get a space.
"%2B" unescape-> "+" unescape-> " "

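The unescape chain can be reproduced in Python as a quick check. This assumes that form-style decoding with `unquote_plus` behaves like the parameter decoding described above, which is an analogy and not a statement about the servlet API.

```python
from urllib.parse import unquote_plus

# Start from the double-escaped form and decode one layer at a time.
value = "%252B"
steps = [value]
for _ in range(3):
    value = unquote_plus(value)
    steps.append(value)

# Each entry is the result of one more round of form-style decoding.
for step in steps:
    print(repr(step))
```

Each extra decoding pass strips one layer, which is why the number of escapes applied by the rewrite must match the number of times the value is decoded downstream.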
