Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/966077/

Time: 2020-10-29 14:36:47  Source: igfitidea

Java Reading Undecoded URL from Servlet

Tags: java, url, servlets, encode, decode

Asked by Slartibartfast

Let's suppose I have a string like '=&?/;#+%' as part of my URL, for example:

example.com/servletPath/someOtherPath/myString/something.html?a=b&c=d#asdf

where myString is the above string. I've encoded the critical part, so the URL looks like

example.com/servletPath/someOtherPath/%3D%26%3F%2F%3B%23%2B%25/something.html?a=b&c=d#asdf

So far so good.
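
For reference, the encoded segment above can be reproduced with the standard library. This is only a sketch: the class and method names (`PathSegmentEncoder`, `encodeSegment`) are made up for illustration, and `URLEncoder` is really a form encoder, so the space handling below is a workaround rather than a full path-segment encoder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original question.
class PathSegmentEncoder {

    // Percent-encodes one path segment so that reserved characters survive as data.
    static String encodeSegment(String segment) {
        try {
            // URLEncoder emits '+' for a space (form encoding); a literal '+' in the
            // input is already encoded as %2B, so any remaining '+' stands for a space.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // prints %3D%26%3F%2F%3B%23%2B%25
        System.out.println(encodeSegment("=&?/;#+%"));
    }
}
```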

When I'm in the servlet and I read any of request.getRequestURI(), request.getRequestURL() or request.getPathInfo(), the returned value is already decoded, so I get a string like

someOtherPath/=&?/;#+%/something.html?a=b&c=d#asdf

and I can't differentiate between real special characters and encoded ones.

I've solved this particular problem by banning the above characters altogether, which works in this situation, but I still wonder whether there is any way to get the undecoded URL in a servlet class.

YET ANOTHER EDIT: When I hit this problem last evening I was too tired to notice what is really going on, which is even more bizarre! I have a servlet mapped on, say, /servletPath/*. After that I can put whatever I want and get my servlet responding depending on the rest of the path, except when there is %2F in the path. In that case the request never hits the servlet, and I get a 404! If I put '/' instead of %2F it works OK. I'm running Tomcat 6.0.14 on Java 1.6.0-04 on Linux.

Answered by jcsahnwaldt says GoFundMonica

There is a fundamental difference between '%2F' and '/', both for the browser and the server.

The HttpServletRequest specification says (without any logic, AFAICT):

  • getContextPath: not decoded
  • getPathInfo: decoded
  • getPathTranslated: not decoded
  • getQueryString: not decoded
  • getRequestURI: not decoded
  • getServletPath: decoded

The result of getPathInfo() should be decoded, but the result of getRequestURI() must not be decoded. If it is, your Servlet container is breaking the spec (as Wouter Coekaerts and Francois Gravel correctly pointed out). Which Tomcat version are you running?

Making matters even more confusing, current Tomcat versions reject paths that contain encodings of certain special characters, for security reasons.

Answered by Powerlord

If there's a %2F in the decoded URL, it means the encoded URL contained %252F.
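
A small sketch of that double-encoding point, using only `java.net.URLDecoder`. The class and method names (`DoubleEncodingDemo`, `decodeOnce`) are illustrative assumptions, not anything from the servlet API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class for the %252F -> %2F -> / chain.
class DoubleEncodingDemo {

    // Applies exactly one round of percent-decoding.
    static String decodeOnce(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // %25 is the encoding of '%', so one round of decoding leaves a literal "%2F".
        System.out.println(decodeOnce("a%252Fb")); // prints a%2Fb
        // A single-encoded slash becomes a real '/' after one round.
        System.out.println(decodeOnce("a%2Fb"));   // prints a/b
    }
}
```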

Since %2F is /, why not just split on "/" and not worry about URL encoding?

Answered by Francois Gravel

According to the Javadoc, getRequestURI should not decode the string. On the other hand, getServletPath returns a decoded string. I tested this locally using Jetty and it behaves as described in the doc.

So there might be something else at play in your situation since the behavior you're describing doesn't match the Sun documentation.

Answered by stevedbrown

It seems like you are trying to do something RESTy (use Jersey). Can't you just parse off the leading and trailing parts of the URL to get the data you are looking for?

url.substring(startLength, url.length() - endLength);

Answered by Wouter Coekaerts

Update: this answer was originally wrongly stating that '/' and '%2F' in a path should always be treated the same. They are in fact different because a path is a list of /-separated segments.

You should not have to distinguish between an encoded and an unencoded character in the path part of the URL. There is no character inside the path that can have a special meaning in a URL. E.g. '%2F' must be interpreted the same as '/', and a browser accessing such a URL is free to replace one by the other as it sees fit. Distinguishing between them breaks the standard of how URLs are encoded.

In the complete URL, you must distinguish between escaped and unescaped characters for different reasons, including:

  • To see where the path part ends. Because a ? encoded in the path should not be seen as the end.
  • Inside the query String. Because part of the value of a parameter could contain '&' or '=',...
  • Inside a path, a '/' separates two segments while '%2F' can be contained within a segment
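
The cases in this list can be seen outside a servlet container with `java.net.URI`, which exposes both raw and decoded views of each component. This is only an illustration: the class name `RawVersusDecoded`, the helper `describe`, and the sample URL are assumptions made for the sketch.

```java
import java.net.URI;

// Hypothetical demo: raw versus decoded views of the same URL.
class RawVersusDecoded {

    // Returns {rawPath, decodedPath, rawQuery, decodedQuery} for the given URL.
    static String[] describe(String url) {
        URI uri = URI.create(url);
        return new String[] {
            uri.getRawPath(),  // escapes kept, so %3F and %2F are still visible
            uri.getPath(),     // escapes decoded, so the distinction is lost
            uri.getRawQuery(), // escapes kept, so %26 is not a parameter separator
            uri.getQuery()     // escapes decoded
        };
    }

    public static void main(String[] args) {
        String[] parts = describe("http://example.com/servletPath/a%3Fb/c%2Fd?x=1%26y");
        System.out.println(parts[0]); // prints /servletPath/a%3Fb/c%2Fd
        System.out.println(parts[1]); // prints /servletPath/a?b/c/d
        System.out.println(parts[2]); // prints x=1%26y
        System.out.println(parts[3]); // prints x=1&y
    }
}
```

Note that the encoded '?' does not end the path, and that the decoded path can no longer tell "c/d" as two segments apart from "c%2Fd" as one.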

Java deals fine with the first two cases:

  • getPathInfo() which returns only the path part, decoded
  • getParameter(String) to access parts of the query part

It doesn't deal so well with the third case. If you want to distinguish between a '/' that separates two path segments and a '/' inside a path segment (%2F), then you cannot consistently represent the path as one decoded string. You can either represent it as one encoded string (e.g. "foo/bar%2Fbaz"), or as a list of decoded segments (e.g. "foo", "bar/baz"). But because the getPathInfo() API promises to do just that (one decoded string), it has no choice but to treat '/' and '%2F' as the same.
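
The "list of decoded segments" representation can be sketched as follows: split the raw path on '/' first and decode each segment afterwards. The names `DecodedSegments` and `decodedSegments` are hypothetical, and skipping empty segments is a simplification chosen for this sketch.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns one encoded path into a list of decoded segments.
class DecodedSegments {

    static List<String> decodedSegments(String rawPath) {
        List<String> segments = new ArrayList<>();
        // Split while the path is still encoded, so %2F is not mistaken for a separator.
        for (String raw : rawPath.split("/")) {
            if (raw.isEmpty()) {
                continue; // leading or doubled slashes; ignored in this sketch
            }
            try {
                // URLDecoder treats '+' as a space (form decoding); protect literal '+'.
                segments.add(URLDecoder.decode(raw.replace("+", "%2B"), "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException("UTF-8 is always supported", e);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        System.out.println(decodedSegments("foo/bar%2Fbaz")); // prints [foo, bar/baz]
    }
}
```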

For usual web applications, this is just fine. If you are in the rare case where you really need to tell them apart, you can do your own parsing of the URL, getting the raw version with getRequestURI(). If that one gives the URL decoded as you claim, then that means there is a bug in the servlet implementation you're using.
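
One possible shape for that do-it-yourself parsing is to strip the known prefix from the raw request URI and keep the rest encoded. The sketch below works on plain strings so it runs without a container; in a servlet the three arguments would presumably come from getRequestURI(), getContextPath() and getServletPath(). The class and method names are hypothetical, and the prefix match assumes the context and servlet paths contain no characters that need encoding.

```java
// Hypothetical helper: recovers the still-encoded path info from a raw request URI.
class RawPathInfo {

    static String rawPathInfo(String requestUri, String contextPath, String servletPath) {
        String prefix = contextPath + servletPath;
        if (!requestUri.startsWith(prefix)) {
            // Assumption violated: the prefix itself was encoded or the mapping differs.
            throw new IllegalArgumentException("URI does not start with " + prefix);
        }
        // Everything after the prefix, with percent-escapes left untouched.
        return requestUri.substring(prefix.length());
    }

    public static void main(String[] args) {
        String uri = "/app/servletPath/someOtherPath/%3D%26%3F%2F%3B%23%2B%25/something.html";
        // prints /someOtherPath/%3D%26%3F%2F%3B%23%2B%25/something.html
        System.out.println(rawPathInfo(uri, "/app", "/servletPath"));
    }
}
```

The result can then be split on '/' and decoded segment by segment, which keeps an encoded %2F inside its segment.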
