Java servlet encoding problem

Notice: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/4296654/

Date: 2020-10-30 05:44:47  Source: igfitidea

encoding problem in servlet

Tags: java, unicode, servlets, character-encoding

Asked by hguser

I have a servlet that receives some parameters from the client and then does some work. The parameters from the client are in Chinese, so I often get invalid characters in the servlet. For example, if I enter

http://localhost:8080/Servlet?q=中文&type=test

Then, in the servlet, the 'type' parameter is correct (test), but the 'q' parameter is not decoded correctly: it becomes invalid characters that cannot be parsed.

However, if I go to the address bar and submit the address again, the URL changes to:

http://localhost:8080/Servlet?q=%D6%D0%CE%C4&type=test

Now my servlet gets the correct value for the 'q' parameter.

What is the problem?

UPDATE

BTW, it works well when I submit the form with POST. When I send the parameters via Ajax, for example:

url = "http://..q='中文'";
xmlhttp.open("POST", url, true);

Then the server side also gets invalid characters.

It seems that the server side gets the right result only when the Chinese characters are percent-encoded as %xx.

That is to say, http://.../q=中文 does not work, but http://.../q=%D6%D0%CE%C4 works.

But why does "http://www.google.com.hk/search?hl=zh-CN&newwindow=1&safe=strict&q=%E4%B8%AD%E6%96%87&btnG=Google+%E6%90%9C%E7%B4%A2&aq=f&aqi=&aql=&oq=&gs_rfai=" work?

Answered by BalusC

Ensure that the page containing the form is itself encoded as UTF-8, and that the browser is instructed to read the page as UTF-8. Assuming that it's JSP, just put this at the very top of the page to achieve that:

<%@ page pageEncoding="UTF-8" %>

Then, to process the GET query string as UTF-8, ensure that the servlet container in question is configured to do so. It's unclear which one you're using, so here's a Tomcat example: set the URIEncoding attribute of the <Connector> element in /conf/server.xml to UTF-8.

<Connector URIEncoding="UTF-8">
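
To illustrate why this setting matters, here is a minimal Java sketch that is not part of the original answer. It assumes a container that decodes the query string as ISO-8859-1 (as far as I know, the default in older Tomcat versions) while the browser sent UTF-8 percent-encoded bytes. The class and method names are made up for illustration, and URLDecoder.decode(String, Charset) needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates a charset mismatch between sender and container.
class DecodeMismatchDemo {
    // What a container configured for ISO-8859-1 would hand to the servlet.
    static String decodeAsLatin1(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.ISO_8859_1);
    }

    // What a container configured for UTF-8 would hand to the servlet.
    static String decodeAsUtf8(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    // Common workaround when the container cannot be reconfigured:
    // recover the original bytes and reinterpret them as UTF-8.
    static String repair(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "%E4%B8%AD%E6%96%87"; // UTF-8 percent-encoding of the two characters
        String garbled = decodeAsLatin1(raw);
        System.out.println(garbled.length());        // 6: one char per byte, not 2
        System.out.println(decodeAsUtf8(raw));       // the original two characters
        System.out.println(repair(garbled));         // the original two characters again
    }
}
```

The repair step only works because ISO-8859-1 maps every byte to exactly one character, so no information is lost. Configuring the container correctly is still the cleaner fix.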

If you'd like to use POST, you need to ensure that the HttpServletRequest is instructed to parse the POST request body using UTF-8.

request.setCharacterEncoding("UTF-8");

Call this before you access the first parameter. A Filter is the best place for this.

Answered by Michael Borgwardt

Using non-ASCII characters as GET parameters (i.e. in URLs) is generally problematic. RFC 3986 recommends using UTF-8 and then percent encoding, but that's AFAIK not an official standard. And what you are using in the case where it works isn't UTF-8!
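
As a quick check of that last point, here is a small Java sketch (not from the original answer) that percent-encodes 中文 with two charsets. It assumes the JRE ships the GBK charset and that Java 10 or later is used for URLEncoder.encode(String, Charset). The class name is made up, and the characters are written as Unicode escapes so the source file's own encoding cannot interfere.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;

// Hypothetical demo class: compares percent-encodings of the same text.
class PercentEncodingDemo {
    // Percent-encodes a value using the named charset.
    static String encode(String value, String charsetName) {
        return URLEncoder.encode(value, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587"; // the two Chinese characters from the question
        System.out.println(encode(text, "GBK"));   // %D6%D0%CE%C4 (the URL that worked)
        System.out.println(encode(text, "UTF-8")); // %E4%B8%AD%E6%96%87 (the Google URL)
    }
}
```

So the working URL in the question carries GBK bytes, while the Google URL carries UTF-8 bytes. The server has to decode with the same charset the sender used.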

It would probably be safest to switch to POST requests.

Answered by AlexR

I believe the problem is on the sending side. As I understand from your description, if you type the URL in the browser you get a "correctly" encoded request. This job is done by the browser: it knows how to convert Unicode characters to a sequence of codes like %xx.

So check how you send the request. It should be encoded when it is sent.
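
A minimal sketch of encoding on the sending side, written in Java for illustration (the question's own client is browser JavaScript, where encodeURIComponent plays the same role). The class and method names are hypothetical, UTF-8 is an assumed choice that the server must match, and URLEncoder.encode(String, Charset) needs Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a URL whose query parameters are percent-encoded.
class QueryStringBuilder {
    static String buildUrl(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            // Encode names and values separately so '&' and '=' inside them stay safe.
            String name = URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8);
            query.add(name + "=" + value);
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "\u4e2d\u6587"); // the two Chinese characters from the question
        params.put("type", "test");
        // prints http://localhost:8080/Servlet?q=%E4%B8%AD%E6%96%87&type=test
        System.out.println(buildUrl("http://localhost:8080/Servlet", params));
    }
}
```

Note that URLEncoder produces form-style encoding, so a space becomes "+", which is fine inside a query string.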

Another possibility is to use the POST method instead of GET.

Answered by thotheolh

Do read this article on URL encoding format "www.blooberry.com/indexdot/html/topics/urlencoding.htm".

If you want, you could convert characters to hex or Base64 and put them in the parameters of the URL.
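
A rough sketch of the Base64 variant in Java, assuming both sides agree on UTF-8 and on the URL-safe Base64 alphabet. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: carries arbitrary text in a URL parameter as ASCII-only Base64.
class Base64ParamDemo {
    // Sender side: text -> UTF-8 bytes -> URL-safe Base64 without padding.
    static String toParam(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Receiver side: URL-safe Base64 -> bytes -> text.
    static String fromParam(String param) {
        byte[] bytes = Base64.getUrlDecoder().decode(param);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String param = toParam("\u4e2d\u6587"); // the two Chinese characters from the question
        System.out.println(param);              // 5Lit5paH
        System.out.println(fromParam(param));   // the original two characters
    }
}
```

The parameter value is then plain ASCII, so no charset negotiation happens in the URL itself, at the cost of the value no longer being human-readable.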

I think it's better to put them in the body (POST) than in the URL (GET).