Java servlet request parameter character encoding
Note: this page is a Chinese/English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/11100107/
servlet request parameter character encoding
Asked by Dónal
I have a Java servlet that receives data from an upstream system via an HTTP GET request. This request includes a parameter named "text". If the upstream system sets this parameter to:
TEST3 please ignore:
It appears in the logs of the upstream system as:
00 54 00 45 00 53 00 54 00 33 00 20 00 70 00 6c //TEST3 pl
00 65 00 61 00 73 00 65 00 20 00 69 00 67 00 6e //ease ign
00 6f 00 72 00 65 00 3a //ore:
(The // comments do not actually appear in the logs)
In my servlet I read this parameter with:
String text = request.getParameter("text");
If I print the value of text to the console, it appears as:
T E S T 3 p l e a s e i g n o r e :
If I inspect the value of text in the debugger, it appears as:
\u000T\u000E\u000S\u000T\u0003\u0000 \u000p\u000l\u000e\u000a\u000s\u000e\u0000
\u000i\u000g\u000n\u000o\u000r\u000e\u000:
So it seems that there's a problem with the character encoding. The upstream system is supposed to use UTF-16. My guess is that the servlet is assuming UTF-8 and is therefore reading twice as many characters as it should. For the message "TEST3 please ignore:" the first byte of each character is 00. This is being interpreted as a space when read by the servlet, which explains the space that appears before each character when the message is logged by the servlet.
Obviously my goal is simply to get the message "TEST3 please ignore:" when I read the text request parameter. My guess is that I could achieve this by specifying the character encoding of the request parameter, but I don't know how to do this.
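The symptom described in the question can be reproduced outside a servlet container. The sketch below is only an illustration: it assumes the upstream bytes are UTF-16 in the byte order shown in the log (00 before each letter) and that the container decoded them with a single-byte charset such as ISO-8859-1. The class and method names (MisdecodeDemo, misdecode) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MisdecodeDemo {
    // Encode as big-endian UTF-16 (no BOM), then decode with a single-byte
    // charset, the way a container that assumes ISO-8859-1 would (assumption).
    static String misdecode(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_16BE);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String seen = misdecode("TEST3");
        System.out.println(seen.length());        // twice the original length
        System.out.println((int) seen.charAt(0)); // code point 0 (NUL), not a real space
        System.out.println(seen.charAt(1));       // the actual letter
    }
}
```

Under these assumptions every second character is U+0000, which many consoles render as a blank, and that would match the "T E S T 3" output above.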
Answered by letonai
Use it like this:
new String(req.getParameter("<my request value>").getBytes("ISO-8859-1"),"UTF-8")
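To make clear what this one-liner does, here is a small self-contained sketch. It assumes the container decoded the raw bytes as ISO-8859-1 and that the sender actually used UTF-8; for the UTF-16 data in the question the second charset would presumably have to be a UTF-16 variant instead. The names RoundTripDemo and latin1ToUtf8 are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Recover the original bytes from an ISO-8859-1 decode, then decode them as UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "D\u00f3nal"; // contains one non-ASCII character
        // Simulate a container that read UTF-8 bytes as ISO-8859-1.
        String misdecoded = new String(original.getBytes(StandardCharsets.UTF_8),
                                       StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());                         // one char longer than the original
        System.out.println(latin1ToUtf8(misdecoded).equals(original));   // true
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, so getBytes("ISO-8859-1") returns the bytes the container originally saw.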
Answered by Petr Mensik
Try using a Filter for this:
import java.io.IOException;
import javax.servlet.*;

public class CustomCharacterEncodingFilter implements Filter {
    public void init(FilterConfig config) throws ServletException {
    }

    // setCharacterEncoding applies to the request body and must be called
    // before any request parameter is read.
    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        request.setCharacterEncoding("UTF-8");
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {
    }
}
This should set the encoding correctly for the whole application.
Answered by epoch
It looks like it was encoded with UTF-16LE (Little Endian). Here is a class that successfully prints your string:
import java.io.UnsupportedEncodingException;
import java.math.BigInteger;
public class Test {
public static void main(String[] args) throws UnsupportedEncodingException {
String hex = "00 54 00 45 00 53 00 54 00 33 00 20 00 70 00 6c" +
"00 65 00 61 00 73 00 65 00 20 00 69 00 67 00 6e" +
"00 6f 00 72 00 65 00 3a"; // + " 00";
System.out.println(new String(new BigInteger(hex.replaceAll(" ", ""), 16).toByteArray(), "UTF-16LE"));
}
}
Output:
TEST3 please ignore?
Output with two zeros added to the input:
TEST3 please ignore:
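A likely reason for the trailing "?" is that BigInteger.toByteArray() returns the minimal two's-complement representation, so the leading 00 byte of the dump is dropped and every byte pair shifts by one. The sketch below is an alternative, not the answerer's code: it parses the hex pair by pair so the leading 00 is kept, and under that reading the bytes decode cleanly as big-endian UTF-16. The names HexDecodeDemo and parseHex are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class HexDecodeDemo {
    // Parse whitespace-separated hex pairs into bytes, keeping leading 00 bytes.
    static byte[] parseHex(String hex) {
        String[] pairs = hex.trim().split("\\s+");
        byte[] out = new byte[pairs.length];
        for (int i = 0; i < pairs.length; i++) {
            out[i] = (byte) Integer.parseInt(pairs[i], 16);
        }
        return out;
    }

    public static void main(String[] args) {
        String hex = "00 54 00 45 00 53 00 54 00 33 00 20 00 70 00 6c "
                   + "00 65 00 61 00 73 00 65 00 20 00 69 00 67 00 6e "
                   + "00 6f 00 72 00 65 00 3a";
        // 00 54 read high byte first is U+0054 ('T'), hence UTF_16BE here.
        System.out.println(new String(parseHex(hex), StandardCharsets.UTF_16BE));
        // prints: TEST3 please ignore:
    }
}
```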
UPDATE
To get this working with your Servlet you can try:
String value = request.getParameter("text");
try {
value = new String(value.getBytes(), "UTF-16LE");
} catch(java.io.UnsupportedEncodingException ex) {}
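Note that value.getBytes() with no argument uses the platform default charset, so the result of the snippet above can vary between machines. A variant with both charsets spelled out might look like the following sketch. It assumes the container decoded the parameter as ISO-8859-1 and that the data is big-endian UTF-16, as the leading 00 bytes in the log suggest; swap in UTF_16LE if your data turns out to be little-endian. ParamRedecode and redecode are hypothetical names, and the string built in main stands in for request.getParameter("text").

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParamRedecode {
    // Turn the string back into the bytes the container saw, then decode
    // those bytes with the charset the sender actually used.
    static String redecode(String value, Charset assumedByContainer, Charset actual) {
        if (value == null) {
            return null; // parameter was absent
        }
        return new String(value.getBytes(assumedByContainer), actual);
    }

    public static void main(String[] args) {
        // Stand-in for request.getParameter("text"): UTF-16BE bytes read as ISO-8859-1.
        String raw = new String("TEST3 please ignore:".getBytes(StandardCharsets.UTF_16BE),
                                StandardCharsets.ISO_8859_1);
        String fixed = redecode(raw, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16BE);
        System.out.println(fixed); // prints: TEST3 please ignore:
    }
}
```

Using Charset constants instead of charset names also removes the need for the empty catch block, since no UnsupportedEncodingException can be thrown.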
UPDATE
See the following link; it verifies that the hex produced is in fact UTF-16LE