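How to handle string encoding in Java?
Notice: This page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also comply with the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow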
Original URL: http://stackoverflow.com/questions/1365649/
Question by MemoryLeak
I'm really frustrated by Java's string encoding. There are many automatic conversions, and I can't find the rule behind them. Does anyone have a good idea? For example, a JSP page has a link like this:
http://localhost:8080/helloworld/hello?world=凹ㄉ
And then we need to process it, so we do this:
String a = new String(request.getParameter("world").toString().getBytes("ISO-8859-1"),
"UTF-8");
a = "http://localhost/" + a;
When I debug it, I find that a is correct.
Then I store it as a session attribute: request.getSession().setAttribute("hello", a);
Later, in a JSP page with encoding "Big5", I try to get the attribute and display it, and I find that the characters "凹ㄉ" are corrupted.
How can I solve this?
Accepted answer by Yishai
That is not how you convert between character sets. What you need to be worrying about is this part:
request.getParameter("world").toString().getBytes("ISO-8859-1")
Once you have it as a String, it is already Unicode: Java represents strings as sequences of 16-bit UTF-16 code units. Getting it as bytes and then telling Java to treat those bytes as if they were UTF-8 is not going to do anything good.
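To make the point about the internal representation concrete, here is a small sketch (the class and method names are only illustrative) that compares the number of `char`s in the question's string with the number of bytes it takes in a few encodings. ISO-8859-1 cannot represent these characters at all, so `getBytes` substitutes `?` for each one, which is why re-encoding an already-correct string as ISO-8859-1 loses information.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Demo {
    // Number of bytes the string occupies when encoded with the given charset
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        // The question's characters: U+51F9 and U+3109, both in the Basic Multilingual Plane
        String s = "\u51F9\u3109";

        // A Java String is a sequence of UTF-16 code units: one char per BMP character
        System.out.println(s.length());                                  // 2
        System.out.println(encodedLength(s, StandardCharsets.UTF_16BE)); // 4 (two bytes per char, no BOM)
        System.out.println(encodedLength(s, StandardCharsets.UTF_8));    // 6 (three bytes per character)

        // ISO-8859-1 has no mapping for these characters; getBytes substitutes '?' (63) for each
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.ISO_8859_1))); // [63, 63]
    }
}
```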
If you found it to be fine, that is just a coincidence. Once you call getParameter("world").toString(), you have your Unicode string. The further decoding and encoding will just break certain characters; it just happens not to break yours.
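Whether the question's round trip helps or hurts depends on how the servlet container decoded the raw query-string bytes before `getParameter` returned, which varies with the container and its configuration. The sketch below simulates both possibilities in plain Java, without the servlet API; `reinterpret` is a hypothetical helper holding the question's conversion, and the "container" decoding is simulated with `new String(bytes, charset)`.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // The question's conversion: re-encode as ISO-8859-1, then decode those bytes as UTF-8
    static String reinterpret(String param) {
        return new String(param.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u51F9\u3109";
        // The percent-decoded bytes of the query parameter, assuming the browser sent UTF-8
        byte[] wireBytes = original.getBytes(StandardCharsets.UTF_8);

        // Case 1 (simulated): the container decoded the UTF-8 bytes as ISO-8859-1,
        // producing one char per byte. ISO-8859-1 maps bytes 0-255 one-to-one, so
        // re-encoding restores the original bytes and decoding them as UTF-8 works.
        String misdecoded = new String(wireBytes, StandardCharsets.ISO_8859_1);
        System.out.println(reinterpret(misdecoded).equals(original)); // true

        // Case 2 (simulated): the container already decoded the bytes as UTF-8.
        // These characters have no ISO-8859-1 mapping, so each one becomes '?'.
        String correct = new String(wireBytes, StandardCharsets.UTF_8);
        System.out.println(reinterpret(correct)); // ??
    }
}
```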
The question is how you display that attribute later. You say the JSP page's encoding is not Unicode but Big5, so what are you doing to get that string out of the attribute map and put it on that page? That is the likely source of the problem. Given the misunderstanding about how to handle the character conversion when getting the parameter, there are likely some mistakes on that Big5 page as well.
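Because the session attribute is just a Unicode `String`, the page encoding only matters at the moment the JSP writes that string out as bytes and the browser reads them back. In a JSP, the charset the page writes with is typically declared in the `page` directive, for example `contentType="text/html; charset=Big5"`. The following plain-Java sketch (no servlet API; `encodeThenDecode` is a hypothetical helper standing in for the response writer and the browser, and it assumes the JDK's extended charsets that provide Big5 are installed) shows that the characters survive only when both sides agree on the charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Big5Demo {
    // Encode text the way the page would, then decode the bytes the way the browser would
    static String encodeThenDecode(String text, Charset pageCharset, Charset browserCharset) {
        byte[] body = text.getBytes(pageCharset);
        return new String(body, browserCharset);
    }

    public static void main(String[] args) {
        Charset big5 = Charset.forName("Big5"); // throws if this JDK lacks the Big5 charset
        String s = "\u51F9\u3109";

        // Both characters exist in Big5, so a Big5 page can represent them
        System.out.println(big5.newEncoder().canEncode(s)); // true
        // Matching charsets on both sides: the text survives
        System.out.println(encodeThenDecode(s, big5, big5).equals(s)); // true
        // Mismatched charsets: the Big5 bytes are not the UTF-8 encoding of s, so the text is corrupted
        System.out.println(encodeThenDecode(s, big5, StandardCharsets.UTF_8).equals(s)); // false
    }
}
```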
By the way, do you really need to use Big5? Would UTF-16 work (if not UTF-8)? It could certainly remove some headaches.
Answer by Shweta Bhat
The following code will work
String a = new String(request.getParameter("world").toString().getBytes("ISO-8859-1"),
"UTF-16");
Answer by mihai_f87
I handle encodings in Java by not allowing text in any encoding other than UTF-8 to be uploaded to my site. This is how I do it:
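import java.io.*;
import java.nio.charset.*;
import org.apache.commons.io.IOUtils; // from Apache Commons IO

static String readUtf8(String filePath) throws IOException { // method name is illustrative
    // A UTF-8 decoder that throws on invalid byte sequences instead of substituting U+FFFD
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT);
    try (Reader reader = new InputStreamReader(new FileInputStream(filePath), decoder)) {
        return IOUtils.toString(reader);
    } catch (MalformedInputException e) {
        // The file was not saved with UTF-8 encoding, so reject it
        throw new IOException("File is not UTF-8 encoded: " + filePath, e);
    }
}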
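The same strict-decoding idea can be tried without a file or Apache Commons IO. This sketch (the class and method names are illustrative) uses `CharsetDecoder.decode(ByteBuffer)`, which throws a `CharacterCodingException` rather than substituting U+FFFD when the decoder's malformed-input action is `CodingErrorAction.REPORT`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Returns true if the bytes form valid UTF-8
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT is already the default for newDecoder(); it is set explicitly for clarity
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) { // MalformedInputException is a subclass
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u51F9\u3109".getBytes(StandardCharsets.UTF_8);
        // 0xE9 starts a three-byte UTF-8 sequence, but nothing follows it here
        byte[] latin1 = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false
    }
}
```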
I recommend reading https://www.baeldung.com/java-char-encoding. It contains a very good summary of what you need to know regarding String encoding in Java.