Java: How to convert UTF8 to Unicode
Original source: http://stackoverflow.com/questions/4049740/
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
How to convert UTF8 to Unicode
Asked by Rob Hufschmitt
I am trying to convert a UTF-8 string to a Java Unicode string.
String question = request.getParameter("searchWord");
byte[] bytes = question.getBytes();
question = new String(bytes, "UTF-8");
The input consists of Chinese characters, and when I compare the hex code of each character it is the same Chinese character. So I'm pretty sure that the charset is UTF-8.
Where am I going wrong?
Answered by Jon Skeet
There's no such thing as a "UTF-8 string" in Java. Everything is in Unicode.
When you call String.getBytes() without specifying an encoding, it uses the platform default encoding, which is almost always a bad idea.
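To make the point about the platform default concrete, here is a minimal sketch. The class name EncodingRoundTrip, the method names, and the sample text are illustrative and not taken from the question. It names the charset explicitly on both sides, so the result does not depend on the machine, and it shows that decoding the same bytes with a different charset garbles the text.

```java
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {
    // Encode and decode with the same explicitly named charset.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 but decode as ISO-8859-1: the kind of mismatch that garbles text.
    static String decodeUtf8AsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        System.out.println(roundTripUtf8(original).equals(original)); // prints true
        System.out.println(decodeUtf8AsLatin1(original).length());    // prints 6
    }
}
```

The second call returns six characters instead of two because each of the six UTF-8 bytes is turned into its own character.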
You shouldn't have to do anything to get the right characters here - the request should be handling it all for you. If it's not doing so, then chances are it's lost data already.
Could you give an example of what's actually going wrong? Specify the Unicode values of the characters in the string you're receiving (e.g. by using toCharArray() and then converting each char to an int) and what you expected to receive.
EDIT: To diagnose this, use something like this:
public static void dumpString(String text) {
    // Print the index and the numeric value of each char in the string.
    for (int i = 0; i < text.length(); i++) {
        System.out.println(i + ": " + (int) text.charAt(i));
    }
}
Note that this will give the decimal value of each Unicode character. If you have a handy hex library method around, you may want to use that to get the hex value instead. The main point is that it will dump the Unicode characters in the string.
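If no hex helper is at hand, the standard library is enough. The variant below is a sketch and not part of the original answer: the class name HexDump and the describe method are made up for illustration. It formats each char (a UTF-16 code unit) as four hex digits.

```java
import java.util.ArrayList;
import java.util.List;

public class HexDump {
    // Returns one entry per char (UTF-16 code unit), for example "0: 4E2D".
    static List<String> describe(String text) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            lines.add(i + ": " + String.format("%04X", (int) text.charAt(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints "0: 4E2D" and then "1: 6587".
        for (String line : describe("\u4e2d\u6587")) {
            System.out.println(line);
        }
    }
}
```

Comparing this output with the expected code points makes it easy to see whether the string was already garbled before your code touched it.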
Answered by Alex Jasmin
First make sure that the data is actually encoded as UTF-8.
There is some inconsistency between browsers regarding the encoding used when sending HTML form data. The safest way to send UTF-8 encoded data from a web form is to put that form on a page that is served with the Content-Type: text/html; charset=utf-8 header or that contains a <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> meta tag.
Now, to properly decode the data, call request.setCharacterEncoding("UTF-8") in your servlet before the first call to request.getParameter().
The servlet container takes care of the encoding for you. If you use setCharacterEncoding() properly, you can expect getParameter() to return normal Java strings.
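What the container does with the declared encoding can be imitated with the standard library. The sketch below is not servlet code: it uses java.net.URLDecoder to decode the same percent-encoded form value twice, once as UTF-8 and once as ISO-8859-1, to show why the encoding has to be known before the parameter is parsed. The class name ParameterDecoding, the decode helper, and the sample value are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class ParameterDecoding {
    // Decodes a percent-encoded form value using the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String raw = "%E4%B8%AD%E6%96%87"; // the UTF-8 bytes of two Chinese characters
        System.out.println(decode(raw, "UTF-8").equals("\u4e2d\u6587")); // prints true
        System.out.println(decode(raw, "ISO-8859-1").length());          // prints 6
    }
}
```

The bytes on the wire are identical in both calls; only the charset used to interpret them differs, which is the choice that setCharacterEncoding() makes for the request.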
Answered by endryha
Also, you may need a special filter that takes care of the encoding of your requests. For example, the Spring Framework provides such a filter: org.springframework.web.filter.CharacterEncodingFilter
Answered by Michael Konietzka
String question = request.getParameter("searchWord");
is all you have to do in your servlet code. At this point you do not have to deal with encodings, charsets, etc. This is all handled by the servlet infrastructure. When you notice problems like ?, ?, ?? being displayed somewhere, there may be something wrong with the request the client sent. But without knowing something about the infrastructure or the logged HTTP traffic, it is hard to tell what is wrong.
Answered by rogerdpack
Possibly:
question = new String(bytes, "UNICODE");