Java convert Windows-1252 to UTF-8, some letters are wrong
Original question: http://stackoverflow.com/questions/23082522/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Ramon
I receive data from an external Microsoft SQL Server 2008 database (I run the queries with MyBatis). In theory, the data I receive is encoded in "Windows-1252".
I tried to decode the data with this code:
String textoFormado = ...value from MyBatis... ;
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");
Almost the whole string is decoded correctly, but some accented letters are not.
For example:
- I receive this String from the database: "??vila"
- After applying the code above, I get this String: "??vila"
- I expected this String: "ávila"
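The accepted answer below ends up decoding the raw bytes as UTF-8, which suggests (an assumption, not something the question states) that the database actually delivered UTF-8 bytes that were decoded as Windows-1252 somewhere along the way. Under that assumption, the question's round-trip repairs most letters but not all: "á" (UTF-8 bytes C3 A1) survives because both bytes are defined in Windows-1252, while "Á" (C3 81) does not, because 0x81 is one of the five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) that Windows-1252 leaves undefined. The JDK decodes such a byte to U+FFFD, so the original value is lost before the repair even starts. The class and method names below are hypothetical; this is a sketch to illustrate the effect, not a diagnosis of the asker's setup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical: simulate a driver that decodes UTF-8 bytes as Windows-1252
    static String misreadAsCp1252(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    // The conversion from the question: re-encode as Windows-1252, decode as UTF-8
    static String questionFix(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E1 (a-acute) is C3 A1 in UTF-8; U+00C1 (A-acute) is C3 81
        for (String word : new String[] {"\u00E1vila", "\u00C1vila"}) {
            String repaired = questionFix(misreadAsCp1252(word));
            System.out.println(word + " -> " + repaired + " ok=" + word.equals(repaired));
        }
    }
}
```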
Accepted answer by Ramon
I solved it, thanks to everyone.
My project has the following structure:
- MyBatisQueries: a query with a "select" that returns the String
- A Pojo that stores the String (this is where the String with the conversion problem ended up)
- The class that runs the query and uses the Pojo object holding the data (where I saw the badly decoded text)
At first I had this (MyBatis and Spring inject the dependencies and parameters):
public class Pojo {
    private String params;

    // MyBatis hands over a String that the driver has already decoded
    public void setParams(String params) {
        this.params = params;
    }
}
The solution:
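import java.io.UnsupportedEncodingException;

public class Pojo {
    private String params;

    // Receive the raw column bytes and decode them explicitly as UTF-8
    public void setParams(byte[] params) {
        try {
            this.params = new String(params, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Not expected in practice: every Java platform must support UTF-8
            this.params = null;
        }
    }
}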
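Since Java 7, the same setter can pass java.nio.charset.StandardCharsets.UTF_8 instead of the charset name, which cannot throw UnsupportedEncodingException, so the try/catch disappears. This is a minimal sketch; the null check and the getter are additions, assuming MyBatis passes null for a NULL column.

```java
import java.nio.charset.StandardCharsets;

public class Pojo {
    private String params;

    // Decode the raw column bytes as UTF-8; passing a Charset object
    // instead of a name means no checked exception to handle
    public void setParams(byte[] params) {
        // Assumption: a NULL column value arrives as a null array
        this.params = (params == null) ? null : new String(params, StandardCharsets.UTF_8);
    }

    public String getParams() {
        return params;
    }
}
```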
Answer by Seelenvirtuose
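Obviously, textoFormado is a variable of type String. This means that the bytes were already decoded: Java then represents the text internally as 16-bit Unicode (UTF-16) code units. What you did is encode your string with Windows-1252 and then read the resulting bytes as UTF-8. That does not work: "á", for example, becomes the single Windows-1252 byte 0xE1, which is not a valid UTF-8 sequence on its own, so the UTF-8 decoder replaces it with a substitute character.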
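The failure described above can be reproduced in isolation. This sketch applies the question's conversion to a String that is already correct (the class and method names are hypothetical) and shows that the first character comes back as the replacement character U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // The conversion from the question, applied to an already-decoded String
    static String questionConversion(String textoFormado) {
        // Windows-1252 encodes U+00E1 (a-acute) as the single byte 0xE1
        byte[] cp1252 = textoFormado.getBytes(Charset.forName("windows-1252"));
        // 0xE1 would start a three-byte UTF-8 sequence, but the next byte
        // ('v', 0x76) is not a continuation byte, so the decoder emits U+FFFD
        return new String(cp1252, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String textoFormado = "\u00E1vila";
        String s = questionConversion(textoFormado);
        System.out.println(s.equals(textoFormado));  // false
        System.out.println(s.charAt(0) == '\uFFFD'); // true
    }
}
```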
What you need is to apply the correct encoding when decoding the raw bytes:
byte[] sourceBytes = getRawBytes(); // placeholder: however you obtain the undecoded bytes
String data = new String(sourceBytes, "Windows-1252");
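Here is a self-contained version of that decoding step. getRawBytes() is not shown in the answer, so this sketch uses a hand-written byte array (hypothetical data: 0xE1 followed by the ASCII bytes of "vila") and decodes it once, with the charset it was written in.

```java
import java.nio.charset.Charset;

public class DecodeDemo {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode raw bytes once, with the charset they were actually written in
    static String decodeCp1252(byte[] sourceBytes) {
        return new String(sourceBytes, CP1252);
    }

    public static void main(String[] args) {
        // Hypothetical raw bytes: 0xE1 is a-acute in Windows-1252, then "vila"
        byte[] sourceBytes = {(byte) 0xE1, 'v', 'i', 'l', 'a'};
        String data = decodeCp1252(sourceBytes);
        System.out.println(data.length() + " chars, first is U+"
                + Integer.toHexString(data.charAt(0)).toUpperCase());
    }
}
```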
To use this string inside your program, you do not need to do anything else; simply use it. If, however, you want to write the data back to a file, for example, you need to encode it again:
byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
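A minimal sketch of the "write bytes to destination file" step, assuming java.nio.file is acceptable and using a temporary file as a stand-in for the real destination. In UTF-8 "á" takes two bytes (C3 A1), so the five-character string becomes six bytes on disk.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodeDemo {
    // Encode the decoded String as UTF-8 and write the bytes to a file
    static void writeUtf8(String data, Path destination) throws IOException {
        byte[] destinationBytes = data.getBytes(StandardCharsets.UTF_8);
        Files.write(destination, destinationBytes);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt"); // stand-in destination
        writeUtf8("\u00E1vila", tmp);
        byte[] written = Files.readAllBytes(tmp);
        // a-acute is written as C3 A1, so 5 chars become 6 bytes
        System.out.println(written.length + " bytes");
        Files.delete(tmp);
    }
}
```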