Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/23082522/

Time: 2020-08-13 20:34:35  Source: igfitidea

Java convert Windows-1252 to UTF-8, some letters are wrong

Tags: java, utf-8, utf8-decode, windows-1252

Asked by Ramon

I receive data from an external Microsoft SQL 2008 database (I run the queries with MyBatis). In theory, the data I receive is encoded as "Windows-1252".

I try to decode the data with this code:

String textoFormado = ...value from MyBatis... ;

String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");

Almost all of the string is decoded correctly, but some letters with accents are not.

For example:

  1. I receive this string from the database: "??vila"
  2. I use the above code, which produces this string: "??vila"
  3. I expected this string: "ávila"

Accepted answer by Ramon

I solved it, thanks to all.

I have the following project structure:

  • MyBatisQueries: a query with a "select" that returns the String
  • A Pojo that stores the String (this is where the String arrived with conversion problems)
  • The class that uses the query and the Pojo object holding the data (this is where the text appeared badly decoded)

At first I had (MyBatis and Spring inject the dependencies and params):

public class Pojo {
    private String params;

    // Stores the value exactly as MyBatis hands it over, already decoded to a String.
    public void setParams(String params) {
        this.params = params;
    }

}

The solution:

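import java.io.UnsupportedEncodingException;

public class Pojo {
    private String params;

    // Accept the raw bytes from the query and decode them explicitly as UTF-8.
    public void setParams(byte[] params) {
        try {
            this.params = new String(params, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            this.params = null;
        }
    }

}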
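
As a rough sketch of the same idea, the setter can take a Charset constant instead of a charset name, which removes the checked exception. This assumes, as the accepted answer implies but does not verify, that the bytes delivered by the query really are UTF-8. The class name PojoSketch, the null check, and the getParams accessor are hypothetical additions for illustration.

```java
import java.nio.charset.StandardCharsets;

class PojoSketch {
    private String params;

    // Assumption: the raw bytes handed over by the query are UTF-8 encoded.
    // Using the Charset constant avoids UnsupportedEncodingException.
    public void setParams(byte[] params) {
        this.params = (params == null) ? null : new String(params, StandardCharsets.UTF_8);
    }

    // Hypothetical accessor, added only so the result can be inspected.
    public String getParams() {
        return params;
    }

    public static void main(String[] args) {
        PojoSketch pojo = new PojoSketch();
        // 0xC3 0xA1 is the UTF-8 encoding of U+00E1 (a with acute accent).
        pojo.setParams(new byte[] {(byte) 0xC3, (byte) 0xA1, 'v', 'i', 'l', 'a'});
        System.out.println("\u00e1vila".equals(pojo.getParams())); // prints "true"
    }
}
```

The important part is the same as in the code above: the setter receives bytes, so the decoding step happens in one known place with one known charset.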

Answer by Seelenvirtuose

Obviously, textoFormado is a variable of type String. This means that the bytes were already decoded. Java then internally uses a 16-bit Unicode (UTF-16) representation. What you did was encode your string with Windows-1252 and then read the resulting bytes with a UTF-8 decoder. That does not work.
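
To see the effect described above, here is a small sketch that applies the question's conversion to a string that is already correct. The class and method names are hypothetical, and the accented letter is written as a Unicode escape so the example does not depend on the source file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Hypothetical helper that repeats the question's approach: encode an
    // already decoded String as Windows-1252, then read those bytes as UTF-8.
    static String reencode(String alreadyDecoded) {
        byte[] cp1252Bytes = alreadyDecoded.getBytes(WINDOWS_1252);
        return new String(cp1252Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String correct = "\u00e1vila"; // a with acute accent + "vila"
        // In Windows-1252 the accented letter is the single byte 0xE1. Followed by
        // 'v', that is not a valid UTF-8 sequence, so the decoder substitutes
        // U+FFFD (the replacement character) and the letter is lost.
        System.out.println(reencode(correct).equals("\uFFFDvila")); // prints "true"
    }
}
```

Plain ASCII letters have the same byte values in both encodings, which is why most of the text survives the conversion and only the accented letters break.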

What you need is the correct encoding when reading the bytes:

byte[] sourceBytes = getRawBytes();
String data = new String(sourceBytes, "Windows-1252");

To use this string inside your program, you do not need to do anything else. Simply use it. If, however, you want to write the data back out, to a file for example, you need to encode it again:

byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
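
Putting the two steps together, a minimal sketch of the whole conversion might look like this. The class name, the helper name, and the hard-coded sample bytes are assumptions for illustration; in a real program the bytes would come from the database or a file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TranscodeDemo {
    private static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Hypothetical helper: decode the raw bytes with the charset they were
    // written in, then encode the resulting String as UTF-8.
    static byte[] windows1252ToUtf8(byte[] sourceBytes) {
        String data = new String(sourceBytes, WINDOWS_1252);
        return data.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Sample input: "avila" with an accented first letter, in Windows-1252 (0xE1).
        byte[] sourceBytes = {(byte) 0xE1, 'v', 'i', 'l', 'a'};
        byte[] destinationBytes = windows1252ToUtf8(sourceBytes);
        // The accented letter becomes the two bytes 0xC3 0xA1 in UTF-8.
        System.out.println(destinationBytes.length); // prints "6"
    }
}
```

The String in the middle carries no encoding of its own as far as the program is concerned; a charset only matters at the two boundaries where bytes are read or written.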