Java: converting Windows-1252 to UTF-8, some letters are wrong

Question

Asked by Ramon

I receive data from an external Microsoft SQL Server 2008 database (I run the queries with MyBatis). In theory, the data I receive is encoded as "Windows-1252".

I try to decode the data with this code:

String textoFormado = ...value from MyBatis... ;
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");

Almost all of the String is decoded correctly, but some accented letters are not.

For example:

I receive this String from the database: "??vila"
After running the code above, I get this String: "??vila"
I expected this String: "ávila"

Answer 1

Accepted answer by Ramon

I solved it, thanks to everyone.

I have the following project structure:

MyBatisQueries: a query with a "select" that returns the String
A Pojo that stores the String (this is where I got the String with the conversion problem)
The class that uses the query and the Pojo object holding the data (this is where the text showed up badly decoded)

At first I had this (MyBatis and Spring inject the dependencies and the params):

public class Pojo {
    private String params;

    // The value arrives as an already-decoded String and is stored as-is.
    public void setParams(String params) {
        this.params = params;
    }

}

The solution:

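import java.io.UnsupportedEncodingException;

public class Pojo {
    private String params;

    // Receive the raw bytes and decode them explicitly as UTF-8.
    public void setParams(byte[] params) {
        try {
            this.params = new String(params, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // "UTF-8" is a charset every JVM must support, so this is not expected to happen.
            this.params = null;
        }
    }

}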

Answer 2

Answer by Seelenvirtuose

Obviously, textoFormado is a variable of type String, which means the bytes have already been decoded. Internally, Java represents strings as UTF-16 (16-bit code units). What your code does is encode the string as Windows-1252 and then decode the resulting bytes as UTF-8. That does not work.
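
To illustrate the point, here is a minimal sketch of what the question's conversion does to a String that is already correct. The class name ReencodeDemo and the method name reencode are made up for this example, and the input is written with a Unicode escape so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question.
class ReencodeDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Same operation as in the question: encode as Windows-1252,
    // then decode those bytes as if they were UTF-8.
    static String reencode(String text) {
        byte[] cp1252Bytes = text.getBytes(CP1252);
        return new String(cp1252Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e1vila"; // "ávila", already a correct String
        String mangled = reencode(original);

        // The accented letter becomes the single byte 0xE1 in Windows-1252,
        // which is not a valid UTF-8 sequence on its own, so the decoder
        // substitutes the replacement character U+FFFD.
        System.out.println("unchanged: " + mangled.equals(original));
        System.out.println("contains U+FFFD: " + mangled.contains("\uFFFD"));
    }
}
```

Plain ASCII letters survive because both charsets encode them the same way, which would explain why almost all of the text looks fine and only the accented letters break.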

What you need is the correct encoding when reading the bytes:

byte[] sourceBytes = getRawBytes(); // placeholder for however you obtain the undecoded bytes
String data = new String(sourceBytes, "Windows-1252");

To use this string inside your program, you do not need to do anything else. Simply use it. If, however, you want to write the data back out (to a file, for example), you need to encode it again:

byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
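
Putting both steps together, the following is a rough sketch of the decode-then-encode path, assuming the raw bytes really are Windows-1252. TranscodeDemo, transcode, and the temporary file are illustrative names only. It uses Charset objects instead of charset-name strings, which avoids the checked UnsupportedEncodingException.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the original answer.
class TranscodeDemo {
    // Decode the bytes with the charset they were written in,
    // then encode the resulting String with the target charset.
    static byte[] transcode(byte[] source, Charset from, Charset to) {
        String text = new String(source, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) throws IOException {
        Charset cp1252 = Charset.forName("windows-1252");

        // "ávila" as Windows-1252 bytes: 0xE1 is the accented letter.
        byte[] sourceBytes = {(byte) 0xE1, 'v', 'i', 'l', 'a'};
        byte[] utf8Bytes = transcode(sourceBytes, cp1252, StandardCharsets.UTF_8);

        // Write the UTF-8 bytes to a temporary file (stand-in for the real destination).
        Path out = Files.createTempFile("transcode-demo", ".txt");
        Files.write(out, utf8Bytes);
        System.out.println("Wrote " + utf8Bytes.length + " bytes to " + out);
    }
}
```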
