MySQL 如何修复双重编码的 UTF8 字符（在 utf-8 表中）

Question

提问by vbence

A previous LOAD DATA INFILEwas run under the assumption that the CSV file is latin1-encoded. During this import the multibyte characters were interpreted as two single character and then encoded using utf-8 (again).

前一个LOAD DATA INFILE是在假设 CSV 文件已latin1编码的情况下运行的。在此导入过程中，多字节字符被解释为两个单个字符，然后使用 utf-8（再次）进行编码。

This double-encoding created anomalies like ???±instead of ?.

这种双重编码产生了像???±而不是?.

How to correct these strings?

如何纠正这些字符串？

Answer 1

回答by vbence

The following MySQL function will return the correct utf8 string after double-encoding:

以下 MySQL 函数将在双重编码后返回正确的 utf8 字符串：

CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8)

It can be used with an UPDATEstatement to correct the fields:

它可以与UPDATE语句一起使用以更正字段：

UPDATE tablename SET
    field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);

Answer 2

回答by Eric

The above answer worked for some of my data, but resulted in a lot of NULL columns after running. My thought is if the conversion wasn't successful it returns null. To avoid that, I added a small check.

上面的答案适用于我的一些数据，但在运行后导致了很多 NULL 列。我的想法是，如果转换不成功，它会返回 null。为了避免这种情况，我添加了一个小支票。

UPDATE
    tbl

SET
    col =
    CASE
        WHEN CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) IS NULL THEN col
        ELSE CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8)
    END

Answer 3

回答by smillien62

I meet this issue too, here a solution for Oracle:

我也遇到了这个问题，这里有一个针对 Oracle 的解决方案：

update tablename t set t.colname = convert(t.colname, 'WE8ISO8859P1', 'UTF8') where t.colname like '%?%'

And another one for Java:

另一个适用于 Java：

public static String fixDoubleEncoded(String text) {
    final Pattern pattern = Pattern.compile("^.*?[^0-9a-zA-Z\ \t].*$");
    try {
        while (pattern.matcher(text).matches())
            text = new String(text.getBytes("iso-8859-1"), "utf-8");
    }
    catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return text;
}

Answer 4

回答by burkul

well it is very important to use "utf8mb4" instead of "utf8" since mysql will strip out all the data after an unrecognized character. So the safer method is;

好吧，使用“utf8mb4”而不是“utf8”非常重要，因为mysql会在无法识别的字符后删除所有数据。所以更安全的方法是；

UPDATE tablename SET
field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8mb4);

be careful about this.

小心这一点。

MySQL 如何修复双重编码的 UTF8 字符（在 utf-8 表中）

提问by vbence

回答by vbence

回答by Eric

回答by smillien62

回答by burkul

相关推荐

最近更新

标签

MySQL 如何修复双重编码的 UTF8 字符（在 utf-8 表中）

提问by vbence

回答by vbence

回答by Eric

回答by smillien62

回答by burkul

相关推荐

MySQL MySQL开启与使用？

无法使用在 mysql 中创建的用户登录

在 MySQL Workbench EER 图中的多列上创建 UNIQUE 约束

MySQL SQL SELECT 按 2 列排序和分组依据

相关推荐

最近更新

标签