MySQL 如何修复双重编码的 UTF8 字符(在 utf-8 表中)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/11436594/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 14:08:12  来源:igfitidea点击:

How to fix double-encoded UTF8 characters (in an utf-8 table)

mysqlstringutf-8character-encoding

提问by vbence

A previous LOAD DATA INFILEwas run under the assumption that the CSV file is latin1-encoded. During this import the multibyte characters were interpreted as two single character and then encoded using utf-8 (again).

前一个LOAD DATA INFILE是在假设 CSV 文件已latin1编码的情况下运行的。在此导入过程中,多字节字符被解释为两个单个字符,然后使用 utf-8(再次)进行编码。

This double-encoding created anomalies like ???±instead of ?.

这种双重编码产生了像???±而不是?.

How to correct these strings?

如何纠正这些字符串?

回答by vbence

The following MySQL function will return the correct utf8 string after double-encoding:

以下 MySQL 函数将在双重编码后返回正确的 utf8 字符串:

CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8)

It can be used with an UPDATEstatement to correct the fields:

它可以与UPDATE语句一起使用以更正字段:

UPDATE tablename SET
    field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);

回答by Eric

The above answer worked for some of my data, but resulted in a lot of NULL columns after running. My thought is if the conversion wasn't successful it returns null. To avoid that, I added a small check.

上面的答案适用于我的一些数据,但在运行后导致了很多 NULL 列。我的想法是,如果转换不成功,它会返回 null。为了避免这种情况,我添加了一个小支票。

UPDATE
    tbl

SET
    col =
    CASE
        WHEN CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) IS NULL THEN col
        ELSE CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8)
    END

回答by smillien62

I meet this issue too, here a solution for Oracle:

我也遇到了这个问题,这里有一个针对 Oracle 的解决方案:

update tablename t set t.colname = convert(t.colname, 'WE8ISO8859P1', 'UTF8') where t.colname like '%?%'

And another one for Java:

另一个适用于 Java:

public static String fixDoubleEncoded(String text) {
    final Pattern pattern = Pattern.compile("^.*?[^0-9a-zA-Z\ \t].*$");
    try {
        while (pattern.matcher(text).matches())
            text = new String(text.getBytes("iso-8859-1"), "utf-8");
    }
    catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return text;
}

回答by burkul

well it is very important to use "utf8mb4" instead of "utf8" since mysql will strip out all the data after an unrecognized character. So the safer method is;

好吧,使用“utf8mb4”而不是“utf8”非常重要,因为mysql会在无法识别的字符后删除所有数据。所以更安全的方法是;

UPDATE tablename SET
field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8mb4);

be careful about this.

小心这一点。