MySQL 如何修复双重编码的 UTF8 字符(在 utf-8 表中)
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11436594/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to fix double-encoded UTF8 characters (in an utf-8 table)
提问by vbence
A previous LOAD DATA INFILE
was run under the assumption that the CSV file is latin1
-encoded. During this import the multibyte characters were interpreted as two single character and then encoded using utf-8 (again).
前一个LOAD DATA INFILE
是在假设 CSV 文件已latin1
编码的情况下运行的。在此导入过程中,多字节字符被解释为两个单个字符,然后使用 utf-8(再次)进行编码。
This double-encoding created anomalies like ???±
instead of ?
.
这种双重编码产生了像???±
而不是?
.
How to correct these strings?
如何纠正这些字符串?
回答by vbence
The following MySQL function will return the correct utf8 string after double-encoding:
以下 MySQL 函数将在双重编码后返回正确的 utf8 字符串:
CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8)
It can be used with an UPDATE
statement to correct the fields:
它可以与UPDATE
语句一起使用以更正字段:
UPDATE tablename SET
field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8);
回答by Eric
The above answer worked for some of my data, but resulted in a lot of NULL columns after running. My thought is if the conversion wasn't successful it returns null. To avoid that, I added a small check.
上面的答案适用于我的一些数据,但在运行后导致了很多 NULL 列。我的想法是,如果转换不成功,它会返回 null。为了避免这种情况,我添加了一个小支票。
UPDATE
tbl
SET
col =
CASE
WHEN CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8) IS NULL THEN col
ELSE CONVERT(CAST(CONVERT(col USING latin1) AS BINARY) USING utf8)
END
回答by smillien62
I meet this issue too, here a solution for Oracle:
我也遇到了这个问题,这里有一个针对 Oracle 的解决方案:
update tablename t set t.colname = convert(t.colname, 'WE8ISO8859P1', 'UTF8') where t.colname like '%?%'
And another one for Java:
另一个适用于 Java:
public static String fixDoubleEncoded(String text) {
final Pattern pattern = Pattern.compile("^.*?[^0-9a-zA-Z\ \t].*$");
try {
while (pattern.matcher(text).matches())
text = new String(text.getBytes("iso-8859-1"), "utf-8");
}
catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
return text;
}
回答by burkul
well it is very important to use "utf8mb4" instead of "utf8" since mysql will strip out all the data after an unrecognized character. So the safer method is;
好吧,使用“utf8mb4”而不是“utf8”非常重要,因为mysql会在无法识别的字符后删除所有数据。所以更安全的方法是;
UPDATE tablename SET
field = CONVERT(CAST(CONVERT(field USING latin1) AS BINARY) USING utf8mb4);
be careful about this.
小心这一点。