MySQL Convert latin1 data to UTF8
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/1440837/
Asked by Kibbee
I imported some data using LOAD DATA INFILE into a MySQL database. The table itself and its columns use the UTF8 character set, but the default character set of the database is latin1. Because the database default is latin1, and I used LOAD DATA INFILE without specifying a character set, MySQL interpreted the file as latin1 even though the data in the file was UTF8. Now I have a bunch of badly encoded data in my UTF8 column. I found this article, which seems to address a similar problem ("UTF8 inserted in cp1251"), but my problem is "Latin1 inserted in UTF8". I've tried editing the queries there to convert the latin1 data to UTF8, but can't get it to work. Either the data comes out the same, or it comes out even more mangled than before. Just as an example, the word Québec is showing as QuÃ©bec.
[ADDITIONAL INFO]
When selecting the data wrapped in HEX(), QuÃ©bec has the value 5175C383C2A9626563.
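The hex value above is consistent with UTF-8 bytes having been read as latin1 and then stored as UTF-8 again. A minimal Python sketch of that chain, assuming Python's latin-1 codec as a stand-in for MySQL's latin1 (the two agree on the bytes in this example); the variable names are illustrative only:

```python
# Hypothetical illustration: reproduce the double encoding byte by byte.
original = "Québec"

# Bytes as they sat in the UTF-8 input file.
file_bytes = original.encode("utf-8")

# The import read those bytes as latin1, one character per byte.
misread = file_bytes.decode("latin-1")

# The resulting characters were then stored in the utf8 column.
stored = misread.encode("utf-8")

print(misread)               # QuÃ©bec
print(stored.hex().upper())  # 5175C383C2A9626563
```

The two bytes C3 A9 (é in UTF-8) became the four bytes C3 83 C2 A9, which is why the stored value is longer than the original word.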
The (shortened) CREATE TABLE statement for this table is:
CREATE TABLE MyDBName.`MyTableName`
(
`ID` INT NOT NULL AUTO_INCREMENT,
.......
`City` CHAR(32) NULL,
.......
) ENGINE InnoDB CHARACTER SET utf8;
Answered by luison
I've had cases like this in old WordPress installations, where the data itself was already UTF-8 inside a Latin1 database (due to the WP default charset). This means there was no real need to convert the data, only the database and table definitions. In my experience things get messed up when doing the dump, because as I understand it MySQL will use the client's default character set, which in many cases is now UTF-8. It is therefore very important to export with the same encoding as the data. For a Latin1 database holding UTF-8 encoded data:
$ mysqldump --default-character-set=latin1 --databases wordpress > m.sql
Then replace the Latin1 references within the exported dump before reimporting it into a new UTF-8 database. Something like:
$ replace "CHARSET=latin1" "CHARSET=utf8" \
"SET NAMES latin1" "SET NAMES utf8" < m.sql > m2.sql
In my case this link was of great help. Commented here in Spanish.
Answered by newtover
Though this is hardly still relevant for the OP, I happen to have found a solution in the MySQL documentation for ALTER TABLE. I post it here just for future reference:
Warning
The CONVERT TO operation converts column values between the character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8). In this case, you have to do the following for each such column:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8;
The reason this works is that there is no conversion when you convert to or from BLOB columns.
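To illustrate the difference between transcoding and relabeling, here is a rough Python analogy. It is not what MySQL runs internally. It only models a column declared latin1 whose bytes are really UTF-8, again assuming Python's latin-1 codec as a stand-in for MySQL's latin1:

```python
# Bytes physically stored in a column declared latin1 but holding UTF-8 text.
stored = "Québec".encode("utf-8")

# CONVERT TO-style transcoding: trust the latin1 label, then re-encode as UTF-8.
transcoded = stored.decode("latin-1").encode("utf-8")

# BLOB-style relabeling: the bytes stay untouched, only the label changes.
relabeled = stored

print(transcoded.decode("utf-8"))  # QuÃ©bec
print(relabeled.decode("utf-8"))   # Québec
```

The first path doubles the encoding, while the second leaves the original bytes in place and simply reads them with the correct character set.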
Answered by Vladislav Rastrusny
LOAD DATA INFILE lets you specify the encoding the file is supposed to be in, using its CHARACTER SET clause.
Answered by saturngod
I wrote http://code.google.com/p/mysqlutf8convertor/ to convert a Latin database to a UTF-8 database. It changes all tables and fields to UTF-8.
Answered by DigitalRoss
Converting latin1 to UTF8 is not what you want to do; you kind of need the opposite.
If what really happened was this:
- UTF-8 strings were interpreted as Latin-1 and transcoded to UTF-8, mangling them.
- You are now, or could be, reading UTF-8 strings with no further interpretation.
What you must do now is:
- Read the "UTF-8" with no transcode.
- Convert it to Latin-1. Now you should actually have the original UTF-8.
- Now put it in your "UTF-8" column with no further conversion.
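The three steps above can be sketched in Python. This is only an illustration of the byte-level logic, not a database migration script; the function name repair_double_encoded is hypothetical, and Python's latin-1 codec is assumed as a stand-in for MySQL's latin1:

```python
def repair_double_encoded(text: str) -> str:
    """Undo one round of 'UTF-8 read as Latin-1' mangling.

    Values that cannot be reversed this way (for example, text that was
    never mangled) are returned unchanged.
    """
    try:
        # Convert to Latin-1 to recover the original UTF-8 bytes,
        # then read those bytes as UTF-8.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Read the stored "UTF-8" value with no transcoding.
mangled = bytes.fromhex("5175C383C2A9626563").decode("utf-8")

print(repair_double_encoded(mangled))   # Québec
print(repair_double_encoded("Québec"))  # Québec
```

Returning the input unchanged on failure is a design choice for tables that mix clean and mangled rows. It is a heuristic, so it is worth checking a sample of rows before writing anything back.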
Answered by user2192857
I recently completed a shell script that automates the conversion process. It can also be configured with custom filters for any text you wish to replace or remove, for example stripping HTML characters. Table whitelists and blacklists are also possible. You can download it at sourceforge: https://sourceforge.net/projects/mysqltr/
Answered by François
Try this:
1) Dump your DB
mysqldump --default-character-set=latin1 -u username -p databasename > dump.sql
2) Open dump.sql in a text editor and replace all occurrences of "SET NAMES latin1" with "SET NAMES utf8"
3) Create a new database and restore your dumpfile
cat dump.sql | mysql -u root -p newdbname
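For large dumps, step 2 could be scripted instead of done in a text editor. A minimal Python sketch, assuming the dump contains the literal text "SET NAMES latin1"; the function name retag_dump and the sample dump content are hypothetical:

```python
def retag_dump(dump: bytes) -> bytes:
    """Relabel a latin1 dump as utf8 without touching the row data bytes."""
    return dump.replace(b"SET NAMES latin1", b"SET NAMES utf8")


# Hypothetical dump fragment whose row data is already UTF-8.
sample = b"/*!40101 SET NAMES latin1 */;\nINSERT INTO t VALUES ('Qu\xc3\xa9bec');\n"

print(retag_dump(sample).decode("utf-8"))
```

Working on bytes rather than decoded text means the script never transcodes the data itself. For a real file, read and write it in binary mode for the same reason.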