MySQL 如何在整个数据库中更改 CHARACTER SET(和 COLLATION)?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/5906585/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to change the CHARACTER SET (and COLLATION) throughout a database?
提问by Jeg Bagus
Our previous programmer set the wrong collation in a table (Mysql). He set it up with Latin collation, when it should be UTF8, and now I have issues. Every record with Chinese and Japan character turn to ??? character.
我们以前的程序员在表(Mysql)中设置了错误的排序规则。他用拉丁语排序设置它,当它应该是 UTF8 时,现在我有问题。中日文的每张唱片都转了???特点。
Is possible to change collation and get back the detail of character?
是否可以更改排序规则并取回字符的详细信息?
回答by Timo Huovinen
change database collation:
更改数据库排序规则:
ALTER DATABASE <database_name> CHARACTER SET utf8 COLLATE utf8_unicode_ci;
change table collation:
更改表排序规则:
ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
change column collation:
更改列排序规则:
ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;
More info:
更多信息:
回答by David Whittaker
Heres how to change all databases/tables/columns. Run these queries and they will output all of the subsequent queries necessary to convert your entire schema to utf8. Hope this helps!
这是更改所有数据库/表/列的方法。运行这些查询,它们将输出将整个架构转换为 utf8 所需的所有后续查询。希望这可以帮助!
-- Change DATABASE Default Collation
-- 更改数据库默认排序规则
SELECT DISTINCT concat('ALTER DATABASE `', TABLE_SCHEMA, '` CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.tables
where TABLE_SCHEMA like 'database_name';
-- Change TABLE Collation / Char Set
-- 更改表排序规则/字符集
SELECT concat('ALTER TABLE `', TABLE_SCHEMA, '`.`', table_name, '` CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.tables
where TABLE_SCHEMA like 'database_name';
-- Change COLUMN Collation / Char Set
-- 更改 COLUMN 排序规则/字符集
SELECT concat('ALTER TABLE `', t1.TABLE_SCHEMA, '`.`', t1.table_name, '` MODIFY `', t1.column_name, '` ', t1.data_type , '(' , t1.CHARACTER_MAXIMUM_LENGTH , ')' , ' CHARACTER SET utf8 COLLATE utf8_unicode_ci;')
from information_schema.columns t1
where t1.TABLE_SCHEMA like 'database_name' and t1.COLLATION_NAME = 'old_charset_name';
回答by bluecollarcoder
Beware that in Mysql, the utf8
character set is only a subset of the real UTF8 character set. In order to save one byte of storage, the Mysql team decided to store only three bytes of a UTF8 characters instead of the full four-bytes. That means that some east asian language and emoji aren't fully supported. To make sure you can store all UTF8 characters, use the utf8mb4
data type, and utf8mb4_bin
or utf8mb4_general_ci
in Mysql.
请注意,在 Mysql 中,utf8
字符集只是真正的 UTF8 字符集的一个子集。为了节省一个字节的存储空间,Mysql 团队决定只存储一个 UTF8 字符的三个字节,而不是完整的四个字节。这意味着不完全支持某些东亚语言和表情符号。为确保您可以存储所有 UTF8 字符,请使用Mysql 中的utf8mb4
数据类型和utf8mb4_bin
或utf8mb4_general_ci
。
回答by Jacob Hundley
Adding to what David Whittaker posted, I have created a query that generates the complete table and columns alter statement that will convert each table. It may be a good idea to run
除了 David Whittaker 发布的内容之外,我还创建了一个查询,该查询生成完整的表和列的 alter 语句,该语句将转换每个表。运行可能是个好主意
SET SESSION group_concat_max_len = 100000;
设置会话 group_concat_max_len = 100000;
first to make sure your group concat doesn't go over the very small limit as seen here.
首先要确保您的组连接不会超过此处所见的非常小的限制。
SELECT a.table_name, concat('ALTER TABLE ', a.table_schema, '.', a.table_name, ' DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci, ',
group_concat(distinct(concat(' MODIFY ', column_name, ' ', column_type, ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ', if (is_nullable = 'NO', ' NOT', ''), ' NULL ',
if (COLUMN_DEFAULT is not null, CONCAT(' DEFAULT \'', COLUMN_DEFAULT, '\''), ''), if (EXTRA != '', CONCAT(' ', EXTRA), '')))), ';') as alter_statement
FROM information_schema.columns a
INNER JOIN INFORMATION_SCHEMA.TABLES b ON a.TABLE_CATALOG = b.TABLE_CATALOG
AND a.TABLE_SCHEMA = b.TABLE_SCHEMA
AND a.TABLE_NAME = b.TABLE_NAME
AND b.table_type != 'view'
WHERE a.table_schema = ? and (collation_name = 'latin1_swedish_ci' or collation_name = 'utf8mb4_general_ci')
GROUP BY table_name;
A difference here between the previous answer is it was using utf8 instead of ut8mb4 and using t1.data_type with t1.CHARACTER_MAXIMUM_LENGTH didn't work for enums. Also, my query excludes views since those will have to altered separately.
上一个答案之间的区别在于它使用 utf8 而不是 ut8mb4 并且使用 t1.data_type 和 t1.CHARACTER_MAXIMUM_LENGTH 不适用于枚举。此外,我的查询不包括视图,因为这些视图必须单独更改。
I simply used a Perl script to return all these alters as an array and iterated over them, fixed the columns that were too long (generally they were varchar(256) when the data generally only had 20 characters in them so that was an easy fix).
我只是使用 Perl 脚本将所有这些更改作为数组返回并对其进行迭代,修复太长的列(通常它们是 varchar(256) 当数据中通常只有 20 个字符时,这是一个简单的修复)。
I found some data was corrupted when altering from latin1 -> utf8mb4. It appeared to be utf8 encoded latin1 characters in columns would get goofed in the conversion. I simply held data from the columns I knew was going to be an issue in memory from before and after the alter and compared them and generated update statements to fix the data.
我发现从 latin1 -> utf8mb4 更改时,某些数据已损坏。列中的 utf8 编码 latin1 字符似乎会在转换中出错。我只是从我知道在更改前后将成为内存问题的列中保存数据,然后比较它们并生成更新语句来修复数据。
回答by MJB
heredescribes the process well. However, some of the characters that didn't fit in latin space are gone forever. UTF-8 is a SUPERSET of latin1. Not the reverse. Most will fit in single byte space, but any undefined ones will not (check a list of latin1 - not all 256 characters are defined, depending on mysql's latin1 definition)
这里很好地描述了这个过程。然而,一些不适合拉丁空间的字符已经一去不复返了。UTF-8 是 latin1 的 SUPERSET。不是反过来。大多数将适合单字节空间,但任何未定义的将不适合(检查 latin1 列表 - 并非所有 256 个字符都已定义,具体取决于 mysql 的 latin1 定义)