MySQL: character encoding used by SELECT INTO?

Question

Asked by David Wolever

I'm trying to export some data from a MySQL database, but weird and wonderful things are happening to unicode in that table.

I will focus on one character, the left smart quote: “

When I use SELECT from the console, it is printed without issue:

mysql> SELECT text FROM posts;
+-------+
| text  |
+-------+
| “foo” |
+-------+

This means the data are being sent to my terminal as utf-8[0] (which is correct).

However, when I use SELECT * FROM posts INTO OUTFILE '/tmp/x.csv' …;, the output file is not correctly encoded:

$ cat /tmp/x.csv
a?fooa

Specifically, the “ is encoded with seven (7!) bytes: \xc3\xa2\xe2\x82\xac\xc5\x93.

What encoding is this? Or how could I tell MySQL to use a less unreasonable encoding?

Also, some miscellaneous facts:

SELECT @@character_set_database returns latin1

The text column is a VARCHAR(42):

mysql> DESCRIBE posts;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| text  | varchar(42) | NO   | MUL |         |       |
+-------+-------------+------+-----+---------+-------+

“ encoded as utf-8 yields \xe2\x80\x9c
\xe2\x80\x9c decoded as latin1 then re-encoded as utf-8 yields \xc3\xa2\xc2\x80\xc2\x9c (6 bytes).
Another data point: … (utf-8: \xe2\x80\xa6) is encoded to \xc3\xa2\xe2\x82\xac\xc2\xa6

[0]: as smart quotes aren't included in any 8-bit encoding, and my terminal correctly renders utf-8 characters.

Answer 1

Accepted answer by taavi

Many programs/standards (including MySQL) assume that "latin1" means "cp1252", so the 0x80 byte is interpreted as a Euro symbol, which is where that \xe2\x82\xac bit (U+20AC) in the middle comes from.
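
The seven bytes from the question can be reproduced with a short check. This is only a sketch: it uses Python's cp1252 codec as a stand-in for MySQL's latin1, and the helper name double_encode is made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Encode as UTF-8, misread those bytes as cp1252, then encode as UTF-8 again."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode("cp1252")  # 0x80 is read as U+20AC, the Euro sign
    return misread.encode("utf-8")


left_quote = double_encode("\u201c")  # left smart quote
ellipsis = double_encode("\u2026")    # horizontal ellipsis

print(left_quote)  # b'\xc3\xa2\xe2\x82\xac\xc5\x93'
print(ellipsis)    # b'\xc3\xa2\xe2\x82\xac\xc2\xa6'
```

With strict ISO 8859-1 in place of cp1252, the 0x80 byte would become \xc2\x80 instead, which gives the 6-byte result mentioned in the question.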

When I try this, it works properly (but note how I put data in, and the variables set on the db server):

mysql> set names utf8; -- http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
mysql> create table sq (c varchar(10)) character set utf8;
mysql> show create table sq\G
*************************** 1. row ***************************
       Table: sq
Create Table: CREATE TABLE `sq` (
  `c` varchar(10) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.19 sec)

mysql> insert into sq values (unhex('E2809C'));
Query OK, 1 row affected (0.00 sec)

mysql> select hex(c), c from sq;
+--------+------+
| hex(c) | c    |
+--------+------+
| E2809C | “  |
+--------+------+
1 row in set (0.00 sec)

mysql> select * from sq into outfile '/tmp/x.csv';
Query OK, 1 row affected (0.02 sec)

mysql> show variables like "%char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       | 
| character_set_connection | utf8                       | 
| character_set_database   | utf8                       | 
| character_set_filesystem | binary                     | 
| character_set_results    | utf8                       | 
| character_set_server     | latin1                     | 
| character_set_system     | utf8                       | 
| character_sets_dir       | /usr/share/mysql/charsets/ | 
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

And from the shell:

/tmp$ hexdump -C x.csv
00000000  e2 80 9c 0a                                       |....|
00000004

Hopefully there's a useful tidbit in there…

Answer 2

Answered by mvd

Newer versions of MySQL have an option to set the character set in the outfile clause:

SELECT col1,col2,col3 
FROM table1 
INTO OUTFILE '/tmp/out.txt' 
CHARACTER SET utf8
FIELDS TERMINATED BY ','

Answer 3

Answered by dhruvbird

I've found that this works well.

SELECT convert(col_name USING latin1) FROM posts INTO OUTFILE '/tmp/x.csv' …;

Answer 4

Answered by paczor

As you can see, my MySQL database uses latin1 and the system is utf-8.

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | latin1 |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+
7 rows in set (0.00 sec)

Every time I tried to export a table, I got a strangely encoded CSV file. So I put:

mysql_query("SET NAMES CP1252");
header('Content-Type: text/csv; charset=cp1252');
header('Content-Disposition: attachment;filename=output.csv');

in my export script.

Then I have pure UTF-8 output.

Answer 5

Answered by deceze

To specifically address your question "What is this?", you have answered it yourself:

I suspect this is because “Column values are dumped using the binary character set. In effect, there is no character set conversion.” - dev.mysql.com/doc/refman/5.0/en/select-into.html

That is the way MySQL stores utf8-encoded data internally. It's a terribly inefficient variation of Unicode storage, apparently using a full three bytes for most characters, and not supporting four-byte UTF-8 sequences.

As for how to convert it to real UTF-8 using INTO OUTFILE... I don't know. Using other mysqldump methods will do it, though.
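
For a file that has already been exported in this state, the conversion can also be undone after the fact. The following is a minimal sketch, assuming the bytes are UTF-8 that was misread as MySQL's latin1 (cp1252) and then written out as UTF-8 again. The function names are hypothetical. MySQL's latin1 maps the five bytes that cp1252 leaves undefined (0x81, 0x8d, 0x8f, 0x90, 0x9d) to the control characters with the same number, so the sketch falls back to latin-1 for those.

```python
def mysql_latin1_encode(text: str) -> bytes:
    """Encode text the way MySQL's latin1 would: cp1252 plus five control characters."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Covers U+0081, U+008D, U+008F, U+0090 and U+009D, which Python's
            # cp1252 codec rejects. Any other character raises here, which
            # suggests the input was not double-encoded in this way.
            out += ch.encode("latin-1")
    return bytes(out)


def repair(double_encoded: bytes) -> str:
    """Undo one round of UTF-8 -> latin1 (cp1252) -> UTF-8 double-encoding."""
    return mysql_latin1_encode(double_encoded.decode("utf-8")).decode("utf-8")


damaged = b"\xc3\xa2\xe2\x82\xac\xc5\x93foo\xc3\xa2\xe2\x82\xac\xc2\x9d"
fixed = repair(damaged)
print(fixed)  # “foo”
```

The closing quote is the case that needs the fallback: its third UTF-8 byte is 0x9d, which cp1252 does not define.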

Answer 6

Answered by singpolyma

You can execute MySQL queries using the CLI tool (I believe even with an output format so it prints out CSV) and redirect to a file. Should do charset conversion and still give you access to do joins, etc.
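
As a rough sketch of that approach, the script below builds a mysql client invocation and redirects its output to a file. The database name mydb and both function names are placeholders, and connection credentials are assumed to come from an option file. Note that --batch produces tab-separated output with a header row rather than CSV, and which character set to request depends on how the data was stored in the first place.

```python
import subprocess


def build_export_command(query: str, database: str, charset: str = "utf8") -> list:
    """Return the argv for a non-interactive mysql client run."""
    return [
        "mysql",
        "--batch",                             # tab-separated rows, no ASCII table
        f"--default-character-set={charset}",  # character set for the connection
        "-e", query,                           # statement to execute
        database,
    ]


def export_to_file(query: str, database: str, path: str, charset: str = "utf8") -> None:
    """Run the query through the mysql client and write its stdout to path."""
    with open(path, "wb") as out:
        subprocess.run(build_export_command(query, database, charset),
                       stdout=out, check=True)


command = build_export_command("SELECT text FROM posts", "mydb")
print(command)
```

Calling export_to_file("SELECT text FROM posts", "mydb", "/tmp/x.tsv") would then write the result on a machine where the mysql client is installed and can reach the server.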

Answer 7

Answered by Hardeep

Try SET CHARACTER SET <blah> before your select, where <blah> is utf8, latin1, etc. See: http://dev.mysql.com/doc/refman/5.6/en/charset-connection.html

Or SET NAMES utf8; might work...

Answer 8

Answered by Burhan Khalid

You need to issue charset utf8 at the MySQL prompt before running the SELECT. This tells the server which character set to output the results in.
