Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/35125933/

MySQL utf8mb4, Errors when saving Emojis

Tags: mysql, emoji, utf8mb4

Asked by Loki

I am trying to save user names from a service in my MySQL database. Those names can contain emojis.

After searching a little, I found this Stack Overflow question linking to this tutorial. I followed the steps and it looks like everything is configured properly.

I have a database (character set utf8mb4, collation utf8mb4_unicode_ci), a table called TestTable configured the same way, and a "Text" column that is also configured this way (VARCHAR(191), utf8mb4_unicode_ci).

When I try to save emojis I get an error:

Example of error for shortcake (🍰):
    Warning: #1300 Invalid utf8 character string: 'F09F8D'
    Warning: #1366 Incorrect string value: '\xF0\x9F\x8D\xB0' for column 'Text' at row 1

The only emoji that I was able to save properly was the sun (☀).

Though I didn't try all of them to be honest.

Is there something I'm missing in the configuration?

Please note: none of the save tests involved a client application. I use phpMyAdmin to change the values manually and save the data. So proper configuration of the client side is something I will take care of after the server saves emojis properly.

Another side note: currently, when saving emojis I either get an error like the one above, or I get no error and the emoji in the Username data is stored as question marks (Username ????). Whether I get an error depends on how I save: when creating or saving via an SQL statement, or when editing inline, the value is saved with question marks; when editing via the edit button, I get the error.

Thank you.

EDIT 1: Alright, I think I found the problem, but not the solution. It looks like the database-specific variables didn't change properly.

When I'm logged in as root on my server and read out the variables (global):
Query used: SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)

For my Database (in phpmyadmin, the same query) it looks like the following:

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8               |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8               |
| character_set_server     | utf8               |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

How can I adjust these settings for a specific database? Also, even though the first set of values is my server default, a newly created database shows the second set.

Edit 2:

Here is my my.cnf file:

[client]
port=3306
socket=/var/run/mysqld/mysqld.sock
default-character-set = utf8mb4

[mysql]
default-character-set = utf8mb4

[mysqld_safe]
socket=/var/run/mysqld/mysqld.sock

[mysqld]
user=mysql
pid-file=/var/run/mysqld/mysqld.pid
socket=/var/run/mysqld/mysqld.sock
port=3306
basedir=/usr
datadir=/var/lib/mysql
tmpdir=/tmp
lc-messages-dir=/usr/share/mysql
log_error=/var/log/mysql/error.log
max_connections=200
max_user_connections=30
wait_timeout=30
interactive_timeout=50
long_query_time=5
innodb_file_per_table
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci

!includedir /etc/mysql/conf.d/

Answered by Rick James

character_set_client, character_set_connection, and character_set_results must all be utf8mb4 for that shortcake to be edible.

Something, somewhere, is setting a subset of those individually. Rummage through my.cnf and phpmyadmin's settings -- something is not setting all three.
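
As a rough illustration of that check, the Python sketch below takes the output of the SHOW VARIABLES query as a dictionary and reports which of the three variables are not utf8mb4. The helper name `find_charset_mismatches` is made up for this example, and the sample values are the ones the question reports for phpMyAdmin.

```python
# The three session variables that must all be utf8mb4.
REQUIRED = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def find_charset_mismatches(variables, expected="utf8mb4"):
    """Return {name: value} for each required variable that is not `expected`."""
    return {
        name: variables.get(name)
        for name in REQUIRED
        if variables.get(name) != expected
    }


# Values shown for the phpMyAdmin session in the question.
phpmyadmin_session = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8mb4",
    "character_set_results": "utf8",
}

print(find_charset_mismatches(phpmyadmin_session))
# prints {'character_set_client': 'utf8', 'character_set_results': 'utf8'}
```

With the values from the root session in the question, the same function returns an empty dictionary.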

If SET NAMES utf8mb4 is executed, all three are set correctly.

The sun shone because it is only 3 bytes (E2 98 80); MySQL's utf8 is sufficient for Unicode characters whose UTF-8 encoding is at most 3 bytes. The shortcake needs 4 bytes (F0 9F 8D B0), which only utf8mb4 can hold.
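
The byte counts can be checked outside MySQL. The Python sketch below encodes both characters as UTF-8 and tests whether each fits within 3 bytes; the helper names `utf8_length` and `fits_mysql_utf8` are invented for this illustration and are not part of any MySQL tooling.

```python
def utf8_length(char):
    """Number of bytes in the UTF-8 encoding of `char`."""
    return len(char.encode("utf-8"))


def fits_mysql_utf8(text):
    """True if every character needs at most 3 bytes in UTF-8."""
    return all(utf8_length(ch) <= 3 for ch in text)


sun = "\u2600"            # the sun from the question
shortcake = "\U0001F370"  # the shortcake from the error message

for name, ch in (("sun", sun), ("shortcake", shortcake)):
    hex_bytes = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    print(name, hex_bytes, fits_mysql_utf8(ch))
# prints:
# sun E2 98 80 True
# shortcake F0 9F 8D B0 False
```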

Answered by Pierce

It is likely that your service/application is connecting with "utf8" instead of "utf8mb4" for the client character set. That's up to the client application.

For a PHP application see http://php.net/manual/en/function.mysql-set-charset.php or http://php.net/manual/en/mysqli.set-charset.php

For a Python application see https://github.com/PyMySQL/PyMySQL#example or http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#mysql-unicode
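
For the Python case, a minimal sketch with PyMySQL might look like the following. It assumes the third-party pymysql package is installed and reuses the TestTable and Text names from the question; the helper names and the credentials in the usage comment are placeholders, not values from the original post.

```python
def connection_settings(host, user, password, database):
    """Keyword arguments for pymysql.connect() that request utf8mb4."""
    return {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "charset": "utf8mb4",  # ask for utf8mb4 on the client side
    }


def save_name(name, settings):
    """Insert one name into TestTable over a utf8mb4 connection."""
    import pymysql  # third-party; imported here so the sketch loads without it

    connection = pymysql.connect(**settings)
    try:
        with connection.cursor() as cursor:
            # Parameterized query; the driver handles the escaping.
            cursor.execute("INSERT INTO TestTable (`Text`) VALUES (%s)", (name,))
        connection.commit()
    finally:
        connection.close()


# Hypothetical usage (placeholder credentials):
# save_name("Username \U0001F370",
#           connection_settings("localhost", "app_user", "secret", "app_db"))
```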

Also, check that your columns really are utf8mb4. One direct way is like this:

mysql> SELECT character_set_name FROM information_schema.`COLUMNS`  WHERE table_name = "user"   AND column_name = "displayname";
+--------------------+
| character_set_name |
+--------------------+
| utf8mb4            |
+--------------------+
1 row in set (0.00 sec)

Answered by user3624198

For me, it turned out that the problem lay in the mysql client.

The mysql client overrode the character set configured in my.cnf on the server, which resulted in an unintended character setting.

So all I needed to do was add character-set-client-handshake = FALSE. It stops the client's settings from interfering with my character set configuration.

my.cnf would look like this:

[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
...

Hope it helps.

Answered by Saurabh Mistry

ALTER TABLE table_name CHANGE column_name column_name VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL;

Example query:

ALTER TABLE `reactions` CHANGE `emoji` `emoji` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL;

After that, I was able to store emoji in the table successfully.

Answered by druid62

Consider adding

init_connect = 'SET NAMES utf8mb4'

to the my.cnf of each of your database servers.

(Still, clients can, and so will, override it.)

Answered by Nicolas Giszpenc

I'm not proud of this answer, because it uses brute force to clean the input. It's brutal, but it works.

// Brute-force cleanup: walks the string one byte at a time and turns
// every byte above 127 into an HTML entity.
function cleanWord($string, $debug = false) {
    $new_string = "";

    for ($i = 0; $i < strlen($string); $i++) {
        $letter = substr($string, $i, 1);
        $code = ord($letter);
        if ($debug) {
            echo "Letter: " . $letter . "<BR>";
            echo "Code: " . $code . "<BR><BR>";
        }

        // Bytes that get a hand-picked entity instead of a numeric one.
        $blnSkip = false;
        if ($code == 146) {
            $letter = "&acute;";
            $blnSkip = true;
        } elseif ($code == 233) {
            $letter = "&eacute;";
            $blnSkip = true;
        } elseif ($code == 147 || $code == 148) {
            $letter = "&quot;";
            $blnSkip = true;
        } elseif ($code == 151) {
            $letter = "&#8211;";
            $blnSkip = true;
        }
        if ($blnSkip) {
            $new_string .= $letter;
            // Move on to the next byte; the rest of the string still needs processing.
            continue;
        }

        // Any other non-ASCII byte becomes a numeric entity.
        if ($code > 127) {
            $letter = "&#0" . $code . ";";
        }

        $new_string .= $letter;
    }
    if ($new_string != "") {
        $string = $new_string;
    }
    // Optional: turn Windows line breaks into <BR>.
    $string = str_replace("\r\n", "<BR>", $string);

    return $string;
}

//clean up the input
$message = cleanWord($message);

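// Now you can insert it as part of an SQL statement.
// Note: addslashes() is not a reliable way to escape SQL input;
// prefer prepared statements in real code.
$sql = "INSERT INTO tbl_message (`message`)
VALUES ('" . addslashes($message) . "')";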
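
A narrower variant of the same brute-force idea, sketched here in Python, escapes only the characters that need 4 bytes in UTF-8 and leaves everything else untouched. Unlike the PHP function above, it works per code point rather than per byte, and the name `escape_non_bmp` is made up for this illustration.

```python
def escape_non_bmp(text):
    """Replace characters above U+FFFF with &#N; numeric character references."""
    return "".join(
        f"&#{ord(ch)};" if ord(ch) > 0xFFFF else ch
        for ch in text
    )


print(escape_non_bmp("Username \U0001F370"))
# prints Username &#127856;
```

The escaped text is plain ASCII where the emoji used to be, so it fits in a 3-byte utf8 column, at the cost of storing HTML references instead of the actual characters.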