MySQL: How to fix "Incorrect string value" errors?
Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/1168036/
How to fix "Incorrect string value" errors?
Asked by Brian
After noticing an application tended to discard random emails due to incorrect string value errors, I went through and switched many text columns to use the utf8 column charset and the default column collation (utf8_general_ci) so that it would accept them. This fixed most of the errors, and made the application stop getting SQL errors when it hit non-Latin emails, too.
Despite this, some of the emails are still causing the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)
The contents column is a MEDIUMTEXT datatype which uses the utf8 column charset and the utf8_general_ci column collation. There are no flags that I can toggle in this column.
Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:
- What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
- How can I fix it?
- What are the likely effects of such a fix?
One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.
Accepted answer by RichieHindle
"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:
>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
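The session above is Python 2. A minimal Python 3 sketch of the same check follows; the helper name is_valid_utf8 is my own and not part of the answer.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The bytes quoted in the MySQL error message from the question.
sample = b"\xE4\xC5\xCC\xC9\xD3\xD8"
print(is_valid_utf8(sample))                       # False
print(is_valid_utf8("na\u00efve".encode("utf-8")))  # True
```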
If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European", the basis of MySQL's latin1 charset) is the most permissive encoding there is: MySQL's latin1 accepts every byte value as a valid character.
Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
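To illustrate that trade-off, here is a small sketch. It uses Python's "latin-1" (ISO-8859-1) codec as a stand-in for a single-byte column, which is an assumption on my part: Python's own cp1252 codec leaves a handful of byte values undefined, so it is not a perfect match for what MySQL does.

```python
# Every possible byte value decodes under ISO-8859-1, so a single-byte
# charset never rejects input the way utf8 does.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))  # 256

# The cost: genuine UTF-8 is silently misread, one character per byte.
utf8_bytes = "\u00e9".encode("utf-8")   # "é" is b'\xc3\xa9' in UTF-8
misread = utf8_bytes.decode("latin-1")
print(misread)  # Ã©
```

Nothing is rejected, but nothing is understood either: the two-byte "é" comes back as two unrelated characters.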
Answer by nico gawenda
I would not suggest Richie's answer, because you are corrupting the data inside the database. You would not fix your problem, only try to "hide" it, and you would not be able to perform essential database operations on the mangled data.
If you encounter this error, either the data you are sending is not UTF-8 encoded, or your connection is not UTF-8. First, verify that the data source (a file, ...) really is UTF-8.
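One way to do that first verification step is to scan the source line by line. This is only a sketch: the function name and the sample data are made up for illustration, and it assumes the source is newline-delimited.

```python
def find_invalid_utf8_lines(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad


# Hypothetical input: the same word once as UTF-8 and once as Latin-1.
sample = "caf\u00e9\n".encode("utf-8") + "caf\u00e9\n".encode("latin-1")
print(find_invalid_utf8_lines(sample))  # [2]
```

For a real file, the bytes could come from open(path, "rb").read(), where path is whatever file you are about to import.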
Then check your database connection. You should run this after connecting:
SET NAMES 'utf8';
SET CHARACTER SET utf8;
Next, verify that the tables where the data is stored have the utf8 character set:
SELECT
`tables`.`TABLE_NAME`,
`collations`.`character_set_name`
FROM
`information_schema`.`TABLES` AS `tables`,
`information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
`tables`.`table_schema` = DATABASE()
AND `collations`.`collation_name` = `tables`.`table_collation`
;
Last, check your database settings:
mysql> show variables like '%colla%';
mysql> show variables like '%charac%';
If source, transport and destination are UTF-8, your problem is gone;)
Answer by moeffju
MySQL's utf-8 types are not actually proper utf-8 – it only uses up to three bytes per character and supports only the Basic Multilingual Plane (i.e. no Emoji, no astral plane, etc.).
If you need to store values from higher Unicode planes, you need the utf8mb4 encodings.
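A quick way to tell whether a given string would need utf8mb4 is to look for code points above U+FFFF, which take four bytes in UTF-8. A minimal sketch (the helper name needs_utf8mb4 is hypothetical):

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


emoji = "\U0001F600"  # grinning face, a code point outside the BMP
print(needs_utf8mb4("\u041f\u0440\u0438\u0432\u0435\u0442"))  # False (Cyrillic fits in the BMP)
print(needs_utf8mb4("hi " + emoji))                            # True
print(len(emoji.encode("utf-8")))                              # 4
```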
Answer by Jiayu Wang
The table and fields have the wrong encoding; however, you can convert them to UTF-8.
ALTER TABLE logtest CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest CHANGE title title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci;
Answer by frankshaka
I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.
The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.
Refer to http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details on the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other discussions of these two on the web.
Answer by Babacar Gningue
First check if your default_character_set_name is utf8.
SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";
If the result is not utf8, you must convert your database. First, save a dump.
To change the character set encoding to UTF-8 for all of the tables in the specified database, type the following command at the command line. Replace DBNAME with the database name:
mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME
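If you would rather review the statements before running them, the awk step can be sketched in Python. This is an illustration only: convert_statements and the table name "users" are hypothetical, and unlike the one-liner it toggles foreign_key_checks once around the whole batch rather than per table.

```python
def convert_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Build one ALTER TABLE ... CONVERT TO statement per table name."""
    lines = ["SET foreign_key_checks = 0;"]
    for name in tables:
        # Backtick-quote the identifier, doubling any embedded backticks.
        quoted = "`" + name.replace("`", "``") + "`"
        lines.append(
            f"ALTER TABLE {quoted} CONVERT TO CHARACTER SET {charset} "
            f"COLLATE {collation};"
        )
    lines.append("SET foreign_key_checks = 1;")
    return "\n".join(lines)


# Table names would normally come from SHOW TABLES; these are placeholders.
print(convert_statements(["logtest", "users"]))
```

The printed script can be saved to a file, inspected, and then piped into mysql --database=DBNAME as in the command above.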
To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace DBNAME with the database name:
ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;
You can now retry writing utf8 characters into your database. This solution helped me when I tried to upload a 200,000-row CSV file into my database.
Answer by Ondra Žižka
In general, this happens when you insert strings to columns with incompatible encoding/collation.
I got this error when I had TRIGGERs, which inherit the server's collation for some reason. MySQL's default is (at least on Ubuntu) latin-1 with Swedish collation. Even though I had the database and all tables set to UTF-8, I had yet to set it in my.cnf:
/etc/mysql/my.cnf :
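[mysqld]
character-set-server=utf8
# Note: default-character-set under [mysqld] is only accepted by older servers
# (it was removed in MySQL 5.5); on newer versions put it under [client] instead.
default-character-set=utf8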
This should list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
Some of the variables listed by this should also show utf8-* (no latin-1 or other encoding):
show variables like 'char%';
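Reading that output by eye is error-prone, so here is a small sketch that flags the odd ones out. It assumes you have already copied the rows into a dict; the sample values below are invented, and the two ignored variables are ones I would not expect to hold a UTF-8 charset name (one is normally "binary", the other is a directory path).

```python
# Variables that are not expected to name a UTF-8 charset.
IGNORED = {"character_set_filesystem", "character_sets_dir"}


def non_utf8_settings(variables: dict) -> dict:
    """Return the charset variables whose value does not start with 'utf8'."""
    return {
        name: value
        for name, value in variables.items()
        if name not in IGNORED and not value.startswith("utf8")
    }


# Hypothetical output of: show variables like 'char%';
sample = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8",
    "character_set_database": "utf8",
    "character_set_filesystem": "binary",
    "character_set_results": "utf8",
    "character_set_server": "latin1",
    "character_set_system": "utf8",
    "character_sets_dir": "/usr/share/mysql/charsets/",
}
print(non_utf8_settings(sample))  # {'character_set_server': 'latin1'}
```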
Answer by Sameera Prasad Jayasinghe
Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.
ALTER TABLE table_name MODIFY COLUMN column_name VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
Answer by AVKurov
I got a similar error (Incorrect string value: '\xD0\xBE\xD0\xB2. ...' for 'content' at row 1). I tried changing the character set of the column to utf8mb4, and after that the error changed to 'Data too long for column 'content' at row 1'.
It turned out that MySQL was showing me the wrong error. I switched the column's character set back to utf8 and changed the column type to MEDIUMTEXT. After that the error disappeared.
I hope it helps someone.
By the way, MariaDB in the same case (I tested the same INSERT there) just truncated the text without an error.
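The answer does not say what the column type was before the change, but the symptoms would fit a plain TEXT column, whose limit is counted in bytes rather than characters. Under that assumption, a rough sketch for picking a large enough type (the helper name and the sample text are mine):

```python
# Maximum sizes, in bytes, of MySQL's TEXT family.
TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),
    ("TEXT", 2**16 - 1),
    ("MEDIUMTEXT", 2**24 - 1),
    ("LONGTEXT", 2**32 - 1),
]


def smallest_text_type(value: str, encoding: str = "utf-8") -> str:
    """Name the smallest TEXT type whose byte limit fits the encoded value."""
    size = len(value.encode(encoding))
    for name, limit in TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError("value exceeds LONGTEXT")


# 40,000 Cyrillic letters; each one is 2 bytes in UTF-8 (b'\xd0\xbe').
body = "\u043e" * 40000
print(len(body), len(body.encode("utf-8")))  # 40000 80000
print(smallest_text_type(body))              # MEDIUMTEXT
```

So text that looks comfortably under 65,535 characters can still overflow TEXT once every character costs two or more bytes.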
Answer by Mikko Rantalainen
That error means that either you have the string with incorrect encoding (e.g. you're trying to enter ISO-8859-1 encoded string into UTF-8 encoded column), or the column does not support the data you're trying to enter.
In practice, the latter problem is caused by MySQL's UTF-8 implementation, which only supports Unicode characters that need 1-3 bytes when represented in UTF-8. See "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details.
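For the first case (a string in the wrong encoding), the usual remedy is to transcode before inserting. A minimal sketch, assuming you know or can guess the real source encoding; ISO-8859-1 is used here only because it is the example named above, and to_utf8 is a hypothetical helper:

```python
def to_utf8(data: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Return data as UTF-8, re-encoding only if it is not valid UTF-8 already."""
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        # Assumption: anything that is not UTF-8 is in source_encoding.
        return data.decode(source_encoding).encode("utf-8")


latin1_bytes = "Gr\u00fc\u00dfe".encode("iso-8859-1")  # b'Gr\xfc\xdfe'
print(to_utf8(latin1_bytes))  # b'Gr\xc3\xbc\xc3\x9fe'
```

The fallback is a guess: bytes in some other encoding would be converted without error but come out as the wrong characters, so this only helps when the source encoding is actually known.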