对 mysql 中的“非法混合排序规则”错误进行故障排除

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3029321/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 16:19:06  来源:igfitidea点击:

Troubleshooting "Illegal mix of collations" error in mysql

mysqlcollation

提问by user355562

Am getting the below error when trying to do a select through a stored procedure in MySQL.

尝试通过 MySQL 中的存储过程进行选择时出现以下错误。

Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '='

操作 '=' 的排序规则 (latin1_general_cs,IMPLICIT) 和 (latin1_general_ci,IMPLICIT) 的非法混合

Any idea on what might be going wrong here?

关于这里可能出现什么问题的任何想法?

The collation of the table is latin1_general_ciand that of the column in the where clause is latin1_general_cs.

表的排序规则是latin1_general_ci,where 子句中的列的排序规则是latin1_general_cs

回答by defines

This is generally caused by comparing two strings of incompatible collation or by attempting to select data of different collation into a combined column.

这通常是由于比较两个不兼容排序规则的字符串或尝试将不同排序规则的数据选择到组合列中而引起的。

The clause COLLATEallows you to specify the collation used in the query.

该子句COLLATE允许您指定查询中使用的排序规则。

For example, the following WHEREclause will always give the error you posted:

例如,以下WHERE子句将始终给出您发布的错误:

WHERE 'A' COLLATE latin1_general_ci = 'A' COLLATE latin1_general_cs

Your solution is to specify a shared collation for the two columns within the query. Here is an example that uses the COLLATEclause:

您的解决方案是为查询中的两列指定共享排序规则。这是一个使用该COLLATE子句的示例:

SELECT * FROM table ORDER BY key COLLATE latin1_general_ci;

Another option is to use the BINARYoperator:

另一种选择是使用BINARY运算符:

BINARY str is the shorthand for CAST(str AS BINARY).

BINARY str 是 CAST(str AS BINARY) 的简写。

Your solution might look something like this:

您的解决方案可能如下所示:

SELECT * FROM table WHERE BINARY a = BINARY b;

or,

或者,

SELECT * FROM table ORDER BY BINARY a;

回答by eggyal

TL;DR

TL; 博士

Either change the collation of one (or both) of the strings so that they match, or else add a COLLATEclause to your expression.

要么更改一个(或两个)字符串的排序规则,使它们匹配,要么COLLATE在表达式中添加一个子句。



  1. What is this "collation" stuff anyway?

    As documented under Character Sets and Collations in General:

    A character setis a set of symbols and encodings. A collationis a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

    Suppose that we have an alphabet with four letters: “A”, “B”, “a”, “b”. We give each letter a number: “A” = 0, “B” = 1, “a” = 2, “b” = 3. The letter “A” is a symbol, the number 0 is the encodingfor “A”, and the combination of all four letters and their encodings is a character set.

    Suppose that we want to compare two string values, “A” and “B”. The simplest way to do this is to look at the encodings: 0 for “A” and 1 for “B”. Because 0 is less than 1, we say “A” is less than “B”. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binarycollation.

    But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters “a” and “b” as equivalent to “A” and “B”; (2) then compare the encodings. We call this a case-insensitivecollation. It is a little more complex than a binary collation.

    In real life, most character sets have many characters: not just “A” and “B” but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “accent” is a mark attached to a character as in German “?”), and for multiple-character mappings (such as the rule that “?” = “OE” in one of the two German collations).

    Further examples are given under Examples of the Effect of Collation.

  2. Okay, but how does MySQL decide which collation to use for a given expression?

    As documented under Collation of Expressions:

    In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column charset_name:

    SELECT x FROM T ORDER BY x;
    SELECT x FROM T WHERE x = x;
    SELECT DISTINCT x FROM T;
    

    However, with multiple operands, there can be ambiguity. For example:

    SELECT x FROM T WHERE x = 'Y';
    

    Should the comparison use the collation of the column x, or of the string literal 'Y'? Both xand 'Y'have collations, so which collation takes precedence?

    Standard SQL resolves such questions using what used to be called “coercibility” rules.

    [ deletia ]

    MySQL uses coercibility values with the following rules to resolve ambiguities:

    • Use the collation with the lowest coercibility value.

    • If both sides have the same coercibility, then:

      • If both sides are Unicode, or both sides are not Unicode, it is an error.

      • If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement does not return an error:

        SELECT CONCAT(utf8_column, latin1_column) FROM t1;
        

        It returns a result that has a character set of utf8and the same collation as utf8_column. Values of latin1_columnare automatically converted to utf8before concatenating.

      • For an operation with operands from the same character set but that mix a _bincollation and a _cior _cscollation, the _bincollation is used. This is similar to how operations that mix nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations rather than data types.

  3. So what is an "illegal mix of collations"?

    An "illegal mix of collations" occurs when an expression compares two strings of different collations but of equal coercibility and the coercibility rules cannot help to resolve the conflict. It is the situation described under the third bullet-point in the above quotation.

    The particular error given in the question, Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '=', tells us that there was an equality comparison between two non-Unicode strings of equal coercibility. It furthermore tells us that the collations were not given explicitly in the statement but rather were implied from the strings' sources (such as column metadata).

  4. That's all very well, but how does one resolve such errors?

    As the manual extracts quoted above suggest, this problem can be resolved in a number of ways, of which two are sensible and to be recommended:

    • Change the collation of one (or both) of the strings so that they match and there is no longer any ambiguity.

      How this can be done depends upon from where the string has come: Literal expressions take the collation specified in the collation_connectionsystem variable; values from tables take the collation specified in their column metadata.

    • Force one string to not be coercible.

      I omitted the following quote from the above:

      MySQL assigns coercibility values as follows:

      • An explicit COLLATEclause has a coercibility of 0. (Not coercible at all.)

      • The concatenation of two strings with different collations has a coercibility of 1.

      • The collation of a column or a stored routine parameter or local variable has a coercibility of 2.

      • A “system constant” (the string returned by functions such as USER()or VERSION()) has a coercibility of 3.

      • The collation of a literal has a coercibility of 4.

      • NULLor an expression that is derived from NULLhas a coercibility of 5.

      Thus simply adding a COLLATEclause to one of the strings used in the comparison will force use of that collation.

    Whilst the others would be terribly bad practice if they were deployed merely to resolve this error:

    • Force one (or both) of the strings to have some other coercibility value so that one takes precedence.

      Use of CONCAT()or CONCAT_WS()would result in a string with a coercibility of 1; and (if in a stored routine) use of parameters/local variables would result in strings with a coercibility of 2.

    • Change the encodings of one (or both) of the strings so that one is Unicode and the other is not.

      This could be done via transcoding with CONVERT(exprUSING transcoding_name); or via changing the underlying character set of the data (e.g. modifying the column, changing character_set_connectionfor literal values, or sending them from the client in a different encoding and changing character_set_client/ adding a character set introducer). Note that changing encoding will lead to other problems if some desired characters cannot be encoded in the new character set.

    • Change the encodings of one (or both) of the strings so that they are both the same and change one string to use the relevant _bincollation.

      Methods for changing encodings and collations have been detailed above. This approach would be of little use if one actually needs to apply more advanced collation rules than are offered by the _bincollation.

  1. 无论如何,这个“整理”的东西是什么?

    字符集和排序规则中所述

    字符集是一组符号和编码。甲归类为在一个字符集的字符进行比较的一组规则。让我们通过一个虚构字符集的例子来明确区分。

    假设我们有一个包含四个字母的字母表:“ A”、“ B”、“ a”、“ b”。我们给每个字母一个数字:“ A” = 0,“ B” = 1,“ a” = 2,“ b” = 3。字母“ A”是一个符号,数字0是“ ”的编码A,所有的组合四个字母及其编码是一个字符集

    假设我们要比较两个字符串值“ A”和“ B”。最简单的方法是查看编码:0 代表“ A”,1 代表“ B”。因为 0 小于 1,所以我们说“ A”小于“ B”。我们刚刚做的是对我们的字符集应用排序规则。排序规则是一组规则(在这种情况下只有一个规则):“比较编码”。我们称这种最简单的归类为二进制归类。

    但是如果我们想说小写字母和大写字母是等价的呢?那么我们至少有两条规则:(1)将小写字母“ a”和“ b”等同于“ A”和“ B”;(2) 然后比较编码。我们称之为不区分大小写的排序规则。它比二进制排序规则稍微复杂一些。

    在现实生活中,大多数字符集都有很多字符:不仅仅是“ A”和“ B”,而是整个字母表,有时是多个字母表或具有数千个字符的东方书写系统,以及许多特殊符号和标点符号。同样在现实生活中,大多数排序规则都有很多规则,不仅是区分字母大小写,还有是否区分重音(“重音”是附加在字符上的标记,如德语“ ?”),以及多字符映射(例如在两个德语排序规则之一中“ ?” = “ OE”的规则)。

    更多的例子在排序效果的例子下给出。

  2. 好的,但是 MySQL 如何决定对给定表达式使用哪种排序规则?

    正如在表达式排序规则下所记录

    在绝大多数语句中,很明显 MySQL 使用什么排序规则来解析比较操作。例如,在以下情况下,应该明确排序规则是 column 的排序规则charset_name

    SELECT x FROM T ORDER BY x;
    SELECT x FROM T WHERE x = x;
    SELECT DISTINCT x FROM T;
    

    但是,对于多个操作数,可能会有歧义。例如:

    SELECT x FROM T WHERE x = 'Y';
    

    比较应该使用列的排序规则x还是字符串文字的排序规则'Y'?双方x'Y'有排序规则,所以其整理的优先级?

    标准 SQL 使用以前称为“强制”规则来解决此类问题。

    [ deletia ]

    MySQL 使用 coercibility 值和以下规则来解决歧义:

    • 使用具有最低 coercibility 值的排序规则。

    • 如果双方具有相同的强制力,则:

      • 如果两边都是Unicode,或者两边都不是Unicode,那就是错误。

      • 如果一侧有Unicode字符集,另一侧有非Unicode字符集,则有Unicode字符集的一方获胜,对非Unicode一侧应用自动字符集转换。例如,以下语句不会返回错误:

        SELECT CONCAT(utf8_column, latin1_column) FROM t1;
        

        它返回一个字符集为utf8并且与utf8_column. 的值在连接之前latin1_column自动转换为utf8

      • 对于操作数来自相同字符集但混合_bin排序规则和 a_ci_cs排序规则的操作,使用_bin排序规则。这类似于混合非二进制和二进制字符串的操作如何将操作数评估为二进制字符串,除了它用于排序规则而不是数据类型。

  3. 那么什么是“非法混合排序规则”?

    当一个表达式比较两个不同排序规则但具有相同强制力的字符串并且强制规则无法帮助解决冲突时,就会发生“非法混合排序规则”。这就是上述引文中第三个要点所描述的情况。

    问题中给出的特定错误Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '='告诉我们,在两个具有相等可强制性的非 Unicode 字符串之间存在相等比较。它进一步告诉我们,排序规则没有在语句中明确给出,而是从字符串的来源(例如列元数据)中隐含的。

  4. 这一切都很好,但是如何解决此类错误呢?

    正如上面引用的手册摘录所暗示的,这个问题可以通过多种方式解决,其中两种是明智的,值得推荐:

    • 更改一个(或两个)字符串的排序规则,使它们匹配并且不再有任何歧义。

      如何做到这一点取决于字符串的来源:文字表达式采用collation_connection系统变量中指定的排序规则;表中的值采用其列元数据中指定的排序规则。

    • 强制一根弦不可强制。

      我省略了上面的以下引用:

      MySQL 分配 coercibility 值如下:

      • 显式COLLATE子句的可强制性为 0。(根本不强制。)

      • 具有不同排序规则的两个字符串的串联具有 1 的可强制性。

      • 列或存储的例程参数或局部变量的排序规则的可强制性为 2。

      • “系统常量”(由USER()或等函数返回的字符串VERSION())的可强制性为 3。

      • 文字的排序规则的可强制性为 4。

      • NULL或派生自的表达式NULL的可强制性为 5。

      因此,简单地向COLLATE比较中使用的字符串之一添加一个子句将强制使用该排序规则。

    而其他人如果仅仅为了解决这个错误而部署它们将是非常糟糕的做法:

    • 强制一个(或两个)字符串具有其他一些强制值,以便一个优先。

      使用CONCAT()orCONCAT_WS()会导致字符串的可强制性为 1;并且(如果在存储例程中)使用参数/局部变量会导致字符串的可强制性为 2。

    • 更改一个(或两个)字符串的编码,使一个是 Unicode,另一个不是。

      这可以通过转码来完成;或通过更改数据的基础字符集(例如,修改列、更改文字值,或以不同的编码从客户端发送它们并更改/添加字符集介绍器)。请注意,如果某些所需字符无法在新字符集中编码,则更改编码会导致其他问题。CONVERT(exprUSING transcoding_name)character_set_connectioncharacter_set_client

    • 更改一个(或两个)字符串的编码,使它们都相同,并更改一个字符串以使用相关_bin排序规则。

      上面已经详细介绍了更改编码和排序规则的方法。如果实际上需要应用比整理提供的更高级的整理规则,则这种方法将没有多大用处_bin

回答by Ariel T

Adding my 2c to the discussion for future googlers.

将我的 2c 添加到未来谷歌员工的讨论中。

I was investigating a similar issue where I got the following error when using custom functionsthat recieved a varchar parameter:

我正在调查一个类似的问题,在使用收到 varchar 参数的自定义函数时出现以下错误:

Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and 
(utf8_general_ci,IMPLICIT) for operation '='

Using the following query:

使用以下查询:

mysql> show variables like "collation_database";
    +--------------------+-----------------+
    | Variable_name      | Value           |
    +--------------------+-----------------+
    | collation_database | utf8_general_ci |
    +--------------------+-----------------+

I was able to tell that the DB was using utf8_general_ci, while the tables were defined using utf8_unicode_ci:

我能够知道数据库使用的是utf8_general_ci,而表是使用utf8_unicode_ci定义的:

mysql> show table status;
    +--------------+-----------------+
    | Name         | Collation       |
    +--------------+-----------------+
    | my_view      | NULL            |
    | my_table     | utf8_unicode_ci |
    ...

Notice that the views have NULLcollation. It appears that views and functions have collation definitions even though this query shows null for one view. The collation used is the DB collation that was defined when the view/function were created.

请注意,视图具有NULL归类。即使此查询为一个视图显示空值,视图和函数似乎也具有排序规则定义。使用的排序规则是创建视图/函数时定义的 DB 排序规则。

The sad solution was to both change the db collation and recreate the views/functions to force them to use the current collation.

可悲的解决方案是更改数据库排序规则并重新创建视图/函数以强制它们使用当前排序规则。

  • Changing the db's collation:

    ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;
    
  • Changing the table collation:

    ALTER TABLE mydb CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    
  • 更改数据库的排序规则:

    ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;
    
  • 更改表排序规则:

    ALTER TABLE mydb CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
    

I hope this will help someone.

我希望这会帮助某人。

回答by Justin Vincent

Sometimes it can be dangerous to convert charsets, specially on databases with huge amounts of data. I think the best option is to use the "binary" operator:

有时转换字符集可能很危险,特别是在具有大量数据的数据库上。我认为最好的选择是使用“二元”运算符:

e.g : WHERE binary table1.column1 = binary table2.column1

回答by nkatsar

I had a similar problem, was trying to use the FIND_IN_SET procedure with a string variable.

我有一个类似的问题,试图将 FIND_IN_SET 过程与字符串变量一起使用

SET @my_var = 'string1,string2';
SELECT * from my_table WHERE FIND_IN_SET(column_name,@my_var);

and was receiving the error

并收到错误

Error Code: 1267. Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'

错误代码:1267。 'find_in_set' 操作的排序规则 (utf8_unicode_ci,IMPLICIT) 和 (utf8_general_ci,IMPLICIT) 非法混合

Short answer:

简短的回答:

No need to change any collation_YYYY variables, just add the correct collation next to your variable declaration, i.e.

无需更改任何 collat​​ion_YYYY 变量,只需在变量声明旁边添加正确的排序规则,即

SET @my_var = 'string1,string2' COLLATE utf8_unicode_ci;
SELECT * from my_table WHERE FIND_IN_SET(column_name,@my_var);

Long answer:

长答案:

I first checked the collation variables:

我首先检查了整理变量:

mysql> SHOW VARIABLES LIKE 'collation%';
    +----------------------+-----------------+
    | Variable_name        | Value           |
    +----------------------+-----------------+
    | collation_connection | utf8_general_ci |
    +----------------------+-----------------+
    | collation_database   | utf8_general_ci |
    +----------------------+-----------------+
    | collation_server     | utf8_general_ci |
    +----------------------+-----------------+

Then I checked the table collation:

然后我检查了表格整理:

mysql> SHOW CREATE TABLE my_table;

CREATE TABLE `my_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `column_name` varchar(40) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=125 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

This means that my variable was configured with the default collation of utf8_general_ciwhile my table was configured as utf8_unicode_ci.

这意味着我的变量配置为utf8_general_ci的默认排序规则,而我的表配置为utf8_unicode_ci

By adding the COLLATE command next to the variable declaration, the variable collation matched the collation configured for the table.

通过在变量声明旁边添加 COLLATE 命令,变量排序规则与为表配置的排序规则匹配。

回答by Mirat Can Bayrak

You can try this script, that converts all of your databases and tables to utf8.

你可以试试这个脚本,它将你所有的数据库和表转换为 utf8。

回答by jc508

Solution if literals are involved.

如果涉及文字,解决方案。

I am using Pentaho Data Integration and dont get to specify the sql syntax. Using a very simple DB lookup gave the error "Illegal mix of collations (cp850_general_ci,COERCIBLE) and (latin1_swedish_ci,COERCIBLE) for operation '='"

我正在使用 Pentaho 数据集成,但没有指定 sql 语法。使用非常简单的数据库查找给出了错误“非法混合排序规则 (cp850_general_ci,COERCIBLE) 和 (latin1_swedish_ci,COERCIBLE) for operation '='”

The generated code was "SELECT DATA_DATE AS latest_DATA_DATE FROM hr_cc_normalised_data_date_v WHERE PSEUDO_KEY = ?"

生成的代码是“SELECT DATA_DATE AS latest_DATA_DATE FROM hr_cc_normalised_data_date_v WHERE PSEUDO_KEY = ?”

Cutting the story short the lookup was to a view and when I issued

简而言之,查找是一个视图,当我发布

mysql> show full columns from hr_cc_normalised_data_date_v;
+------------+------------+-------------------+------+-----+
| Field      | Type       | Collation         | Null | Key |
+------------+------------+-------------------+------+-----+
| PSEUDO_KEY | varchar(1) | cp850_general_ci  | NO   |     |
| DATA_DATE  | varchar(8) | latin1_general_cs | YES  |     |
+------------+------------+-------------------+------+-----+

which explains where the 'cp850_general_ci' comes from.

这解释了“cp850_general_ci”的来源。

The view was simply created with 'SELECT 'X',......' According to the manual literals like this should inherit their character set and collation from server settings which were correctly defined as 'latin1' and 'latin1_general_cs' as this clearly did not happen I forced it in the creation of the view

该视图只是使用 'SELECT 'X',......' 创建的,根据这样的手动文字应该从服务器设置继承他们的字符集和排序规则,这些设置正确定义为 'latin1' 和 'latin1_general_cs' 作为这个显然没有发生我在创建视图时强迫它

CREATE OR REPLACE VIEW hr_cc_normalised_data_date_v AS
SELECT convert('X' using latin1) COLLATE latin1_general_cs        AS PSEUDO_KEY
    ,  DATA_DATE
FROM HR_COSTCENTRE_NORMALISED_mV
LIMIT 1;

now it shows latin1_general_cs for both columns and the error has gone away. :)

现在它显示两列的 latin1_general_cs 并且错误消失了。:)

回答by Rick James

If the columns that you are having trouble with are "hashes", then consider the following...

如果您遇到问题的列是“散列”,请考虑以下事项...

If the "hash" is a binary string, you should really use BINARY(...)datatype.

如果“哈希”是二进制字符串,则您应该真正使用BINARY(...)数据类型。

If the "hash" is a hex string, you do not need utf8, and should avoid such because of character checks, etc. For example, MySQL's MD5(...)yields a fixed-length 32-byte hex string. SHA1(...)gives a 40-byte hex string. This could be stored into CHAR(32) CHARACTER SET ascii(or 40 for sha1).

如果“hash”是十六进制字符串,则不需要 utf8,并且由于字符检查等应避免这种情况。例如,MySQLMD5(...)生成固定长度的 32 字节十六进制字符串。 SHA1(...)给出一个 40 字节的十六进制字符串。这可以存储到CHAR(32) CHARACTER SET ascii(或 40 为 sha1)。

Or, better yet, store UNHEX(MD5(...))into BINARY(16). This cuts in half the size of the column. (It does, however, make it rather unprintable.) SELECT HEX(hash) ...if you want it readable.

或者,更好的是,存储UNHEX(MD5(...))BINARY(16). 这将列的大小减少了一半。(但是,它确实使其无法打印。) SELECT HEX(hash) ...如果您希望它具有可读性。

Comparing two BINARYcolumns has no collation issues.

比较两BINARY列没有整理问题。

回答by Nya Nguyen

Very interesting... Now, be ready. I looked at all of the "add collate" solutions and to me, those are band aid fixes. The reality is the database design was "bad". Yes, standard changes and new things gets added, blah blah, but it does not change the bad database design fact. I refuse to go with the route of adding "collate" all over the SQL statements just to get my query to work. The only solution that works for me and will virtually eliminate the need to tweak my code in the future is to re-design the database/tables to match the character set that I will live with and embrace for the long term future. In this case, I choose to go with the character set "utf8mb4".

非常有趣... 现在,准备好。我查看了所有“添加整理”解决方案,对我来说,这些都是创可贴修复。现实是数据库设计是“糟糕的”。是的,标准更改和新事物被添加,等等,但它并没有改变糟糕的数据库设计事实。我拒绝采用在 SQL 语句中添加“整理”的方法,只是为了让我的查询工作。唯一对我有用并且将来几乎不需要调整我的代码的解决方案是重新设计数据库/表以匹配我将长期使用并接受的字符集。在这种情况下,我选择使用字符集“ utf8mb4”。

So the solution here when you encounter that "illegal" error message is to re-design your database and tables. It is much easier and quicker then it sounds. Exporting your data and re-importing it from a CSV may not even be required. Change the character set of the database and make sure all the character set of your tables matches.

因此,当您遇到“非法”错误消息时,这里的解决方案是重新设计您的数据库和表。它比听起来要容易和快捷得多。甚至可能不需要导出您的数据并从 CSV 重新导入它。更改数据库的字符集并确保表的所有字符集都匹配。

Use these commands to guide you:

使用这些命令来指导您:

SHOW VARIABLES LIKE "collation_database";
SHOW TABLE STATUS;

Now, if you enjoy adding "collate" here and there and beef up your code with forces fulls "overrides", be my guess.

现在,如果您喜欢在这里和那里添加“整理”并通过强制完整“覆盖”来增强您的代码,那我猜。

回答by Alex Martelli

MySQL really dislikes mixing collations unless it can coerce them to the same one (which clearly is not feasible in your case). Can't you just force the same collation to be used via a COLLATE clause? (or the simpler BINARYshortcut if applicable...).

MySQL 真的不喜欢混合排序规则,除非它可以将它们强制使用相同的排序规则(这在您的情况下显然不可行)。您不能通过COLLATE 子句强制使用相同的排序规则吗?(或更简单的BINARY快捷方式,如果适用...)。