How to make MySQL work "case insensitive" and "accent insensitive" in UTF-8

Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/10831076/

Time: 2020-08-31 13:35:22  Source: igfitidea


Tags: mysql, utf-8, case-insensitive, non-ascii-characters

Asked by Lightworker

I have a schema with "utf8 -- UTF-8 Unicode" as its charset and "utf8_spanish_ci" as its collation.

All the tables in it are InnoDB, with the same charset and collation as mentioned.

Here is the problem. With a query like

SELECT *
FROM people p
WHERE p.NAME LIKE '%jose%';

I get 83 result rows. I should get 84, because I know that 84 matching rows exist.

Changing the WHERE clause to:

WHERE p.NAME LIKE '%JOSE%';

I get exactly the same 83 rows. With combinations like JoSe, Jose, JOSe, etc., the same 83 rows are returned.

The problem comes when accents come into play. If I do:

WHERE p.NAME LIKE '%josé%';

I get no results. 0 rows.


But if I do:


WHERE p.NAME LIKE '%JOSé%';

I get just one row. This is the only row which has "jose" accented and capitalized.

I've tried josé, JoSé, or whatever other combination: as long as the accented letter keeps the case it really has in the database, the query still returns that one row. If I switch the accented letter to the other case ("é" versus "É"), whatever capitalization I use for the rest of JOSE, it returns no rows.

So, my conclusions:

  • Case insensitive if no accented ("latin") characters come into play.
  • Case sensitive if accented ("latin") characters appear.
  • Accent sensitive: if I search for JOSE or jose, I only get 83 rows instead of the 84 rows I need.

What do I want?

  • Searching for "jose", "JOSE", "José", "JOSé", "JòSE", "j?se", "JoSè", ... should return the 84 rows I know exist. I want to make my searches case insensitive and "latin" (accent) insensitive.

Solutions like COLLATION on LIKE don't work for me; I don't know why...

What can I do?


Thanks in advance!


EDIT:


If I do something like:

WHERE p.NAME LIKE '%jose%' COLLATE utf8_general_ci;

I get the error:


COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'

And I've changed all the possible collations on the columns too!


And if I do something like:


WHERE p.NAME LIKE _utf8 '%jose%' COLLATE utf8_general_ci;

The same 83 rows are returned, as if I had done nothing...

Answered by O. Jones

You have already tried to use an accent-insensitive collation for your search and ordering.


http://dev.mysql.com/doc/refman/5.0/en/charset-collation-implementations.html
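
To see what an accent-insensitive collation does independently of any table data, a small check like the one below may help. It is only a sketch: the hex literal is an assumed spelling of "josé" as UTF-8 bytes, used so that the result does not depend on the client encoding, and under utf8_general_ci the comparison is expected to be true.

```sql
-- x'6A6F73C3A9' is "josé" encoded as UTF-8 (6A=j, 6F=o, 73=s, C3A9=é).
-- With a case- and accent-insensitive collation it should compare equal to 'JOSE'.
SELECT _utf8 x'6A6F73C3A9' = _utf8 'JOSE' COLLATE utf8_general_ci AS same_name;
```

If this returns 1 while the table query still misses rows, the collation is probably doing its job and the mismatch is more likely in how the data or the connection is encoded.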


The thing is, your NAME column seems to be stored in the latin1 (8-bit) character set. That's why MySQL is grumbling at you like this:

  COLLATION 'utf8_general_ci' is not valid for CHARACTER SET 'latin1'
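
Because a plain string literal takes the character set of the connection, it can be worth checking both the column and the session before deciding where latin1 comes from. A minimal sketch, assuming the table is named people as in the question:

```sql
-- Character set and collation of each column of the `people` table
-- in the currently selected schema.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'people';

-- Character set and collation that an unprefixed string literal
-- gets in the current session.
SELECT CHARSET('%jose%') AS literal_charset,
       COLLATION('%jose%') AS literal_collation;
```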

You may get the results you want if you try


 WHERE CONVERT(p.NAME USING utf8) LIKE _utf8 '%jose%' COLLATE utf8_general_ci;

But, be careful!


When you use any kind of function (in this example, CONVERT) on a column in a WHERE clause, you defeat MySQL's attempts to optimize your search with indexes. If this project is going to get large (that is, if you will have lots of rows in your tables), you need to store your data in utf8, not latin1. (You probably already know that your LIKE '%whatever%' search term also defeats MySQL's indexing.)
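
If the column really is latin1, one way to convert it is roughly the statement below. This is a sketch, not a tested migration: it assumes the table is people, that the stored bytes are genuine latin1 text (not UTF-8 bytes that were written into a latin1 column), and that a backup exists.

```sql
-- Converts every character column of `people` to utf8 with the schema's collation.
-- Run it on a copy or after a backup; rewriting a large table can take a while.
ALTER TABLE people CONVERT TO CHARACTER SET utf8 COLLATE utf8_spanish_ci;
```

After such a conversion, the CONVERT(...) call in the WHERE clause should no longer be needed.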


Answered by Miguel Mesquita Alfaiate

Just in case someone else stumbles upon this issue, I have found a way that solves the problem, at least for me.


I am using PHP to insert and retrieve records from the database. Even though my database, tables, and columns are utf8, as is the encoding of the PHP files, the connection between PHP and MySQL was actually using latin1. I found this out using

$mysqli->character_set_name();

where $mysqli is your mysqli object.

For searches to work as expected, returning accent-insensitive and case-insensitive matches whether or not the characters have accents, I had to explicitly set the character set of the connection.

To do this, you just have to do the following:


$mysqli->set_charset('utf8');

where $mysqli is your mysqli object. If you have a database management class that wraps your database functionality, this is easy to apply to a complete app. If not, you have to set this explicitly everywhere you open a connection.
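
At the SQL level, the effect of set_charset can be approximated and checked with the statements below. This is only an illustration: in PHP code, set_charset remains preferable, since it also tells the client library which charset is in use, which matters for escaping with mysqli's real_escape_string.

```sql
-- Tell the server that this session sends and expects utf8.
SET NAMES utf8;

-- Inspect the session's character set settings; after SET NAMES,
-- character_set_client, character_set_connection and character_set_results
-- should all read utf8 (shown as utf8mb3 on newer servers).
SHOW SESSION VARIABLES LIKE 'character_set_%';
```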


I hope this helps someone out, as I was already freaking out about this!
