MySQL utf8_bin vs. utf8_unicode_ci

Question

Asked by Delan Azabani

My table, Website:

Website_Name  // column name
Google
Facebook
Twitter
Orkut
Frype
Skype
Yahoo
Wikipedia

If I use the utf8_bin collation, then my query to search for wikipedia in Website is:

Select Website_Name from Website where lower(Website_Name)='wikipedia'

And if I use utf8_unicode_ci, then my select query to search for wikipedia in Website is:

Select Website_Name from Website where Website_Name='wikipedia'

Now I want to know which collation is best for these queries.

Answer 1

Answered by Delan Azabani

It depends on what you need.

The utf8_bin collation compares strings based purely on their Unicode code point values. If all of the code points have the same values, then the strings are equal. However, this falls apart when you have strings with different composition for combining marks (composed vs. decomposed) or characters that are canonically equivalent but don't have the same code point value. In some cases, using utf8_bin will result in strings not matching when you expect them to. Theoretically, utf8_bin is the fastest because no Unicode-aware comparison rules are applied to the strings, but it may not be what you want.
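The composed vs. decomposed case can be illustrated outside the database. The following is a minimal Python sketch, not MySQL itself: it assumes a binary collation behaves like a comparison of the UTF-8 bytes, and the sample strings are made up for illustration.

```python
import unicodedata

# Hypothetical sample values: the same visible word, encoded two ways.
composed = "caf\u00e9"     # 'é' as a single code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)

# A binary comparison only looks at the encoded values, so these differ.
binary_equal = composed.encode("utf-8") == decomposed.encode("utf-8")

# After canonical normalization (NFC) the two strings are identical.
normalized_equal = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)

print(binary_equal)      # False
print(normalized_equal)  # True
```

If the application normalizes text before it reaches the database, this particular mismatch does not arise, whichever collation is chosen.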

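utf8_general_ci compares strings case-insensitively using simplified rules that compare characters one to one, so it does not handle expansions or contractions. utf8_unicode_ci is also case-insensitive but follows the Unicode Collation Algorithm, which is more accurate at some cost in speed. Note that stock MySQL does not ship a utf8_general_cs collation; for case-sensitive comparison of utf8 data, utf8_bin is the usual choice.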

Answer 2

Answered by Roland Bouman

Personally I would go with utf8_unicode_ci, if you expect that lettercase is generally not important for the results you want to find.

Collations aren't only used at runtime, but also when MySQL builds indexes. So if any of these columns appear in an index, finding data according to the comparison rules of that collation will be pretty much as fast as it ever gets.

In those cases where you do not want case-insensitive matching, do not apply upper or lower. Instead, apply the BINARY keyword in front of the utf8 column to force a literal code-point comparison rather than one according to the collation.

mysql> create table utf8 (name varchar(24) charset utf8 collate utf8_general_ci, primary key (name));
Query OK, 0 rows affected (0.14 sec)

mysql> insert into utf8 values ('Roland');
Query OK, 1 row affected (0.00 sec)

mysql> insert into utf8 values ('roland');
ERROR 1062 (23000): Duplicate entry 'roland' for key 'PRIMARY'
mysql> select * from utf8 where name = 'roland';
+--------+
| name   |
+--------+
| Roland |
+--------+
1 row in set (0.00 sec)

mysql> select * from utf8 where binary name = 'roland';
Empty set (0.01 sec)

This should be much faster than using lower or upper, since in those cases MySQL first needs to make a copy of the column value and modify its lettercase, and then apply the comparison. With BINARY in place, it can use the index first to find matches and then do a code point by code point comparison until it finds that the values are not equal, which will generally be faster.

Answer 3

Answered by Jiro Matchonson

I was using 'utf8_unicode_ci', which is the default in Doctrine, and I had to change it to:

 * @ORM\Table(name = "Table", options={"collate"="utf8_bin"})

I had to do this because some of my composite primary keys consisted of text fields. Sadly, 'utf8_unicode_ci' treated "poistny" and "poistny" as the same primary key value, which ended in a crash when Doctrine flushed the insert. I could not simply change the collation of one part of a composite primary key; I had to drop the table and recreate it. Hope this saves someone else some time.
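The key collision described above can be approximated in a few lines of Python. This is only a rough sketch: the helper `ci_ai_key` is hypothetical and merely imitates a case- and accent-insensitive comparison by stripping combining marks and case-folding; it is not the weighting MySQL actually uses for utf8_unicode_ci. The sample values are also invented for illustration.

```python
import unicodedata

def ci_ai_key(text: str) -> str:
    """Hypothetical stand-in for a case- and accent-insensitive collation key."""
    decomposed = unicodedata.normalize("NFD", text)
    # Drop combining marks (accents), then fold case.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Invented sample values that differ only in case or accents.
rows = ["resume", "Resume", "r\u00e9sum\u00e9"]

# Under a binary comparison every row is a distinct key.
binary_keys = {row.encode("utf-8") for row in rows}

# Under the insensitive key they all collapse to one value,
# which is what triggers a duplicate-key error on insert.
folded_keys = {ci_ai_key(row) for row in rows}

print(len(binary_keys))  # 3
print(len(folded_keys))  # 1
```

When text columns are part of a primary or unique key and such values must stay distinct, a binary collation on those columns avoids the collision.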
