What effects does using a binary collation have in MySQL?
Original question: http://stackoverflow.com/questions/5526334/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Pekka
While answering this question, I became uncertain about something that I didn't manage to find a sufficient answer to.
What are the practical differences between using the binary utf8_bin and the case-insensitive utf8_general_ci collations?
I can see three:
- Both have a different sorting order; _bin's sorting order is likely to put any umlauts at the end of the alphabet, because byte values are compared (right?)
- Only case-sensitive searches in _bin
- No A = Ä equality in _bin
Are there any other differences or side-effects to be aware of?
Reference:
- 9.1.2. Character Sets and Collations in MySQL
- 9.1.7.6. The _bin and binary Collations in the MySQL manual
- 9.1.7.7. The BINARY Operator
Answered by Vladislav Vaintroub
Binary collation compares your strings exactly as strcmp() in C would: strings are unequal if any characters differ, be it just a case or diacritics difference. The downside is that the sort order is not natural.
An example of an unnatural ("binary") sort order: A, B, a, b. A natural sort order would in this case be, for example, A, a, B, b (lowercase and uppercase variants of the same letter are sorted next to each other).
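The two orderings can be reproduced outside the database. The following is a minimal Python sketch that models the idea rather than MySQL's actual collation code; the helper names `binary_order` and `case_insensitive_order` are made up for illustration.

```python
# Sketch only: models the two sort orders in plain Python, not MySQL itself.

def binary_order(words):
    # Compare by encoded bytes, roughly what a _bin collation does.
    return sorted(words, key=lambda w: w.encode("utf-8"))

def case_insensitive_order(words):
    # Compare case-folded text first; the original string breaks ties
    # so that the result is deterministic.
    return sorted(words, key=lambda w: (w.casefold(), w))

words = ["b", "A", "a", "B"]
print(binary_order(words))            # ['A', 'B', 'a', 'b']
print(case_insensitive_order(words))  # ['A', 'a', 'B', 'b']

# Non-ASCII letters encode to bytes above 0x7F, so they land after 'z'.
print(binary_order(["z", "Ä", "a"]))  # ['a', 'z', 'Ä']
```

The last line illustrates the point raised in the question: in byte order, umlauts end up after the whole ASCII alphabet.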
The practical advantage of binary collation is its speed, since string comparison is very simple and fast. In the general case, indexes with a binary collation might not produce the expected sort order; for exact matches, however, they can be useful.
Answered by HaloWebMaster
utf8_bin: Compares strings by the binary value of each character in the string.
utf8_general_ci: Compares strings using general language rules and case-insensitive comparisons.
utf8_general_cs: Compares strings using general language rules and case-sensitive comparisons. (Stock MySQL does not actually ship a collation with this name.)
For example, the following will evaluate to true with either of the utf8_general collations, but not with the utf8_bin collation:
Ä = A
Ö = O
Ü = U
With the utf8_general_ci collation, they would also evaluate to true even when the case differs.
http://www.phpbuilder.com/board/showpost.php?s=2e642ac7dc5fceca2dbca1e2b9c424fd&p=10820221&postcount=2
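The equalities above can be imitated in Python. This is only a rough stand-in: stripping combining marks and case-folding approximates an accent- and case-insensitive comparison, but it is not MySQL's real weight table for utf8_general_ci, and the function names are hypothetical.

```python
import unicodedata

def binary_equal(a, b):
    # _bin-style: equal only if the encoded bytes are identical.
    return a.encode("utf-8") == b.encode("utf-8")

def fold(text):
    # Assumption: decompose, drop combining marks, then case-fold.
    # This approximates, but does not reproduce, utf8_general_ci.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()

def folded_equal(a, b):
    return fold(a) == fold(b)

for left, right in [("Ä", "A"), ("Ö", "O"), ("Ü", "U"), ("ü", "U")]:
    # Each pair differs byte-wise but matches once accents and case are folded.
    print(left, right, binary_equal(left, right), folded_equal(left, right))
```

Each pair prints False for the byte-wise comparison and True for the folded one.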
Answered by peufeu
The other answers explain the differences well.
Binary collation can be useful in some cases:
- column contains hexadecimal data like password hashes
- you are only interested in exact matches, not sorting
- for identifiers with only [a-z0-9_] characters, you can even use it for sorting
- for some reason you store numbers in CHAR() or VARCHAR columns (like telephone numbers)
- zip codes
- UUIDs
- etc
In all those cases you can save a few CPU cycles with a binary collation.
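As a rough illustration of the cases listed above, here is a Python sketch of byte-wise matching and sorting. The sample values and the helper names `binary_equal` and `binary_sorted` are hypothetical, and the code models the comparison only, not MySQL's implementation.

```python
def binary_equal(a, b):
    # Models a _bin comparison: the encoded bytes must match exactly.
    return a.encode("utf-8") == b.encode("utf-8")

def binary_sorted(values):
    # Models ORDER BY under a _bin collation: plain byte order.
    return sorted(values, key=lambda v: v.encode("utf-8"))

stored = "a3f5c9e17b"  # hypothetical lowercase hex value
print(binary_equal(stored, "a3f5c9e17b"))    # True
print(binary_equal(stored, stored.upper()))  # False

# Identifiers limited to [a-z0-9_] sort predictably:
# digits first, then '_', then letters.
ids = ["user_2", "user_10", "admin", "user1"]
print(binary_sorted(ids))  # ['admin', 'user1', 'user_10', 'user_2']
```

Under these assumptions, hex strings or UUIDs written in different cases would not match each other, so such values would likely need to be stored in one consistent case.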
Answered by Mathieu Rodic
With utf8_general_ci, matches occur without taking case and accents into account. This can be a good thing when you need to run queries on words.
With utf8_bin, a match occurs only when the strings are strictly identical. Queries are faster this way.