MySQL: What effects does using a binary collation have?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5526334/

Date: 2020-08-31 19:23:13  Source: igfitidea

What effects does using a binary collation have?

mysql collation

Asked by Pekka

While answering this question, I became uncertain about something that I didn't manage to find a sufficient answer to.


What are the practical differences between using the binary utf8_bin and the case-insensitive utf8_general_ci collations?


I can see three:


  1. The two have different sort orders; _bin is likely to put any umlauts at the end of the alphabet, because byte values are compared (right?)

  2. Searches are always case-sensitive in _bin

  3. No A = Ä equality in _bin


Are there any other differences or side-effects to be aware of?


Reference:


Similar questions that don't address the issue:


Answered by Vladislav Vaintroub

A binary collation compares strings exactly as strcmp() in C would: two strings are equal only if they are byte-for-byte identical, so any difference between characters, even just in case or diacritics, makes them unequal. The downside is that the sort order is not natural.


An example of an unnatural sort order (as in a binary collation): A, B, a, b. A natural sort order in this case would be, e.g., A, a, B, b (lowercase and uppercase variants of the same letter are sorted next to each other).

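The two orderings above can be reproduced with a small Python sketch. It only models the idea (raw UTF-8 byte order versus a case-folded order) and is not MySQL's actual collation code; the helper names binary_sort and natural_sort are made up for illustration.

```python
def binary_sort(words):
    """Sort by raw UTF-8 bytes, roughly what a _bin collation does."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


def natural_sort(words):
    """Sort case-insensitively, breaking ties by the original string."""
    return sorted(words, key=lambda w: (w.casefold(), w))


letters = ["b", "A", "a", "B"]
print(binary_sort(letters))   # ['A', 'B', 'a', 'b']
print(natural_sort(letters))  # ['A', 'a', 'B', 'b']

# Non-ASCII letters encode to bytes >= 0x80, so they sort after all of ASCII.
print(binary_sort(["Z", "Ä", "a", "A"]))  # ['A', 'Z', 'a', 'Ä']
```

The last line also illustrates the question's first point: under byte-wise ordering an umlaut ends up after every unaccented letter.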

The practical advantage of a binary collation is its speed, as string comparison is very simple and fast. In the general case, an index on a column with a binary collation might not produce the expected sort order, but it can be useful for exact matches.


Answered by HaloWebMaster

utf8_bin: Compares strings by the binary value of each character in the string.


utf8_general_ci: Compares strings using general language rules and using case-insensitive comparisons.


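utf8_general_cs: Compares strings using general language rules and using case-sensitive comparisons. (Stock MySQL does not appear to ship a collation with this name; latin1_general_cs is an example of a case-sensitive collation that does exist.)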


For example, the following will evaluate to true with either of the utf8_general collations, but not with the utf8_bin collation:


Ä = A, Ö = O, ü = U


With the utf8_general_ci collation, they would also return true even if not in the same case. http://www.phpbuilder.com/board/showpost.php?s=2e642ac7dc5fceca2dbca1e2b9c424fd&p=10820221&postcount=2

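A rough Python model of these comparisons is sketched below. It approximates utf8_general_ci by stripping combining accents and folding case, which is an assumption made purely for illustration: MySQL uses its own weight tables, and the two approaches can disagree on some characters. The function names are hypothetical.

```python
import unicodedata


def fold_accents_and_case(text):
    """Drop combining marks and fold case (a crude stand-in for general_ci)."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()


def equal_general_ci(a, b):
    """Equality after accent and case folding."""
    return fold_accents_and_case(a) == fold_accents_and_case(b)


def equal_bin(a, b):
    """Equality only when the UTF-8 bytes are identical."""
    return a.encode("utf-8") == b.encode("utf-8")


# Each pair is equal under the folded comparison but not under the binary one.
for left, right in [("Ä", "A"), ("Ö", "O"), ("ü", "U")]:
    print(left, right, equal_general_ci(left, right), equal_bin(left, right))
```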

Answered by peufeu

The other answers explain the differences well.


A binary collation can be useful in some cases:


  • column contains hexadecimal data like password hashes
  • you are only interested in exact matches, not sorting
  • for identifiers with only [a-z0-9_] characters, you can even use it for sorting
  • for some reason you store numbers in CHAR() or VARCHAR columns (like telephone numbers)
  • zipcodes
  • UUIDs
  • etc

In all those cases you can save a (small) number of CPU cycles with a binary collation.

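One consequence for the hex and UUID cases is that a binary comparison is case-sensitive, so the same hash written in upper and lower case no longer matches. A minimal sketch, assuming the application normalizes such values to lowercase before storing or looking them up (normalize_hex is a hypothetical helper, and the hash is an arbitrary sample value):

```python
def normalize_hex(value):
    """Canonical lowercase form, so byte-wise comparison behaves predictably."""
    return value.strip().lower()


stored = "5f4dcc3b5aa765d61d8327deb882cf99"
user_input = "5F4DCC3B5AA765D61D8327DEB882CF99"

print(stored.encode() == user_input.encode())                 # False
print(stored.encode() == normalize_hex(user_input).encode())  # True
```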

Answered by Mathieu Rodic

With utf8_general_ci, matches occur without taking case and accentuation into account. It may be a good thing when you need to perform queries on words.


In utf8_bin, the match only occurs when strings are strictly the same. Queries are faster this way.
