PHP: case-insensitive string comparison

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/5473542/

Time: 2020-08-25 21:33:11  Source: igfitidea

Case insensitive string comparison

Tags: php, if-statement, case-insensitive

Asked by Deniz Zoeteman

I would like to compare two variables to see if they are the same, but I want this comparison to be case-insensitive.

For example, this would be case sensitive:

if($var1 == $var2){
   ...
}

But I want this to be case insensitive. How would I approach this?

Answered by asthasr

This is fairly simple; you just need to call strtolower() on both variables.

If you need to deal with Unicode or international character sets, you can use mb_strtolower().

Please note that other answers suggest using strcasecmp(). That function does not handle multibyte characters, so results for UTF-8 strings that contain non-ASCII characters will be bogus.
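
A minimal sketch of the approach described in this answer, assuming the mbstring extension is loaded and both inputs are UTF-8. The helper name equals_ci is made up for illustration and is not part of PHP:

```php
<?php
// Hypothetical helper: lower-case both strings, then compare them byte for byte.
// Assumes the mbstring extension is available and the inputs are UTF-8.
function equals_ci(string $a, string $b): bool
{
    return mb_strtolower($a, 'UTF-8') === mb_strtolower($b, 'UTF-8');
}

var_dump(equals_ci('Hello', 'hELLO'));               // bool(true)
var_dump(equals_ci("\u{00C4}pfel", "\u{00E4}PFEL")); // bool(true), "Äpfel" vs "äPFEL"
var_dump(equals_ci('Hello', 'World'));               // bool(false)
```

Passing the encoding explicitly avoids depending on the configured internal encoding.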

Answered by Ramon

strcasecmp() returns 0 if the strings are the same (apart from case variations), so you can use:

if (strcasecmp($var1, $var2) == 0) {
}
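
To illustrate the return values, here is a short sketch. It only uses strcasecmp() itself; the sample strings are arbitrary:

```php
<?php
// strcasecmp() returns 0 for a match, a negative number if the first string
// sorts before the second, and a positive number if it sorts after it.
var_dump(strcasecmp('Hello', 'hELLO') === 0); // bool(true)
var_dump(strcasecmp('apple', 'Banana') < 0);  // bool(true)
var_dump(strcasecmp('pear', 'Banana') > 0);   // bool(true)

// Non-ASCII letters are not case-folded: "Ä" and "ä" in UTF-8 do not match.
var_dump(strcasecmp("\u{00C4}", "\u{00E4}") === 0); // bool(false)
```

Comparing the result against 0 explicitly is easier to read than relying on the truthiness of the return value.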

Answered by Beat

If your string is in a single byte encoding, it's simple:

if(strtolower($var1) === strtolower($var2))

If your string is UTF-8, you have to consider the complexity of Unicode: to-lower-case and to-upper-case are not bijective functions, i.e. if you have a lower case character, transform it to upper case, and transform it back to lower case, you may not end up with the same code point (and the same holds true if you start with an upper case character).

E.g.

  • "İ" (Latin Capital Letter I with Dot Above, U+0130) is an upper case character, with "i" (Latin Small Letter I, U+0069) as its lower case variant – and "i"'s upper case variant is "I" (Latin Capital Letter I, U+0049).
  • "ı" (Latin Small Letter Dotless I, U+0131) is a lower case character, with "I" (Latin Capital Letter I, U+0049) as its upper case variant – and "I"'s lower case variant is "i" (Latin Small Letter I, U+0069).

So mb_strtolower('ı') === mb_strtolower('i') returns false, even though they have the same upper case character. If you really want a case-insensitive string comparison function, you have to compare both the lower case AND the upper case versions:

if(mb_strtolower($string1) === mb_strtolower($string2)
  || mb_strtoupper($string1) === mb_strtoupper($string2))
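
The dotless-i case can be checked directly. This is a small sketch assuming the mbstring extension; the variable names are illustrative:

```php
<?php
$dotless = "\u{0131}"; // "ı", Latin Small Letter Dotless I
$plain   = 'i';        // Latin Small Letter I

// The lower-case forms differ, so a lower-case-only comparison fails ...
var_dump(mb_strtolower($dotless, 'UTF-8') === mb_strtolower($plain, 'UTF-8')); // bool(false)

// ... but both letters share the upper-case form "I".
var_dump(mb_strtoupper($dotless, 'UTF-8') === mb_strtoupper($plain, 'UTF-8')); // bool(true)
```

This is why the combined condition checks both directions.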

I ran a query against the Unicode database from https://codepoints.net (https://dumps.codepoints.net) and found 180 code points for which taking the lower case of a lower case character's upper case yields a different character, and 8 code points for which taking the upper case of an upper case character's lower case yields a different character.

But it gets worse: the same grapheme cluster, as seen by the user, may be encoded in multiple ways. "ä" may be represented as Latin Small Letter A with Diaeresis (U+00E4) or as Latin Small Letter A (U+0061) followed by Combining Diaeresis (U+0308), and if you compare the two at the byte level, the comparison won't return true!
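
The two encodings can be compared with nothing but core string functions. A small sketch, with illustrative variable names:

```php
<?php
$precomposed = "\u{00E4}";  // "ä" as a single code point
$decomposed  = "a\u{0308}"; // "a" followed by Combining Diaeresis

var_dump($precomposed === $decomposed); // bool(false)
var_dump(bin2hex($precomposed));        // string(4) "c3a4"
var_dump(bin2hex($decomposed));         // string(6) "61cc88"
```

Both strings render the same on screen, yet they differ in length and content at the byte level.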

But there is a solution for this in Unicode: normalization! There are four different forms: NFC, NFD, NFKC, NFKD. For string comparison, NFC and NFD are equivalent, and NFKC and NFKD are equivalent. I'd take NFKC, as it is shorter than NFKD, and "ﬀ" (Latin Small Ligature ff, U+FB00) will be transformed into two normal "f" characters (but 2⁵ will also be expanded to 25…).
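
The difference between NFC and NFKC can be seen in a short sketch. It assumes the intl extension, which provides the Normalizer class; the helper name show_forms is hypothetical:

```php
<?php
// Hypothetical helper: return the NFC and NFKC forms of a string.
// Assumes the intl extension, which provides the Normalizer class.
function show_forms(string $s): array
{
    return [
        'NFC'  => Normalizer::normalize($s, Normalizer::FORM_C),
        'NFKC' => Normalizer::normalize($s, Normalizer::FORM_KC),
    ];
}

if (class_exists('Normalizer')) {
    // Both forms compose "a" + Combining Diaeresis into U+00E4.
    var_dump(show_forms("a\u{0308}")['NFC'] === "\u{00E4}"); // bool(true)
    // Only NFKC expands the ligature and the superscript digit.
    var_dump(show_forms("\u{FB00}")['NFC'] === "\u{FB00}");  // bool(true)
    var_dump(show_forms("\u{FB00}")['NFKC'] === 'ff');       // bool(true)
    var_dump(show_forms("2\u{2075}")['NFKC'] === '25');      // bool(true)
}
```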

The resulting function becomes:

function mb_is_string_equal_ci($string1, $string2) {
    $string1_normalized = Normalizer::normalize($string1, Normalizer::FORM_KC);
    $string2_normalized = Normalizer::normalize($string2, Normalizer::FORM_KC);
    return mb_strtolower($string1_normalized) === mb_strtolower($string2_normalized)
            || mb_strtoupper($string1_normalized) === mb_strtoupper($string2_normalized);
}

Please note:

  • you need the intl extension for the Normalizer class
  • you should optimize this function by first checking whether the strings are simply equal ^^
  • you may want to use NFC instead of NFKC if NFKC removes too many formatting distinctions for your taste
  • you have to decide for yourself whether you really need all this complexity or prefer a simpler variant of this function
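
The first and third notes could be folded into the function roughly as follows. This is only a sketch: the name mb_is_string_equal_ci_fast and the $form parameter are made up for illustration, and the intl and mbstring extensions are assumed:

```php
<?php
// Hypothetical variant of mb_is_string_equal_ci(); the name and the $form
// parameter are illustrative. Assumes the intl and mbstring extensions.
function mb_is_string_equal_ci_fast(string $a, string $b, int $form = Normalizer::FORM_KC): bool
{
    if ($a === $b) {
        return true; // fast path: identical bytes need no further work
    }
    $na = Normalizer::normalize($a, $form);
    $nb = Normalizer::normalize($b, $form);
    if ($na === false || $nb === false) {
        return false; // normalization failed, so treat the strings as different
    }
    return mb_strtolower($na, 'UTF-8') === mb_strtolower($nb, 'UTF-8')
        || mb_strtoupper($na, 'UTF-8') === mb_strtoupper($nb, 'UTF-8');
}

if (class_exists('Normalizer')) {
    var_dump(mb_is_string_equal_ci_fast("\u{00C4}", "a\u{0308}"));                  // bool(true)
    var_dump(mb_is_string_equal_ci_fast("2\u{2075}", '25'));                        // bool(true) under NFKC
    var_dump(mb_is_string_equal_ci_fast("2\u{2075}", '25', Normalizer::FORM_C));    // bool(false) under NFC
}
```

Passing Normalizer::FORM_C keeps formatting distinctions such as superscripts, at the cost of treating those strings as different.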

Answered by Shakti Singh

if(strtolower($var1) == strtolower($var2)){
}

Answered by jpea

Why not:

if(strtolower($var1) == strtolower($var2)){
}

Answered by Oswald

Use strcasecmp().