C: Case-insensitive string comparison in C

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/5820810/

Date: 2020-09-02 08:31:24  Source: igfitidea

Case Insensitive String comp in C

c, string, standard-library

Asked by bond425

I have two postcodes, each a char*, that I want to compare while ignoring case. Is there a function to do this?

Or do I have to loop through each string, apply the tolower function, and then do the comparison?

Any idea how this function will react to numbers in the string?

Thanks

Answer by Fred Foo

There is no function that does this in the C standard. Unix systems that comply with POSIX are required to have strcasecmp in the header strings.h; Microsoft systems have stricmp. To be on the portable side, write your own:

int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}
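
As a usage sketch for the question's postcode case: tolower() leaves digits, spaces, and other non-letters unchanged, so those characters are compared exactly as strcmp() would compare them. The helper name same_postcode and the postcode values are made up for illustration; strcicmp is repeated from the answer so that the block stands alone.

```c
#include <ctype.h>
#include <stdbool.h>

/* Repeated from the answer above so this sketch is self-contained. */
int strcicmp(char const *a, char const *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || !*a)
            return d;
    }
}

/* Hypothetical helper: true when two postcodes match, ignoring letter case. */
bool same_postcode(char const *a, char const *b)
{
    return strcicmp(a, b) == 0;
}
```

With this, same_postcode("sw1a 1aa", "SW1A 1AA") is true, while same_postcode("SW1A 1AA", "SW1A 2AA") is false because the digits differ.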

But note that none of these solutions will work with UTF-8 strings, only ASCII ones.
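
To make the UTF-8 limitation concrete, here is a minimal sketch that folds only the ASCII letters A-Z, so its result does not depend on the locale. The names ascii_lower and ascii_strcicmp are assumptions for this example, not functions from any library.

```c
/* Fold only the ASCII letters A-Z; every other byte is returned unchanged. */
static int ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical name: byte-wise comparison that ignores ASCII case only. */
int ascii_strcicmp(const char *a, const char *b)
{
    for (;; a++, b++) {
        int d = ascii_lower((unsigned char)*a) - ascii_lower((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}
```

In UTF-8, "É" is the byte sequence C3 89 and "é" is C3 A9. Neither byte is an ASCII letter, so ascii_strcicmp("\xC3\x89", "\xC3\xA9") is non-zero even though the two characters differ only in case.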


Answer by Mihran Hovsepyan

Take a look at strcasecmp() in strings.h.

Answer by Zohar81

I found built-in functions for this in strings.h, which contains string functions in addition to those of the standard header string.h.

Here are the relevant signatures:

int  strcasecmp(const char *, const char *);
int  strncasecmp(const char *, const char *, size_t);
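
A short usage sketch for these two functions, assuming a POSIX system where strings.h is available. The wrapper names equal_ignore_case and has_prefix_ignore_case are invented for this example.

```c
#include <stdbool.h>
#include <string.h>    /* strlen() */
#include <strings.h>   /* POSIX: strcasecmp(), strncasecmp() */

/* Hypothetical wrapper: whole-string equality, ignoring case. */
bool equal_ignore_case(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}

/* Hypothetical wrapper: does s start with prefix, ignoring case? */
bool has_prefix_ignore_case(const char *s, const char *prefix)
{
    return strncasecmp(s, prefix, strlen(prefix)) == 0;
}
```

For example, equal_ignore_case("teSt", "TEst") is true, and has_prefix_ignore_case("SW1A 1AA", "sw1a") is true because only the first four characters are compared.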

I also found its synonym in the xnu kernel (osfmk/device/subrs.c), implemented by the following code, so you should not expect digits to behave any differently than with the original strcmp function.

static int tolower(unsigned char ch) {
    if (ch >= 'A' && ch <= 'Z')
        ch = 'a' + (ch - 'A');
    return ch;
}

int strcasecmp(const char *s1, const char *s2) {
    const unsigned char *us1 = (const u_char *)s1,
                        *us2 = (const u_char *)s2;

    while (tolower(*us1) == tolower(*us2++))
        if (*us1++ == '\0')
            return (0);
    return (tolower(*us1) - tolower(*--us2));
}

Answer by Jonathan Wood

I would use stricmp(). It compares two strings without regard to case.

Note that, in some cases, converting the string to lower case can be faster.

Answer by chux - Reinstate Monica

Additional pitfalls to watch out for when doing case-insensitive compares:

Comparing as lower or as upper case? (common enough issue)

Both functions below return 0 for strcicmpL("A", "a") and strcicmpU("A", "a").
Yet strcicmpL("A", "_") and strcicmpU("A", "_") can return results of different sign, as '_' often lies between the upper and lower case letters.

This affects the sort order when used with qsort(..., ..., ..., strcicmp). Non-standard library C functions like the commonly available stricmp() or strcasecmp() tend to be well defined and favor comparing via lowercase. Yet variations exist.

int strcicmpL(char const *a, char const *b) {
  while (*a) {
    int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (d) {
        return d;
    }
    a++;
    b++;
  }
  // a has ended; the result is negative if b still has characters left
  return -tolower((unsigned char)*b);
}

int strcicmpU(char const *a, char const *b) {
  while (*a) {
    int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
    if (d) {
        return d;
    }
    a++;
    b++;
  }
  // a has ended; the result is negative if b still has characters left
  return -toupper((unsigned char)*b);
}
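
The effect on sort order can be seen with a small sketch, assuming an ASCII character set, where '_' (95) lies between 'A' (65) and 'a' (97). All names here (cmp_lower, cmp_upper, by_lower, by_upper, sort_lower, sort_upper) are invented for illustration.

```c
#include <ctype.h>
#include <stdlib.h>

/* Compare by folding both strings to lower case. */
static int cmp_lower(const char *a, const char *b)
{
    for (;; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}

/* Compare by folding both strings to upper case. */
static int cmp_upper(const char *a, const char *b)
{
    for (;; a++, b++) {
        int d = toupper((unsigned char)*a) - toupper((unsigned char)*b);
        if (d != 0 || *a == '\0')
            return d;
    }
}

/* qsort() passes pointers to the array elements, which are `const char *` here. */
static int by_lower(const void *p, const void *q)
{
    return cmp_lower(*(const char *const *)p, *(const char *const *)q);
}

static int by_upper(const void *p, const void *q)
{
    return cmp_upper(*(const char *const *)p, *(const char *const *)q);
}

/* Hypothetical helpers: sort an array of strings with either comparator. */
void sort_lower(const char **v, size_t n) { qsort(v, n, sizeof *v, by_lower); }
void sort_upper(const char **v, size_t n) { qsort(v, n, sizeof *v, by_upper); }
```

Sorting the pair "A" and "_" with sort_lower puts "_" first, since 'a' is greater than '_', while sort_upper puts "A" first, since 'A' is less than '_'.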

char can have a negative value. (not rare)

toupper(int) and tolower(int) are specified for unsigned char values and the negative EOF. Further, strcmp() returns results as if each char was converted to unsigned char, regardless of whether char is signed or unsigned.

tolower(*a); // Potential UB
tolower((unsigned char) *a); // Correct

Locale (less common)

Although character sets using ASCII code (0-127) are ubiquitous, the remaining codes tend to have locale-specific issues. So strcasecmp("\xE4", "a") might return 0 on one system and non-zero on another.

Unicode (the way of the future)

If a solution needs to handle more than ASCII, consider a unicode_strcicmp(). As the C library does not provide such a function, a pre-coded function from some alternate library is recommended. Writing your own unicode_strcicmp() is a daunting task.

Do all letters map one lower to one upper? (pedantic)

[A-Z] maps one-to-one with [a-z], yet various locales map various lower case characters to one upper and vice versa. Further, some uppercase characters may lack a lower case equivalent and, again, vice versa.

This obliges code to convert through both toupper() and tolower().

int d = tolower(toupper((unsigned char)*a)) - tolower(toupper((unsigned char)*b));

Again, sorting can give different results depending on whether code does tolower(toupper(*a)) or toupper(tolower(*a)).

Portability

@B. Nadolson recommends avoiding rolling your own strcicmp(), and this is reasonable, except when code needs highly equivalent portable functionality.

Below is an approach that even performed faster than some system-provided functions. It does a single compare per loop rather than two by using 2 tables that differ at '\0'. Your results may vary.

static unsigned char low1[UCHAR_MAX + 1] = {
  0, 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...  // @ABC... Z[...
  '`', 'a', 'b', 'c', ... 'z', '{', ...  // `abc... z{...
};
static unsigned char low2[UCHAR_MAX + 1] = {
// v--- Not zero, but A which matches none in `low1[]`
  'A', 1, 2, 3, ...
  '@', 'a', 'b', 'c', ... 'z', '[', ...
  '`', 'a', 'b', 'c', ... 'z', '{', ...
};

int strcicmp_ch(char const *a, char const *b) {
  // compare using tables that differ slightly.
  while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
    a++;
    b++;
  }
  // Either strings differ or null character detected.
  // Perform subtraction using same table.
  return (low1[(unsigned char)*a] - low1[(unsigned char)*b]);
}
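
Because the table contents in the answer are elided, here is a hedged sketch of one way to get runnable tables: fill them at run time with tolower(). The names init_tables and strcicmp_tbl are assumptions for this example, and the run-time initialization is a substitute for the answer's static initializers, not the author's code.

```c
#include <ctype.h>
#include <limits.h>

static unsigned char low1[UCHAR_MAX + 1];
static unsigned char low2[UCHAR_MAX + 1];

/* Hypothetical helper: fill both tables once, before the first comparison. */
void init_tables(void)
{
    for (int i = 0; i <= UCHAR_MAX; i++) {
        low1[i] = (unsigned char)tolower(i);
        low2[i] = (unsigned char)tolower(i);
    }
    /* low1 never holds 'A' (it folds to 'a'), so mapping '\0' to 'A' in
       low2 guarantees a mismatch as soon as b reaches its terminator. */
    low2[0] = 'A';
}

/* One table lookup pair and one compare per loop iteration. */
int strcicmp_tbl(const char *a, const char *b)
{
    while (low1[(unsigned char)*a] == low2[(unsigned char)*b]) {
        a++;
        b++;
    }
    /* Either the strings differ or a terminator was reached;
       subtract using the same table for both sides. */
    return low1[(unsigned char)*a] - low1[(unsigned char)*b];
}
```

The loop cannot run past either terminator: when b ends, low2 yields 'A', which low1 never produces, and when a ends, low1 yields 0, which low2 never produces.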

Answer by Gabriel Staples

I'm not really a fan of the most-upvoted answer here (in part because it isn't correct, since it should continue if it reads a null terminator in either string, but not both strings at once, and it doesn't do this), so I wrote my own.

This is a direct drop-in replacement for strncmp(), and has been tested with numerous test cases, as shown below.

The code only:

#include <ctype.h> // for `tolower()`
#include <limits.h> // for `INT_MIN`
#include <stddef.h> // for `size_t`

// Case-insensitive `strncmp()`
static inline int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = INT_MIN;
    size_t chars_compared = 0;

    if (!str1 || !str2)
    {
        goto done;
    }

    // Valid arguments: strings are equal until a mismatch is found
    ret_code = 0;

    while ((*str1 || *str2) && (chars_compared < num))
    {
        // Cast to unsigned char: tolower() is undefined for negative values other than EOF
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

done:
    return ret_code;
}

Fully-commented version:

#include <ctype.h> // for `tolower()`
#include <limits.h> // for `INT_MIN`
#include <stddef.h> // for `size_t`

/*

Case-insensitive string compare (strncmp case-insensitive)
- Identical to strncmp except case-insensitive. See: http://www.cplusplus.com/reference/cstring/strncmp/
- Aided/inspired, in part, by: https://stackoverflow.com/a/5820991/4561887

str1    C string 1 to be compared
str2    C string 2 to be compared
num     max number of chars to compare

return:
(essentially identical to strncmp)
INT_MIN  invalid arguments (one or both of the input strings is a NULL pointer)
<0       the first character that does not match has a lower value in str1 than in str2
 0       the contents of both strings are equal
>0       the first character that does not match has a greater value in str1 than in str2

*/
static inline int strncmpci(const char * str1, const char * str2, size_t num)
{
    int ret_code = INT_MIN;

    size_t chars_compared = 0;

    // Check for NULL pointers
    if (!str1 || !str2)
    {
        goto done;
    }

    // Valid arguments: the strings count as equal until a mismatch is found
    // (this also covers two empty strings and num == 0)
    ret_code = 0;

    // Continue doing case-insensitive comparisons, one-character-at-a-time, of str1 to str2,
    // as long as at least one of the strings still has more characters in it, and we have
    // not yet compared num chars.
    while ((*str1 || *str2) && (chars_compared < num))
    {
        // Cast to unsigned char: tolower() is undefined for negative values other than EOF
        ret_code = tolower((unsigned char)(*str1)) - tolower((unsigned char)(*str2));
        if (ret_code != 0)
        {
            // The 2 chars just compared don't match
            break;
        }
        chars_compared++;
        str1++;
        str2++;
    }

done:
    return ret_code;
}

Test code (run it online here: https://onlinegdb.com/B1Qoj0W_N):

#include <stdio.h>  // for `printf()`
#include <string.h> // for `strncmp()`

// strncmpci() from above must be defined before main()

int main()
{
    printf("Hello World\n\n");

    const char * str1;
    const char * str2;
    size_t n;

    str1 = "hey";
    str2 = "HEY";
    n = 3;
    printf("strncmpci(%s, %s, %zu) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %zu) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "heY";
    str2 = "HeY";
    n = 3;
    printf("strncmpci(%s, %s, %zu) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %zu) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "hey";
    str2 = "HEdY";
    n = 3;
    printf("strncmpci(%s, %s, %zu) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %zu) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "heY";
    str2 = "HeYd";
    n = 3;
    printf("strncmpci(%s, %s, %zu) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %zu) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "heY";
    str2 = "HeYd";
    n = 6;
    printf("strncmpci(%s, %s, %zu) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %zu) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "hey";
    str2 = "hey";
    n = 6;
    printf("strncmpci(%s, %s, %zu) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %zu) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "hey";
    str2 = "heyd";
    n = 6;
    printf("strncmpci(%s, %s, %zu) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %zu) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    str1 = "hey";
    str2 = "heyd";
    n = 3;
    printf("strncmpci(%s, %s, %zu) = %i\n", str1, str2, n, strncmpci(str1, str2, n));
    printf("strncmp(%s, %s, %zu) = %i\n", str1, str2, n, strncmp(str1, str2, n));
    printf("\n");

    return 0;
}

Sample output:

Hello World

strncmpci(hey, HEY, 3) = 0
strncmp(hey, HEY, 3) = 32

strncmpci(heY, HeY, 3) = 0
strncmp(heY, HeY, 3) = 32

strncmpci(hey, HEdY, 3) = 21
strncmp(hey, HEdY, 3) = 32

strncmpci(heY, HeYd, 3) = 0
strncmp(heY, HeYd, 3) = 32

strncmpci(heY, HeYd, 6) = -100
strncmp(heY, HeYd, 6) = 32

strncmpci(hey, hey, 6) = 0
strncmp(hey, hey, 6) = 0

strncmpci(hey, heyd, 6) = -100
strncmp(hey, heyd, 6) = -100

strncmpci(hey, heyd, 3) = 0
strncmp(hey, heyd, 3) = 0

References:

  1. This question & other answers here served as inspiration and gave some insight (Case Insensitive String comp in C)
  2. http://www.cplusplus.com/reference/cstring/strncmp/
  3. https://en.wikipedia.org/wiki/ASCII
  4. https://en.cppreference.com/w/c/language/operator_precedence

Answer by Miljen Mikic

As others have stated, there is no portable function that works on all systems. You can partially circumvent this with a simple ifdef:

#include <stdio.h>

#ifdef _WIN32
#include <string.h>
#define strcasecmp _stricmp
#else // assuming POSIX or BSD compliant system
#include <strings.h>
#endif

int main() {
    printf("%d", strcasecmp("teSt", "TEst"));
}

Answer by ericcurtin

Simple solution:

#include <ctype.h> // for tolower()

int str_case_ins_cmp(const char* a, const char* b) {
  int rc;

  while (1) {
    rc = tolower((unsigned char)*a) - tolower((unsigned char)*b);
    if (rc || !*a) {
      break;
    }

    ++a;
    ++b;
  }

  return rc;
}

Answer by Andrey Suvorov

If your library does not provide such a function, you can get an idea of how to implement an efficient one from here.

It uses a table for all 256 chars.

  • In that table, every char except letters maps to its own ASCII code.
  • For upper case letter codes, the table lists the codes of the lower case symbols.

Then we just need to traverse the strings and compare the table cells for the given chars:

// charmap is the 256-entry table described above; unsigned char keeps the index non-negative
const unsigned char *cm = charmap,
        *us1 = (const unsigned char *)s1,
        *us2 = (const unsigned char *)s2;
while (cm[*us1] == cm[*us2++])
    if (*us1++ == '\0')
        return (0);
return (cm[*us1] - cm[*--us2]);

Answer by S. M. AMRAN

// Returns 0 if the first `length` chars match ignoring case, 1 otherwise
static int ignoreCaseComp (const char *str1, const char *str2, int length)
{
    int k;
    for (k = 0; k < length; k++)
    {
        // OR-ing with 32 sets the bit that separates ASCII upper and lower case letters
        if ((str1[k] | 32) != (str2[k] | 32))
            break;
    }

    if (k != length)
        return 1;
    return 0;
}
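
One caveat with the `| 32` comparison, assuming ASCII: it also treats some non-letter pairs as equal, for example '@' (0x40) and '`' (0x60), or '[' and '{'. A hedged sketch of a guarded variant follows; the names is_ascii_letter and chars_equal_ignore_case are made up for this example.

```c
#include <stdbool.h>

/* True only for the ASCII letters A-Z and a-z. */
static bool is_ascii_letter(unsigned char c)
{
    c |= 32;                       /* fold to the lower case range */
    return c >= 'a' && c <= 'z';
}

/* Hypothetical variant: apply the `| 32` fold only when both bytes are letters. */
bool chars_equal_ignore_case(char x, char y)
{
    unsigned char a = (unsigned char)x;
    unsigned char b = (unsigned char)y;

    if (is_ascii_letter(a) && is_ascii_letter(b))
        return (a | 32) == (b | 32);
    return a == b;                 /* non-letters must match exactly */
}
```

With the guard, 'A' and 'a' still compare equal, while '@' and '`' no longer do.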