Case-insensitive string comparison in C++

Question

Asked by Adam

What is the best way of doing case-insensitive string comparison in C++ without transforming a string to all uppercase or all lowercase?

Please indicate whether the methods are Unicode-friendly and how portable they are.

Answer 1

Accepted answer by Rob

Boost includes a handy algorithm for this:

#include <string>
#include <boost/algorithm/string.hpp>
// Or, for fewer header dependencies:
//#include <boost/algorithm/string/predicate.hpp>

std::string str1 = "hello, world!";
std::string str2 = "HELLO, WORLD!";

if (boost::iequals(str1, str2))
{
    // Strings are equal when case is ignored
}

Answer 2

Answered by wilhelmtell

Take advantage of the standard char_traits. Recall that std::string is in fact a typedef for std::basic_string<char>, or more explicitly, std::basic_string<char, std::char_traits<char> >. The char_traits type describes how characters compare, how they copy, how they convert and so on. All you need to do is typedef a new string over basic_string and provide it with your own custom char_traits that compares case-insensitively.

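#include <cctype>   // std::toupper
#include <cstddef>  // std::size_t
#include <string>   // std::char_traits, std::basic_string

struct ci_char_traits : public std::char_traits<char> {
    // Go through unsigned char: passing a negative char to toupper is undefined behaviour.
    static int up(char c) { return std::toupper(static_cast<unsigned char>(c)); }
    static bool eq(char c1, char c2) { return up(c1) == up(c2); }
    static bool ne(char c1, char c2) { return up(c1) != up(c2); }
    static bool lt(char c1, char c2) { return up(c1) <  up(c2); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        while( n-- != 0 ) {
            if( up(*s1) < up(*s2) ) return -1;
            if( up(*s1) > up(*s2) ) return 1;
            ++s1; ++s2;
        }
        return 0;
    }
    // Returns a pointer to the first match in [s, s + n), or a null pointer if there is none.
    static const char* find(const char* s, std::size_t n, char a) {
        for( ; n != 0; --n, ++s ) {
            if( eq(*s, a) ) return s;
        }
        return 0;
    }
};

typedef std::basic_string<char, ci_char_traits> ci_string;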

The details are on Guru of The Week number 29.
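
To make the typedef concrete, here is a minimal usage sketch. The traits class is repeated so the block compiles on its own, and ci_equal is a hypothetical helper name introduced only for this illustration, not something from the answer or from GotW.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Same idea as the traits class in this answer, repeated so the sketch is self-contained.
struct ci_char_traits : public std::char_traits<char> {
    static int up(char c) { return std::toupper(static_cast<unsigned char>(c)); }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (; n != 0; --n, ++s1, ++s2) {
            if (up(*s1) < up(*s2)) return -1;
            if (up(*s1) > up(*s2)) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char a) {
        for (; n != 0; --n, ++s)
            if (eq(*s, a)) return s;
        return nullptr;
    }
};
typedef std::basic_string<char, ci_char_traits> ci_string;

// Hypothetical helper: compare two ordinary std::string values by viewing them as ci_string.
inline bool ci_equal(const std::string& a, const std::string& b) {
    return ci_string(a.data(), a.size()) == ci_string(b.data(), b.size());
}
```

With this in place, ci_string("Hello") == "hELLO" is true and ci_string("hello, world!").find("WORLD") returns 7. Note that ci_string is a different type from std::string, so it cannot be assigned to a std::string or streamed to std::cout directly; going through c_str() is one way around that.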

Answer 3

Answered by Timmmm

The trouble with Boost is that you have to depend on it, which is not easy in some cases (e.g. Android).

And using char_traits means all your comparisons are case-insensitive, which isn't usually what you want.

This should suffice. It should be reasonably efficient. It doesn't handle Unicode, though.

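#include <cctype>   // std::tolower
#include <cstddef>  // std::size_t
#include <string>

bool iequals(const std::string& a, const std::string& b)
{
    const std::size_t sz = a.size();
    if (b.size() != sz)
        return false;
    for (std::size_t i = 0; i < sz; ++i)
        // Cast to unsigned char: tolower is undefined for negative char values.
        if (std::tolower(static_cast<unsigned char>(a[i])) != std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    return true;
}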

Update: Bonus C++14 version (#include <algorithm>):

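#include <algorithm>  // std::equal
#include <cctype>     // std::tolower
#include <string>

bool iequals(const std::string& a, const std::string& b)
{
    // The four-iterator overload of std::equal (C++14) also checks that the lengths match.
    return std::equal(a.begin(), a.end(),
                      b.begin(), b.end(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}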

Answer 4

Answered by Derek Park

If you are on a POSIX system, you can use strcasecmp. This function is not part of standard C, though, nor is it available on Windows. It performs a case-insensitive comparison on 8-bit chars, so long as the locale is POSIX. If the locale is not POSIX, the results are unspecified (so it might do a localized compare, or it might not). A wide-character equivalent is not available.

Failing that, a large number of historic C library implementations have the functions stricmp() and strnicmp(). Visual C++ on Windows renamed all of these by prefixing them with an underscore because they aren't part of the ANSI standard, so on that system they're called _stricmp and _strnicmp. Some libraries may also have wide-character or multibyte equivalents (in Visual C++, for example, _wcsicmp and _mbsicmp).
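
The two families of functions above can be hidden behind one small wrapper. This is only a sketch: ci_strcmp is a made-up name, and the _WIN32 check assumes that a Windows build uses a C runtime providing _stricmp (as Visual C++ does) while every other target provides POSIX strcasecmp.

```cpp
#if defined(_WIN32)
  #include <string.h>   // _stricmp (Visual C++ runtime)
#else
  #include <strings.h>  // strcasecmp (POSIX)
#endif

// Hypothetical wrapper: returns <0, 0 or >0 like strcmp, but ignoring case.
// Both arguments must be null-terminated C strings.
inline int ci_strcmp(const char* a, const char* b) {
#if defined(_WIN32)
    return _stricmp(a, b);
#else
    return strcasecmp(a, b);
#endif
}
```

Callers then write ci_strcmp(s1.c_str(), s2.c_str()) == 0 regardless of platform. The locale caveat described above still applies, so this is only suitable for plain 8-bit text.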

C and C++ are both largely ignorant of internationalization issues, so there's no good solution to this problem except to use a third-party library. Check out IBM ICU (International Components for Unicode) if you need a robust library for C/C++. ICU is available for both Windows and Unix systems.

Answer 5

Answered by Coincoin

Are you talking about a dumb case-insensitive compare or a full normalized Unicode compare?

A dumb compare will not match strings that are equivalent but not binary equal.

Example:

U+212B (ANGSTROM SIGN)
U+0041 (LATIN CAPITAL LETTER A) + U+030A (COMBINING RING ABOVE)
U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE)

These are all equivalent, but they have different binary representations.
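
To see the problem in bytes, the sketch below spells out the UTF-8 encoding of each of the three sequences and compares them byte for byte. The variable and function names are invented for this illustration, and UTF-8 is an assumption; the answer does not say which encoding is in use.

```cpp
#include <string>

// UTF-8 encodings of the three sequences listed above, written as escaped bytes.
const std::string angstrom_sign = "\xE2\x84\xAB";  // U+212B
const std::string a_plus_ring   = "A\xCC\x8A";     // U+0041 followed by U+030A
const std::string a_with_ring   = "\xC3\x85";      // U+00C5

// A byte-wise comparison: true only when the two encodings are identical.
inline bool bytes_equal(const std::string& x, const std::string& y) {
    return x == y;
}
```

bytes_equal returns false for every pair here, and per-byte tolower does not help because none of the differences are a matter of ASCII case. Matching them requires normalizing both strings first, which is what the Unicode-aware libraries mentioned in this thread are for.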

That said, Unicode Normalization should be a mandatory read, especially if you plan on supporting Hangul, Thai and other Asian languages.

Also, IBM pretty much patented most optimized Unicode algorithms and made them publicly available. They also maintain an implementation: IBM ICU.

Answer 6

Answered by Igor Milyakov

boost::iequals is not UTF-8 compatible when used with std::string. You can use boost::locale instead.

// Needs <boost/locale.hpp>; comparator and collator_base are in namespace boost::locale.
comparator<char,collator_base::secondary> cmpr;
cout << (cmpr(str1, str2) ? "str1 < str2" : "str1 >= str2") << endl;

Primary -- ignore accents and character case, comparing base letters only. For example "facade" and "Façade" are the same.
Secondary -- ignore character case but consider accents. "facade" and "façade" are different but "Façade" and "façade" are the same.
Tertiary -- consider both case and accents: "Façade" and "façade" are different. Ignore punctuation.
Quaternary -- consider all case, accents, and punctuation. The words must be identical in terms of Unicode representation.
Identical -- as quaternary, but compare code points as well.

Answer 7

Answered by Shadow2531

My first thought for a non-Unicode version was to do something like this:

#include <cctype>  // std::tolower
#include <string>

bool caseInsensitiveStringCompare(const std::string& str1, const std::string& str2) {
    if (str1.size() != str2.size()) {
        return false;
    }
    for (std::string::const_iterator c1 = str1.begin(), c2 = str2.begin(); c1 != str1.end(); ++c1, ++c2) {
        // Cast to unsigned char: tolower is undefined for negative char values.
        if (std::tolower(static_cast<unsigned char>(*c1)) != std::tolower(static_cast<unsigned char>(*c2))) {
            return false;
        }
    }
    return true;
}

Answer 8

Answered by bradtgmurray

You can use strcasecmp on Unix, or stricmp on Windows.

One thing that hasn't been mentioned so far is that if you are using STL strings with these methods, it's useful to first compare the lengths of the two strings, since this information is already available to you in the string class. This avoids the costly character-by-character comparison when the two strings aren't even the same length in the first place.
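
The length check described above might look like the following sketch. iequals_len_first is a hypothetical name, and the platform switch assumes _strnicmp on Windows and POSIX strncasecmp elsewhere. One caveat to keep in mind: these C functions stop at the first null character, so strings with embedded nulls are only compared up to that point.

```cpp
#include <string>
#if defined(_WIN32)
  #include <string.h>   // _strnicmp (Visual C++ runtime)
#else
  #include <strings.h>  // strncasecmp (POSIX)
#endif

// Hypothetical helper: sizes are compared first, so strings of different
// length are rejected without looking at any characters.
inline bool iequals_len_first(const std::string& a, const std::string& b) {
    if (a.size() != b.size())
        return false;
#if defined(_WIN32)
    return _strnicmp(a.c_str(), b.c_str(), a.size()) == 0;
#else
    return strncasecmp(a.c_str(), b.c_str(), a.size()) == 0;
#endif
}
```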

Answer 9

Answered by Darren Kopp

Visual C++ string functions supporting Unicode: http://msdn.microsoft.com/en-us/library/cc194799.aspx

The one you are probably looking for is _wcsnicmp.
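
_wcsnicmp is specific to the Visual C++ runtime. For readers on other toolchains, a rough wide-character approximation can be built from std::towlower in the standard <cwctype> header. This is a sketch under that assumption, with wiequals as a made-up name. It lowercases one wchar_t at a time according to the current C locale, so it is not a full Unicode comparison and is not a drop-in replacement for _wcsnicmp.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>

// Hypothetical helper: case-insensitive equality for wide strings,
// one wchar_t at a time, using the current C locale's towlower.
inline bool wiequals(const std::wstring& a, const std::wstring& b) {
    if (a.size() != b.size())
        return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::towlower(static_cast<std::wint_t>(a[i])) !=
            std::towlower(static_cast<std::wint_t>(b[i])))
            return false;
    }
    return true;
}
```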

Answer 10

Answered by Adam

I'm trying to cobble together a good answer from all the posts, so help me edit this:

Here is one way of doing this. Although it does transform the strings and is not Unicode-friendly, it should be portable, which is a plus:

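#include <algorithm>  // std::transform
#include <cctype>     // std::tolower
#include <string>

// Lowercase one character via unsigned char, since tolower is undefined for negative values.
static char lowerChar( unsigned char c ) { return static_cast<char>( std::tolower( c ) ); }

bool caseInsensitiveStringCompare( const std::string& str1, const std::string& str2 ) {
    std::string str1Cpy( str1 );
    std::string str2Cpy( str2 );
    std::transform( str1Cpy.begin(), str1Cpy.end(), str1Cpy.begin(), lowerChar );
    std::transform( str2Cpy.begin(), str2Cpy.end(), str2Cpy.begin(), lowerChar );
    return ( str1Cpy == str2Cpy );
}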

From what I have read, this is more portable than stricmp(), because stricmp() is not in fact part of the standard library but only implemented by most compiler vendors.

To get a truly Unicode-friendly implementation, it appears you must go outside the standard library. One good third-party library is IBM ICU (International Components for Unicode).

Also, boost::iequals provides a fairly good utility for doing this sort of comparison.
