How can I check if a string has special characters in C++ effectively?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/6605282/

Date: 2020-08-28 20:24:56  Source: igfitidea

How can I check if a string has special characters in C++ effectively?

Tags: c++, string, whitelist, c-strings

Asked by Praveen

I am trying to find out whether there is a better way to check if a string has special characters. In my case, anything other than an alphanumeric character or '_' is considered a special character. Currently, I keep the special characters in a std::string such as "!@#$%^&". I then use std::find_first_of() to check whether any of those characters are present in the string.

I was wondering how to do it based on whitelisting. I want to specify the lowercase/uppercase letters, the digits, and an underscore in a string, but I don't want to list them all. Is there any way to specify an ASCII range of some sort, like [a-zA-Z0-9_]? How can I achieve this? I then plan to use std::find_first_not_of(). That way I can state what I actually want and check for the opposite.

Answered by Martin York

Try:

std::string x(/*Load*/);
// npos means every character of x was found in the whitelist.
if (x.find_first_not_of("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_") != std::string::npos)
{
    std::cerr << "Error\n";
}

Or try boost regular expressions:

// Note: \w matches any word character (alphanumeric plus "_").
// The backslash must be escaped inside a C++ string literal.
boost::regex test("\\w+", boost::regex::perl);
if (!boost::regex_match(x.begin(), x.end(), test))
{
    std::cerr << "Error\n";
}

// The equivalent to \w should be (use in place of the pattern above):
boost::regex test("[A-Za-z0-9_]+", boost::regex::perl);
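
For readers without Boost, roughly the same check can be sketched with std::regex from the C++11 standard library. This is a swapped-in alternative, not part of the answer above; the function name has_special_char_regex is made up for illustration. It searches for one character outside the whitelist rather than matching the whole string, so an empty string is reported as clean, unlike the "\w+" match above.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `s` contains anything other
// than [A-Za-z0-9_]. An empty string counts as having no special characters.
bool has_special_char_regex(const std::string& s) {
    // One character outside the whitelist is enough to report a hit.
    static const std::regex special("[^A-Za-z0-9_]");
    return std::regex_search(s, special);
}
```

Calling has_special_char_regex("abc_123") should yield false, while has_special_char_regex("abc 123") should yield true.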

Answered by Jerry Coffin

I think I'd do the job just a bit differently, treating the std::string as a collection and using an algorithm. Using a C++0x lambda, it would look something like this:

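bool has_special_char(std::string const &str) {
    // Take the character as unsigned char: passing a negative char
    // value to isalnum is undefined behavior.
    return std::find_if(str.begin(), str.end(),
        [](unsigned char ch) { return !(isalnum(ch) || ch == '_'); }) != str.end();
}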

At least when you're dealing with char (not wchar_t), isalnum will typically use a table lookup, so it'll usually be (quite a bit) faster than anything based on find_first_of (which will normally use a linear search instead). In other words, this is O(N) (N = str.size()), whereas something based on find_first_of will be O(N*M) (N = str.size(), M = pattern.size()).
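
To make the table-lookup idea concrete, here is a minimal sketch that builds its own 256-entry table from an explicit whitelist, so each character costs one array index instead of a scan of the pattern. The names make_allowed_table and has_special_char_table are hypothetical, and the sketch assumes 8-bit chars; it does not show how any particular isalnum implementation works.

```cpp
#include <array>
#include <string>

// Hypothetical helper: allowed[c] is true when byte value c is in the whitelist.
std::array<bool, 256> make_allowed_table(const std::string& whitelist) {
    std::array<bool, 256> allowed{};  // value-initialized: every entry false
    for (unsigned char c : whitelist) allowed[c] = true;
    return allowed;
}

// O(N) in the length of str: one table lookup per character.
bool has_special_char_table(const std::string& str) {
    static const std::array<bool, 256> allowed = make_allowed_table(
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_");
    for (unsigned char c : str) {
        if (!allowed[c]) return true;
    }
    return false;
}
```

Because the table is built from an explicit list, this variant does not depend on the current locale.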

If you want to do the job in pure C, you can use sscanf with a scanset conversion that's theoretically non-portable, but supported by essentially all recent/popular compilers:

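char junk;
/* Compare against 1: some libraries return EOF rather than 0 when the input ends. */
/* Caveat: if str begins with a special character, the scanset matches nothing */
/* and sscanf returns 0, so test str[0] separately. */
if (sscanf(str, "%*[A-Za-z0-9_]%c", &junk) == 1) {
    /* it has at least one "special" character */
} else {
    /* no special characters after the leading run */
}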

The basic idea here is pretty simple: the scanset skips across all consecutive non-special characters (but doesn't assign the result to anything, because of the *), then we try to read one more character. If that succeeds, it means there was at least one character that was not skipped, so we must have at least one special character. If it fails, it means the scanset conversion matched the whole string, so all the characters were "non-special".

Officially, the C standard says that trying to put a range in a scanset conversion like this isn't portable (a '-' anywhere but the beginning or end of the scanset gives implementation-defined behavior). There have even been a few compilers (from Borland) that would fail for this: they would treat A-Z as matching exactly three possible characters, 'A', '-' and 'Z'. Most current compilers (or, more accurately, standard library implementations) take the approach this assumes: "A-Z" matches any upper-case character.
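
If the range syntax is a concern, one way around it is to spell out the scanset in full so that no '-' appears in it. The sketch below does that and also handles two edge cases separately: an empty string, and a string whose first character is already special (in which case the scanset matches nothing and sscanf stops before reaching %c). The function name has_special_char_scanf is hypothetical.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: scanset-based check with every allowed character listed.
bool has_special_char_scanf(const char* str) {
    static const char allowed[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_";

    if (*str == '\0') return false;  // empty string: nothing special
    // A leading special character would make the scanset fail outright.
    if (std::strchr(allowed, *str) == nullptr) return true;

    char junk;
    // Exactly one assignment means a character followed the run of allowed ones.
    return std::sscanf(
        str,
        "%*[ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_]%c",
        &junk) == 1;
}
```

Comparing the result against 1, rather than testing it for truth, keeps the check correct whether the library reports the end of input as 0 or as EOF.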

Answered by Adam Rosenfield

There's no way to do that with character ranges in standard C or C++; you have to list out all of the characters. For C strings, you can use strspn(3) and strcspn(3) to find the first character in a string that is not a member of (strspn) or is a member of (strcspn) a given character set. For example:

// Test if the given string has anything not in A-Za-z0-9_
bool HasSpecialCharacters(const char *str)
{
    return str[strspn(str, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_")] != 0;
}

For C++ strings, you can equivalently use the find_first_of and find_first_not_of member functions.

Another option is to use isalnum(3) and the related functions from <ctype.h> to test whether a given character is alphanumeric; note that these functions are locale-dependent, so their behavior can (and does) change in other locales. If you do not want that behavior, then don't use them. If you do choose to use them, you'll also have to test for underscores separately, since there's no function that tests "alphabetic, numeric, or underscore", and you'll have to code your own loop to search the string (or use std::find_if with an appropriate function object).

Answered by feathj

The first thing you need to consider is "is this ASCII only?" If your answer is yes, I would encourage you to really consider whether or not you should allow ASCII only. I currently work for a company that is having real headaches getting into foreign markets because we didn't think to support Unicode from the get-go.

That being said, ASCII makes it really easy to check for non-alphanumeric characters. Take a look at the ASCII chart.

http://en.wikipedia.org/wiki/ASCII#ASCII_printable_characters

  • Iterate through each character
  • Check if the character is decimal value 48 - 57, 65 - 90, 97 - 122, or 95 (underscore)
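
The two steps above can be sketched as follows. The function names is_allowed_ascii and has_special_char_ascii are made up for illustration, and the numeric ranges are the ASCII code points listed above, so the check assumes an ASCII-compatible encoding.

```cpp
#include <string>

// Hypothetical helper: true for ASCII digits, letters, and underscore only.
bool is_allowed_ascii(char ch) {
    const unsigned char c = static_cast<unsigned char>(ch);
    return (c >= 48 && c <= 57)    // '0'..'9'
        || (c >= 65 && c <= 90)    // 'A'..'Z'
        || (c >= 97 && c <= 122)   // 'a'..'z'
        || c == 95;                // '_'
}

// Iterates through each character and reports the first one outside the ranges.
bool has_special_char_ascii(const std::string& str) {
    for (char ch : str) {
        if (!is_allowed_ascii(ch)) return true;
    }
    return false;
}
```

Every byte of a multi-byte UTF-8 character falls outside these ranges, so such characters are reported as special, which ties back to the ASCII-only question raised at the start of this answer.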

Answered by Jonathan Leffler

The functions (macros) are subject to locale settings, but you should investigate isalnum() and its relatives from <ctype.h> or <cctype>.

Answered by Mark B

I would just use the built-in C facility here. Iterate over each character in the string and check whether it's _ or whether isalpha(ch) is true. If so, it's valid; otherwise it's a special character.
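
A minimal sketch of that loop, following the rule exactly as stated (underscore or isalpha), might look like this. The function name first_special_char is hypothetical. Note that under this rule digits count as special; swapping in std::isalnum would accept them, which is closer to what the question asks for.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: index of the first character that is neither '_'
// nor alphabetic, or std::string::npos if there is none.
std::string::size_type first_special_char(const std::string& str) {
    for (std::string::size_type i = 0; i < str.size(); ++i) {
        // Convert to unsigned char before calling isalpha to avoid
        // undefined behavior on negative char values.
        const unsigned char ch = static_cast<unsigned char>(str[i]);
        if (ch != '_' && !std::isalpha(ch)) return i;
    }
    return std::string::npos;
}
```

Returning the index rather than a bool makes it easy to report where the offending character is.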

Answered by Tony Delroy

If you want this but don't want to go the whole hog and use regexps, and given that your test is for ASCII chars, just create a function to generate the string for find_first_not_of...

#include <iostream>
#include <string>

std::string expand(const char* p)
{
    std::string result;
    while (*p)
        if (p[1] == '-' && p[2])
        {
            for (int c = p[0]; c <= p[2]; ++c)
                result += (char)c;
            p += 3;
        }
        else
            result += *p++;
    return result;
}

int main()
{
    std::cout << expand("A-Za-z0-9_") << '\n';
}
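
As a sketch of how the generated string might then be used, the block below feeds it to find_first_not_of. The expand function is repeated from above only so the sketch compiles on its own; the wrapper name has_special_char_expanded is hypothetical. Like the original, it assumes the characters of each range are contiguous in the execution character set, as they are in ASCII.

```cpp
#include <string>

// Same expansion as above: turns "A-Z" style ranges into explicit characters.
std::string expand(const char* p)
{
    std::string result;
    while (*p)
        if (p[1] == '-' && p[2])
        {
            for (int c = p[0]; c <= p[2]; ++c)
                result += (char)c;
            p += 3;
        }
        else
            result += *p++;
    return result;
}

// Hypothetical wrapper: builds the whitelist once, then reuses it.
bool has_special_char_expanded(const std::string& str)
{
    static const std::string allowed = expand("A-Za-z0-9_");
    return str.find_first_not_of(allowed) != std::string::npos;
}
```

Keeping the expanded whitelist in a function-local static avoids rebuilding it on every call.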

Answered by Bhavya Agarwal

Using

    s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());

    bool my_predicate(char c)
    {
        return !(isalpha(c) || c == '_');
    }

will get you a clean string s.

erase will strip off all the special characters, and the behavior is highly customisable through the my_predicate function.
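
A self-contained sketch of the erase-remove approach is shown below. The names is_special and strip_special are made up for illustration, and the predicate uses std::isalnum instead of the isalpha shown above so that digits are kept, matching the question's definition of a special character.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical predicate: true for anything that is not alphanumeric or '_'.
bool is_special(char c) {
    const unsigned char uc = static_cast<unsigned char>(c);
    return !(std::isalnum(uc) || uc == '_');
}

// Returns a copy of `s` with every special character removed.
std::string strip_special(std::string s) {
    // remove_if moves the kept characters to the front; erase trims the tail.
    s.erase(std::remove_if(s.begin(), s.end(), is_special), s.end());
    return s;
}
```

Note that this cleans the string instead of detecting special characters; comparing the result's size with the original's tells you whether anything was removed.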
