C++: How to strip all non-alphanumeric characters from a string in C++?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/6319872/

Time: 2020-08-28 19:54:29  Source: igfitidea

How to strip all non-alphanumeric characters from a string in C++?

c++ string libcurl strip alphanumeric

Asked by Austin Witherspoon

I am writing a piece of software that requires me to handle data I get from a webpage with libcurl. When I get the data, for some reason it has extra line breaks in it. I need to figure out a way to allow only letters, numbers, and spaces, and remove everything else, including line breaks. Is there any easy way to do this? Thanks.

Answered by James McNellis

Write a function that takes a char and returns true if you want to remove that character or false if you want to keep it:

bool my_predicate(char c);

Then use the std::remove_if algorithm to remove the unwanted characters from the string:

#include <algorithm>  // std::remove_if
#include <string>
std::string s = "my data";
s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());
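For the requirement in the question (keep letters, digits and spaces, drop everything else including line breaks), a minimal sketch of my_predicate might look like this. The helper strip_to_alnum_and_space is a hypothetical name, and "letter" and "digit" mean whatever std::isalnum reports in the current C locale (plain ASCII letters and digits in the default "C" locale).

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Returns true for characters that should be removed: anything that is
// neither alphanumeric (per std::isalnum in the current C locale) nor a space.
bool my_predicate(char c)
{
    // Convert to unsigned char first: passing a negative char value to
    // std::isalnum is undefined behavior.
    unsigned char uc = static_cast<unsigned char>(c);
    return !std::isalnum(uc) && uc != ' ';
}

// Hypothetical helper: applies the erase/remove idiom with my_predicate.
std::string strip_to_alnum_and_space(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(), my_predicate), s.end());
    return s;
}
```

Because line breaks are simply dropped, text from adjacent lines runs together ("Line 1\r\nLine-2!" becomes "Line 1Line2"); if that matters, replacing '\r' and '\n' with ' ' before filtering keeps the words apart.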

Depending on your requirements, you may be able to use one of the Standard Library predicates, like std::isalnum, instead of writing your own predicate (you said you needed to match alphanumeric characters and spaces, so perhaps this doesn't exactly fit what you need).

If you want to use the Standard Library std::isalnum function, you will need a cast to disambiguate between the std::isalnum function in the C Standard Library header <cctype> (which is the one you want to use) and the std::isalnum in the C++ Standard Library header <locale> (which is not the one you want to use, unless you want to perform locale-specific string processing):

// Note: this removes the alphanumeric characters themselves; negate the predicate to keep them.
s.erase(std::remove_if(s.begin(), s.end(), (int(*)(int))std::isalnum), s.end());

This works equally well with any of the sequence containers (including std::string, std::vector and std::deque). This idiom is commonly referred to as the "erase/remove" idiom. The std::remove_if algorithm also works with ordinary arrays. std::remove_if makes only a single pass over the sequence, so it has linear time complexity.
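To illustrate that the idiom is not specific to std::string, here is a sketch applying it to a std::vector<char> and to an ordinary char array. The helper names not_alnum, strip_vector and strip_array are hypothetical. An array cannot shrink, so for it nothing is erased: the iterator returned by std::remove_if simply becomes the new logical end.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <vector>

// Predicate: true for characters that are not alphanumeric.
bool not_alnum(char c)
{
    return !std::isalnum(static_cast<unsigned char>(c));
}

// std::vector<char>: the same erase/remove idiom as for std::string.
void strip_vector(std::vector<char>& v)
{
    v.erase(std::remove_if(v.begin(), v.end(), not_alnum), v.end());
}

// NUL-terminated array; len is the number of characters before the NUL.
// The kept characters are moved to the front, the string is re-terminated
// at the position std::remove_if returns, and the new length is returned.
std::size_t strip_array(char* buf, std::size_t len)
{
    char* new_end = std::remove_if(buf, buf + len, not_alnum);
    *new_end = '\0';
    return static_cast<std::size_t>(new_end - buf);
}
```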

Answered by Dado

Previous uses of std::isalnum won't compile with std::ptr_fun without passing the unary argument it requires, so this solution with a lambda function should encapsulate the correct answer:

s.erase(std::remove_if(s.begin(), s.end(),
    // unsigned char: passing a negative char value to std::isalnum is undefined behavior
    [](unsigned char c) -> bool { return !std::isalnum(c); }), s.end());

Answered by Seth Carnegie

You could always loop through and just erase every character that is neither alphanumeric nor a space, if you're using std::string.

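#include <cctype>
#include <cstddef>
#include <string>

std::size_t i = 0;
std::size_t len = str.length();
while (i < len) {
    unsigned char c = static_cast<unsigned char>(str[i]);
    // Keep letters, digits and spaces; erase everything else.
    if (!std::isalnum(c) && c != ' ') {
        str.erase(i, 1);  // the next character shifts into position i
        len--;
    } else
        i++;
}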

Someone better with the Standard Lib can probably do this without a loop.

If you're using just a char buffer, you can loop through it and, whenever a character is neither alphanumeric nor a space, shift all the characters after it back by one (to overwrite the offending character):

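#include <cctype>
#include <cstring>

std::size_t buflen = something;  // number of valid characters in buf
for (std::size_t i = 0; i < buflen; ) {
    unsigned char c = static_cast<unsigned char>(buf[i]);
    if (!std::isalnum(c) && c != ' ')
        // The ranges overlap, so use memmove (not memcpy); i stays put.
        std::memmove(&buf[i], &buf[i + 1], --buflen - i);
    else
        ++i;
}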
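The shifting approach moves the rest of the buffer once per removed character, which is quadratic in the worst case. A sketch of a single-pass alternative (a swapped-in technique, not part of the original answer) keeps a separate write index and copies each character worth keeping down to it. The function name compact_buffer is hypothetical; it keeps letters, digits and spaces, as the question asks.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical helper: removes, in place, every character of buf[0..len) that
// is neither alphanumeric nor a space, and returns the new length.
// Each character is read once and written at most once, so this is O(len).
std::size_t compact_buffer(char* buf, std::size_t len)
{
    std::size_t write = 0;
    for (std::size_t read = 0; read < len; ++read) {
        unsigned char c = static_cast<unsigned char>(buf[read]);
        if (std::isalnum(c) || c == ' ')
            buf[write++] = buf[read];  // keep: copy it down to the write position
    }
    return write;
}
```

If buf is a NUL-terminated string, setting buf[n] = '\0' with the returned length n terminates the shortened string.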

Answered by Eugen Constantin Dinca

The remove_copy_if standard algorithm would be very appropriate for your case.
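A sketch of what that could look like: std::remove_copy_if copies every element for which the predicate is false into a destination and leaves the source untouched. Here the destination is a new std::string filled through std::back_inserter. The function name filtered_copy is hypothetical, and the predicate keeps letters, digits and spaces as the question asks.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: returns a copy of `in` containing only letters,
// digits and spaces. The input string is not modified.
std::string filtered_copy(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out),
                        [](unsigned char c) {
                            // true means "remove", i.e. do not copy this character
                            return !std::isalnum(c) && c != ' ';
                        });
    return out;
}
```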

Answered by TankorSmash

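#include <algorithm>
#include <cctype>
#include <functional>
#include <iostream>
#include <string>

// Needs a pre-C++17 mode: std::ptr_fun was removed in C++17 and std::not1 in C++20.
// The explicit <int, int> selects the <cctype> overload of std::isalnum.
std::string s = "Hello World!";
s.erase(std::remove_if(s.begin(), s.end(),
    std::not1(std::ptr_fun<int, int>(std::isalnum))), s.end());
std::cout << s << std::endl;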

Results in:

"HelloWorld"

You use isalnum to determine whether or not each character is alphanumeric, then use ptr_fun to pass the function to not1, which NOTs the returned value, leaving you with only the alphanumeric characters you want.
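std::ptr_fun was removed in C++17 and std::not1 in C++20, so that snippet only compiles in an older language mode. A sketch of the same negation with C++17's std::not_fn (a swapped-in replacement, not part of the original answer) follows. The wrapper is_alnum_char and the helper keep_alnum are hypothetical; the wrapper gives std::not_fn a single, non-overloaded function (avoiding ambiguity with the <locale> overload of std::isalnum) and converts to unsigned char first.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <string>

// Hypothetical wrapper: one non-overloaded function that std::not_fn can
// take by name. Converts to unsigned char before calling std::isalnum.
bool is_alnum_char(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Requires C++17 for std::not_fn.
std::string keep_alnum(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(), std::not_fn(is_alnum_char)),
            s.end());
    return s;
}
```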

Answered by Ali Eren Çelik

Just extending James McNellis's code a little bit more. His function is deleting alnum characters instead of non-alnum ones.

To delete non-alnum characters from a string (alnum = alphabetic or numeric):

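  • Declare a function (isalnum returns 0 if the passed character is not alnum):

    bool isNotAlnum(char c) {
        // Cast first: passing a negative char value to isalnum is undefined behavior.
        return isalnum(static_cast<unsigned char>(c)) == 0;
    }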
    
  • And then write this

    s.erase(remove_if(s.begin(), s.end(), isNotAlnum), s.end());
    
Then your string contains only alnum characters.

Answered by Dhruv Kakadiya

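The code below should work just fine for a given string s. It uses the <algorithm> and <cctype> headers (the single-argument std::isalnum is declared in <cctype>; the <locale> version also takes a locale argument).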

std::string s("He!!llo  Wo,@rld! 12 453");
s.erase(std::remove_if(s.begin(), s.end(),
    [](unsigned char c) { return !std::isalnum(c); }), s.end());

Answered by Andres Hurtis

The mentioned solution

s.erase( std::remove_if(s.begin(), s.end(), &std::ispunct), s.end());

is very nice, but unfortunately doesn't work with non-ASCII characters (which have negative values when char is signed) in Visual Studio (debug mode), because of this line:

_ASSERTE((unsigned)(c + 1) <= 256)

in isctype.c

So, I would recommend something like this:

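#include <algorithm>
#include <cctype>
#include <string>

inline int my_ispunct( int ch )
{
    // Convert to unsigned char so negative char values become valid arguments.
    return std::ispunct(static_cast<unsigned char>(ch));
}
...
s.erase( std::remove_if(s.begin(), s.end(), &my_ispunct), s.end());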
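To see why the conversion matters: when char is signed, a byte such as 0xC3 (the first byte of a UTF-8 "é") has a negative value, and passing a negative value other than EOF to the <cctype> functions is undefined behavior, which is what the debug assertion catches. The following self-contained sketch (the helper strip_punct and the test strings are illustrative assumptions) removes ASCII punctuation from a string that also contains such bytes; after the conversion every byte value is a valid argument, although how bytes above 127 are classified still depends on the current locale.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Same idea as the wrapper above: convert to unsigned char so that bytes
// above 127 (negative when char is signed) are valid arguments.
int my_ispunct(int ch)
{
    return std::ispunct(static_cast<unsigned char>(ch));
}

// Hypothetical helper: removes punctuation; safe for any byte value.
std::string strip_punct(std::string s)
{
    s.erase(std::remove_if(s.begin(), s.end(), my_ispunct), s.end());
    return s;
}
```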

Answered by akritaag

You can use the erase-remove idiom this way:

// Removes all punctuation       
s.erase( std::remove_if(s.begin(), s.end(), &ispunct), s.end());

Answered by Pabitra Dash

The following works for me.

str.erase(std::remove_if(str.begin(), str.end(), &ispunct), str.end());
str.erase(std::remove_if(str.begin(), str.end(), &isspace), str.end());
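The two passes can also be folded into one predicate, so the string is walked only once. A sketch with a hypothetical helper name, strip_punct_and_space; note that, like the two lines it replaces, it removes ordinary spaces as well, which the question wanted to keep.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: one remove_if pass that drops punctuation and all
// whitespace (spaces, tabs, '\r', '\n', ...), giving the same result as the
// two separate passes.
std::string strip_punct_and_space(std::string str)
{
    str.erase(std::remove_if(str.begin(), str.end(),
                             [](unsigned char c) {
                                 return std::ispunct(c) || std::isspace(c);
                             }),
              str.end());
    return str;
}
```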