Remove extra white spaces in C++

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/35301432/

Time: 2020-08-28 14:30:39  Source: igfitidea

c++ string algorithm

Asked by Damian

I tried to write a script that removes extra white spaces but I didn't manage to finish it.

Basically I want to transform "abc  sssd g g sdg    gg  gf" into "abc sssd g g sdg gg gf".

In languages like PHP or C#, it would be very easy, but not in C++, I see. This is my code:

#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <cstring>
#include <unistd.h>
#include <string.h>

char* trim3(char* s) {
    int l = strlen(s);

    while(isspace(s[l - 1])) --l;
    while(* s && isspace(* s)) ++s, --l;

    return strndup(s, l);
}

char *str_replace(char * t1, char * t2, char * t6)
{
    char*t4;
    char*t5=(char *)malloc(10);
    memset(t5, 0, 10);
    while(strstr(t6,t1))
    {
        t4=strstr(t6,t1);
        strncpy(t5+strlen(t5),t6,t4-t6);
        strcat(t5,t2);
        t4+=strlen(t1);
        t6=t4;
    }

    return strcat(t5,t4);
}

void remove_extra_whitespaces(char* input,char* output)
{
    char* inputPtr = input; // init inputPtr always at the last moment.
    int spacecount = 0;
    while(*inputPtr != '\0')
    {
        char* substr;
        strncpy(substr, inputPtr+0, 1);

        if(substr == " ")
        {
            spacecount++;
        }
        else
        {
            spacecount = 0;
        }

        printf("[%p] -> %d\n",*substr,spacecount);

        // Assume the string last with \0
        // some code
        inputPtr++; // After "some code" (instead of what you wrote).
    }
}

int main(int argc, char **argv)
{
    printf("testing 2 ..\n");

    char input[0x255] = "asfa sas    f f dgdgd  dg   ggg";
    char output[0x255] = "NO_OUTPUT_YET";
    remove_extra_whitespaces(input,output);

    return 1;
}

It doesn't work. I tried several methods. What I am trying to do is to iterate the string letter by letter and dump it in another string as long as there is only one space in a row; if there are two spaces, don't write the second character to the new string.

How can I solve this?

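Accepted answer by villapx

Here's a simple, non-C++11 solution, using the same remove_extra_whitespaces() signature as in the question:

#include <cstdio>

void remove_extra_whitespaces(char* input, char* output)
{
    int inputIndex = 0;
    int outputIndex = 0;
    while(input[inputIndex] != '\0')
    {
        output[outputIndex] = input[inputIndex];

        if(input[inputIndex] == ' ')
        {
            while(input[inputIndex + 1] == ' ')
            {
                // skip over any extra spaces
                inputIndex++;
            }
        }

        outputIndex++;
        inputIndex++;
    }

    // null-terminate output
    output[outputIndex] = '\0';
}

int main(int argc, char **argv)
{
    char input[0x255] = "asfa sas    f f dgdgd  dg   ggg";
    char output[0x255] = "NO_OUTPUT_YET";
    remove_extra_whitespaces(input,output);

    printf("input: %s\noutput: %s\n", input, output);

    return 1;
}

Output:

input: asfa sas    f f dgdgd  dg   ggg
output: asfa sas f f dgdgd dg ggg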

Answer by Christophe

There are already plenty of nice solutions. I propose an alternative based on a dedicated <algorithm> function meant to avoid consecutive duplicates: unique_copy():

// requires <algorithm>, <cctype>, <iostream>, <iterator>, <string> and using namespace std
void remove_extra_whitespaces(const string &input, string &output)
{
    output.clear();  // unless you want to add at the end of existing string...
    unique_copy (input.begin(), input.end(), back_insert_iterator<string>(output),
                                     [](char a,char b){ return isspace(a) && isspace(b);});  
    cout << output<<endl; 
}

Here is a live demo. Note that I changed from C-style strings to the safer and more powerful C++ strings.

Edit: if keeping C-style strings is required in your code, you could use almost the same code but with pointers instead of iterators. That's the magic of C++. Here is another live demo.
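
The pointer-based variant mentioned in the edit is not reproduced on this page. A minimal sketch of what it might look like follows; the function name and the requirement that the caller supplies a large enough, non-overlapping output buffer are assumptions of this sketch, not part of the answer.

```cpp
#include <algorithm>
#include <cctype>
#include <cstring>

// Hypothetical helper (not from the answer): copies `input` to `output`,
// collapsing each run of whitespace to its first character.
// Assumes `output` holds at least strlen(input) + 1 chars and does not
// overlap `input`.
void remove_extra_whitespaces_cstr(const char* input, char* output)
{
    const char* last = input + std::strlen(input);
    // Raw pointers are valid iterators, so unique_copy works unchanged.
    char* end = std::unique_copy(input, last, output,
        [](unsigned char a, unsigned char b) {
            return std::isspace(a) && std::isspace(b);
        });
    *end = '\0';  // the terminator was outside the copied range
}
```

Taking the lambda parameters as unsigned char avoids passing negative values to std::isspace, which would be undefined behaviour for non-ASCII input.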

Answer by anatolyg

Since you use C++, you can take advantage of standard-library features designed for that sort of work. You could use std::string (instead of char[0x255]) and std::istringstream, which will replace most of the pointer arithmetic.

First, make a string stream:

std::istringstream stream(input);

Then, read strings from it. It will remove the whitespace delimiters automatically:

std::string word;
while (stream >> word)
{
    ...
}

Inside the loop, build your output string:

    if (!output.empty()) // special case: no space before first word
        output += ' ';
    output += word;

A disadvantage of this method is that it allocates memory dynamically (including several reallocations, performed when the output string grows).
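
The steps this answer describes (make a string stream, extract word by word, join with single spaces) could be assembled roughly as follows. The function name and the return-by-value signature are assumptions made for this sketch.

```cpp
#include <sstream>
#include <string>

// Sketch only: name and signature are illustrative, not from the answer.
std::string remove_extra_whitespaces(const std::string& input)
{
    std::istringstream stream(input);
    std::string output;
    std::string word;
    while (stream >> word)       // operator>> skips any leading whitespace
    {
        if (!output.empty())     // special case: no space before first word
            output += ' ';
        output += word;
    }
    return output;
}
```

Note that, unlike the character-by-character solutions, this approach also drops leading and trailing whitespace and turns tabs and newlines between words into single spaces.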

Answer by Lol4t0

For in-place modification you can apply the erase-remove technique:

#include <string>
#include <iostream>
#include <algorithm>
#include <cctype>

int main()
{
    std::string input {"asfa sas    f f dgdgd  dg   ggg"};
    bool prev_is_space = true;
    input.erase(std::remove_if(input.begin(), input.end(), [&prev_is_space](unsigned char curr) {
        bool r = std::isspace(curr) && prev_is_space;
        prev_is_space = std::isspace(curr);
        return r;

    }), input.end());

    std::cout << input << "\n";
}

So you first move all extra spaces to the end of the string and then truncate it.

The great advantage of C++ is that it is universal enough to port your code to plain C static strings with only a few modifications:

#include <algorithm>
#include <cctype>
#include <iostream>
#include <iterator>

void erase(char * p) {
    // note that this only works well when the initial array is statically allocated,
    // so we do not need to rearrange memory
    *p = 0; 
}

int main()
{
    char input [] {"asfa sas    f f dgdgd  dg   ggg"};
    bool prev_is_space = true;
    // stop before the terminator so that erase() always writes inside the array
    erase(std::remove_if(std::begin(input), std::end(input) - 1, [&prev_is_space](unsigned char curr) {
        bool r = std::isspace(curr) && prev_is_space;
        prev_is_space = std::isspace(curr);
        return r;

    }));

    std::cout << input << "\n";
}

Interestingly enough, the remove step here is string-representation independent. It will work with std::string without any modifications at all.

Answer by Ami Tavory

There are plenty of ways of doing this (e.g., using regular expressions), but one way you could do this is using std::copy_if with a stateful functor remembering whether the last character was a space:

#include <algorithm>
#include <iterator>
#include <string>
#include <iostream>

struct if_not_prev_space
{
    // Is last encountered character space.
    bool m_is = false;

    bool operator()(const char c)
    {
        // Copy if last was not space, or current is not space.
        const bool ret = !m_is || c != ' ';
        m_is = c == ' ';
        return ret;
    }
};


int main()
{
    const std::string s("abc  sssd g g sdg    gg  gf into abc sssd g g sdg gg gf");
    std::string o;
    std::copy_if(std::begin(s), std::end(s), std::back_inserter(o), if_not_prev_space());
    std::cout << o << std::endl;
}
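
The answer mentions regular expressions without showing that route. As a rough sketch, assuming only plain space characters need collapsing, std::regex_replace from <regex> could be used like this (the function name is made up for illustration):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every run of two or more spaces
// with a single space. Tabs and newlines are left untouched.
std::string collapse_spaces_regex(const std::string& s)
{
    static const std::regex multiple_spaces(" {2,}");
    return std::regex_replace(s, multiple_spaces, " ");
}
```

This is short to write, but std::regex is typically much slower than the single-pass algorithms in the other answers, so it is probably best kept for code that is not performance sensitive.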

Answer by Peter - Reinstate Monica

I have the sinking feeling that good ol' scanf will do (in fact, this is the C-school equivalent of Anatoly's C++ solution):

#include <cstdio>   // sscanf
#include <cstring>  // strlen

void remove_extra_whitespaces(char* input, char* output)
{
    int srcOffs = 0, destOffs = 0, numRead = 0;

    while(sscanf(input + srcOffs, "%s%n", output + destOffs, &numRead) > 0)
    {
        srcOffs += numRead;
        destOffs += strlen(output + destOffs);
        output[destOffs++] = ' '; // overwrite 0, advance past that
    }
    output[destOffs > 0 ? destOffs-1 : 0] = '\0';
}

We exploit the fact that scanf has magical built-in space skipping capabilities. We then use the perhaps less known %n "conversion" specification, which gives us the number of chars consumed by scanf. This feature frequently comes in handy when reading from strings, like here. The bitter drop which makes this solution less than perfect is the strlen call on the output (there is no "how many bytes have I actually just written" conversion specifier, unfortunately).

Last but not least, use of scanf is easy here because sufficient memory is guaranteed to exist at output; if that were not the case, the code would become more complex due to buffering and overflow handling.

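Answer by Galik

You can use std::unique, which reduces adjacent duplicates to a single instance according to how you define what makes two elements equal.

Here I have defined elements as equal if they are both whitespace characters:

inline std::string& remove_extra_ws_mute(std::string& s)
{
    s.erase(std::unique(std::begin(s), std::end(s), [](unsigned char a, unsigned char b){
        return std::isspace(a) && std::isspace(b);
    }), std::end(s));

    return s;
}

inline std::string remove_extra_ws_copy(std::string s)
{
    return remove_extra_ws_mute(s);
}

std::unique shifts the retained characters to the front of the string and returns an iterator to the new logical end, so the leftover tail can be erased.

Additionally, if you must work with low-level strings then you can still use std::unique on the pointers:

char* remove_extra_ws(char const* s)
{
    std::size_t len = std::strlen(s);

    // The caller owns the returned buffer and must delete[] it
    char* buf = new char[len + 1];
    std::strcpy(buf, s);

    // Note that std::unique will also retain the null terminator
    // in its correct position at the end of the valid portion
    // of the string    
    std::unique(buf, buf + len + 1, [](unsigned char a, unsigned char b){
        return (a && std::isspace(a)) && (b && std::isspace(b));
    });

    return buf;
}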

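Answer by Jts

Since you are writing C-style, here's a way to do what you want. Note that you can also remove '\r' and '\n', which are line breaks (but of course it's up to you whether you consider those whitespace or not).

This function should be as fast as or faster than any other alternative, and no memory allocation takes place even when it's called with std::strings (I've overloaded it).

char temp[] = " alsdasdl   gasdasd  ee";
remove_whitesaces(temp);
printf("%s\n", temp);

int remove_whitesaces(char *p)
{
    int len = strlen(p);
    int new_len = 0;
    bool space = false;

    for (int i = 0; i < len; i++)
    {
        switch (p[i])
        {
        case ' ': space = true;  break;
        case '\t': space = true;  break;
        case '\n': break; // you could set space true for \r and \n
        case '\r': break; // if you consider them spaces, I just ignore them.
        default:
            if (space && new_len > 0)
                p[new_len++] = ' ';
            p[new_len++] = p[i];
            space = false;
        }
    }

    p[new_len] = '\0';

    return new_len;
}

// and you can use it with strings too,

inline int remove_whitesaces(std::string &str)
{
    int len = remove_whitesaces(&str[0]);
    str.resize(len);
    return len; // returning len for consistency with the primary function
                // but you can return std::string instead.
}

// again no memory allocation is gonna take place,
// since resize does not free memory because the length is either equal or lower

If you take a brief look at the C++ Standard library, you will notice that a lot of C++ functions that return std::string, or other std:: objects, are basically wrappers around a well-written extern "C" function. So don't be afraid to use C functions in C++ applications if they are well written; you can overload them to support std::strings and such.

For example, in Visual Studio 2015, std::to_string is written exactly like this:

inline string to_string(int _Val)
    {   // convert int to string
    return (_Integral_to_string("%d", _Val));
    }

inline string to_string(unsigned int _Val)
    {   // convert unsigned int to string
    return (_Integral_to_string("%u", _Val));
    }

and _Integral_to_string is a wrapper around the C function sprintf_s:

template<class _Ty> inline
    string _Integral_to_string(const char *_Fmt, _Ty _Val)
    {   // convert _Ty to string
    static_assert(is_integral<_Ty>::value,
        "_Ty must be integral");
    char _Buf[_TO_STRING_BUF_SIZE];
    int _Len = _CSTD sprintf_s(_Buf, _TO_STRING_BUF_SIZE, _Fmt, _Val);
    return (string(_Buf, _Len));
    }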

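Answer by Hans

Well, here is a longish (but easy) solution that does not use pointers. It can be optimized further, but hey, it works.

#include <iostream>
#include <string>
#include <vector>
using namespace std;
void removeExtraSpace(string str);
int main(){
    string s;
    cout << "Enter a string with extra spaces: ";
    getline(cin, s);
    removeExtraSpace(s);
    return 0;
}
void removeExtraSpace(string str){
    int len = str.size();
    if(len==0){
        cout << "Simplified String: " << endl;
        cout << "I would appreciate it if you could enter more than 0 characters. " << endl;
        return;
    }
    // std::vector is used here because variable-length arrays are not standard C++
    vector<char> ch1(len);
    vector<char> ch2(len);
    //Placing characters of str in ch1[]
    for(int i=0; i<len; i++){
        ch1[i]=str[i];
    }
    //Computing index of 1st non-space character
    int pos=0;
    for(int i=0; i<len; i++){
        if(ch1[i] != ' '){
            pos = i;
            break;
        }
    }
    int cons_arr = 1;
    ch2[0] = ch1[pos];
    for(int i=(pos+1); i<len; i++){
        char x = ch1[i];
        if(x==char(32)){
            //Checking whether the last character written to ch2 is ' '
            if(ch2[cons_arr-1] == ' '){
                continue;
            }
            else{
                ch2[cons_arr] = ' ';
                cons_arr++;
                continue;
            }
        }
        ch2[cons_arr] = x;
        cons_arr++;
    }
    //Printing the char array
    cout << "Simplified string: " << endl;
    for(int i=0; i<cons_arr; i++){
        cout << ch2[i];
    }
    cout << endl;
}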

Answer by Jan

I ended up here for a slightly different problem. Since I don't know where else to put it, and I found out what was wrong, I share it here. Don't be cross with me, please. I had some strings that would print additional spaces at their ends, while showing up without spaces in debugging. The strings were formed in Windows calls like VerQueryValue(), which besides other stuff outputs a string length, e.g. iProductNameLen in the following line, which converts the result to a string named strProductName:

    strProductName = string((LPCSTR)pvProductName, iProductNameLen)

This then produced a string with a \0 byte at the end, which did not show up easily in the debugger, but printed on screen as a space. I'll leave the solution as an exercise, since it is not hard at all once you are aware of this.
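
One possible way to approach that exercise, shown as a sketch only: strip any trailing '\0' characters after constructing the string. The helper name is hypothetical and the example deliberately avoids the Windows API so that it stays portable.

```cpp
#include <string>

// Hypothetical helper: removes '\0' characters from the end of a
// std::string that was built from a buffer whose reported length
// included the terminator.
std::string strip_trailing_nuls(std::string s)
{
    while (!s.empty() && s.back() == '\0')
        s.pop_back();
    return s;
}
```

Alternatively, when the buffer is known to be null-terminated, constructing the string from the pointer alone (without the length argument) stops at the first '\0' and avoids the problem in the first place.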