C++函数计算字符串中的所有单词

Question

提问by evilHyman

I was asked this during an interview and apparently it's an easy question but it wasn't and still isn't obvious to me.

我在一次采访中被问到这个问题，显然这是一个简单的问题，但对我来说不是，现在仍然不明显。

Given a string, count all the words in it. Doesn't matter if they are repeated. Just the total count like in a text files word count. Words are anything separated by a space and punctuation doesn't matter, as long as it's part of a word.

给定一个字符串，计算其中的所有单词。是否重复都没有关系。就像在文本文件中的总字数一样。单词是由空格分隔的任何东西，标点符号无关紧要，只要它是单词的一部分。

For example: A very, very, very, very, very big dog ate my homework!!!! ==> 11 words

例如： A very, very, very, very, very big dog ate my homework!!!! ==> 11 words

My "algorithm" just goes through looking for spaces and incrementing a counter until I hit a null. Since i didn't get the job and was asked to leave after that I guess My solution wasn't good? Anyone have a more clever solution? Am I missing something?

我的“算法”只是通过查找空格和递增计数器，直到我遇到空值。由于我没有得到这份工作并被要求离开之后我想我的解决方案不好？有人有更聪明的解决方案吗？我错过了什么吗？

Answer 1

采纳答案by dash-tom-bang

A less clever, more obvious-to-all-of-the-programmers-on-your-team method of doing it.

一种不太聪明，对团队中的所有程序员来说更明显的方法。

#include <cctype>

int CountWords(const char* str)
{
   if (str == NULL)
      return error_condition;  // let the requirements define this...

   bool inSpaces = true;
   int numWords = 0;

   while (*str != 'unsigned int countWordsInString(std::string const& str)
{
    std::stringstream stream(str);
    return std::distance(std::istream_iterator<std::string>(stream), std::istream_iterator<std::string>());
}
')
   {
      if (std::isspace(*str))
      {
         inSpaces = true;
      }
      else if (inSpaces)
      {
         numWords++;
         inSpaces = false;
      }

      ++str;
   }

   return numWords;
}

Answer 2

回答by Martin York

Assuming words are white space separated:

假设单词以空格分隔：

std::stringstream  stream(str);
std::string        oneWord;

stream >> oneWord; // Reads one space separated word.

Note: There may be more than one space between words. Also this does not catch other white space characters like tab new line or carriage return. So counting spaces is not enough.

注意：单词之间可能有多个空格。此外，这不会捕获其他空白字符，如制表符换行或回车。所以计算空格是不够的。

The stream input operator >> when used to read a string from a stream. Reads one white space separated word. So they were probably looking for you to use this to identify words.

流输入运算符 >> 用于从流中读取字符串。读取一个空格分隔的单词。所以他们可能在找你用它来识别单词。

std::stringstream  stream(str);
std::string        oneWord;
unsigned int       count = 0;

while(stream >> oneWord) { ++count;}
// count now has the number of words in the string.

When can use this to count words in a string.

什么时候可以使用它来计算字符串中的单词。

std::stringstream  stream(str);
std::string        oneWord;
unsigned int       count = 0;

std::istream_iterator loop = std::istream_iterator<std::string>(stream);
std::istream_iterator end  = std::istream_iterator<std::string>();

for(;loop != end; ++count, ++loop) { *loop; }

Getting complicated:
Streams can be treated just like any other container and there are iterators to loop through them std::istream_iterator. When you use the ++ operator on an istream_iterator it just read the next value from the stream using the operator >>. In this case we are reading std::string so it reads a space separated word.

变得复杂：
流可以像任何其他容器一样被处理，并且有迭代器来循环它们 std::istream_iterator。当您在 istream_iterator 上使用 ++ 运算符时，它只是使用运算符 >> 从流中读取下一个值。在这种情况下，我们正在读取 std::string，因此它读取一个空格分隔的单词。

unsigned int countWordsInString(std::string const& str)
{
    std::stringstream stream;

    // sneaky way to use the string as the buffer to avoid copy.
    stream.rdbuf()->pubsetbuf (str.c_str(), str.length() );
    return std::distance(std::istream_iterator<std::string>(stream), std::istream_iterator<std::string>());
}

Using std::distance just wraps all the above in a tidy package as it find the distance between two iterators by doing ++ on the first until we reach the second.

使用 std::distance 只是将上述所有内容包装在一个整洁的包中，因为它通过在第一个迭代器上执行 ++ 直到我们到达第二个迭代器来找到两个迭代器之间的距离。

To avoid copying the string we can be sneaky:

为了避免复制字符串，我们可以偷偷摸摸：

vector<string> result;
split(result, "aaaa bbbb cccc", is_any_of(" \t\n\v\f\r"), token_compress_on);

Note: we still copy each word out of the original into a temporary. But the cost of that is minimal.

注意：我们还是把原文中的每个词都复制到一个临时的。但这样做的成本是最低的。

Answer 3

回答by Christopher Hunt

Another boost based solution that may work (untested):

另一种可能有效的基于提升的解决方案（未经测试）：

//Count the number of words on string
#include <iostream>
#include <string>
#include <algorithm> //count and count_if is declared here

int main () {
    std::string sTEST("Text to verify how many words it has.");

    std::cout << std::count(sTEST.cbegin(), sTEST.cend(), ' ')+1;

    return 0;
}

More information can be found in the Boost String Algorithms Library

可以在Boost String Algorithms Library 中找到更多信息

Answer 4

回答by TheArquitect

You can use the std::count or std::count_if to do that. Below a simple example with std::count:

您可以使用 std::count 或 std::count_if 来做到这一点。下面是一个带有 std::count 的简单示例：

//Count the number of words on string
#include <string>
#include <iostream>

int main () {
    std::string T("Text to   verify :  How many words does it   have?");

    size_t NWords = T.empty() || T.back() == ' ' ? 0 : 1;
    for (size_t s = T.size(); s > 0; --s)
        if (T[s] == ' ' && T[s-1] != ' ') ++NWords;

    std::cout << NWords;

    return 0;
}

UPDATE: Due the observation made by Aydin ?zcan (Nov 16) I made a change to this solution. Now the words may have more than one space between them. :)

更新：由于 Aydin ?zcan（11 月 16 日）的观察，我对此解决方案进行了更改。现在单词之间可能有多个空格。:)

#include <boost/iterator/transform_iterator.hpp>
#include <cctype>

boost::transform_iterator
    < int (*)(int), std::string::const_iterator, bool const& >
    pen( str.begin(), std::isalnum ), end( str.end(), std::isalnum );

size_t word_cnt = 0;

while ( pen != end ) {
    word_cnt += * pen;
    pen = std::mismatch( pen+1, end, pen ).first;
}

return word_cnt;

Answer 5

回答by Potatoswatter

This can be done without manually looking at every character or copying the string.

无需手动查看每个字符或复制字符串即可完成此操作。

if ( str.empty() ) return 0;

size_t word_cnt = std::isalnum( * str.begin() );

for ( std::string::const_iterator pen = str.begin(); ++ pen != str.end(); ) {
    word_cnt += std::isalnum( pen[ 0 ] ) && ! std::isalnum( pen[ -1 ] );
}

return word_cnt;

I took the liberty of using isalnuminstead of isspace.

我冒昧地使用isalnum而不是isspace.

This is not something I would do at a job interview. (It's not like it compiled the first time.)

这不是我在求职面试中会做的事情。（这不像是第一次编译。）

Or, for all the Boost haters ;v)

或者，对于所有讨厌 Boost 的人；v)

#include <iostream>
#include <string>
using namespace std;

int countNumberOfWords(string sentence){
    int numberOfWords = 0;
    size_t i;

    if (isalpha(sentence[0])) {
        numberOfWords++;
    }

    for (i = 1; i < sentence.length(); i++) {
        if ((isalpha(sentence[i])) && (!isalpha(sentence[i-1]))) {
            numberOfWords++;
        }
    }

    return numberOfWords;
}

int main()
{
    string sentence;
    cout<<"Enter the sentence : ";
    getline(cin, sentence);

    int numberOfWords = countNumberOfWords(sentence);
    cout<<"The number of words in the sentence is : "<<numberOfWords<<endl;

    return 0;
}

Answer 6

回答by totjammykd

An O(N) solution that is also very simple to understand and implement:

一个 O(N) 解决方案，也很容易理解和实现：

(I haven't checked for an empty string input. But I am sure you can do that easily.)

（我还没有检查空字符串输入。但我相信你可以轻松做到这一点。）

 i | 0123456789
c1 | A very, very, very, very, very big dog ate my homework!!!!
c2 |  A very, very, very, very, very big dog ate my homework!!!!
   |  x     x     x     x     x    x   x   x   x  x

Answer 7

回答by Lakshay Garg

Here is a single pass, branchless (almost), locale-aware algorithm which handles cases with more than one space between words:

这是一个单程、无分支（几乎）、区域感知算法，它处理单词之间有多个空格的情况：

If the string is empty return 0
let transitions = number of adjacent char pairs (c1, c2) where c1 == ' 'and c2 != ' '
if the sentence starts with a space, return transitionselse return transitions + 1

如果字符串为空返回0
让转换 = 相邻字符对 (c1, c2) 的数量，其中c1 == ' '和c2 != ' '
如果句子以空格开头，则返回transitionselse returntransitions + 1

Here is an example with string = "A very, very, very, very, very big dog ate my homework!!!!"

这是一个字符串 =“一只非常、非常、非常、非常、非常非常大的狗吃了我的作业！！！”的例子

Let `i` be the loop counter.

When i=0: c1='A' and c2=' ', the condition `c1 == ' '` and `c2 != ' '` is not met
When i=1: c1=' ' and c2='A', the condition is met
... and so on for the remaining characters

Explanation

解释

size_t count_words_naive(const std::string_view& s)
{
    if (s.size() == 0) return 0;
    size_t count = 0;
    bool isspace1, isspace2 = true;
    for (auto c : s) {
        isspace1 = std::exchange(isspace2, isspace(c));
        count += (isspace1 && !isspace2);
    }
    return count;
}

Here are 2 solutions I came up with

这是我想出的 2 个解决方案

Naive solution

天真的解决方案

size_t count_words_using_inner_prod(const std::string_view& s)
{
    if (s.size() == 0) return 0;
    auto starts_with_space = isspace(s.front());
    auto num_transitions = std::inner_product(
            s.begin()+1, s.end(), s.begin(), 0, std::plus<>(),
            [](char c2, char c1) { return isspace(c1) && !isspace(c2); });
    return num_transitions + !starts_with_space;
}

If you think carefully, you will be able to reduce this set of operations into an inner product (just for fun, I don't recommend this as this is arguably much less readable).

如果您仔细考虑，您将能够将这组操作简化为一个内积（只是为了好玩，我不建议这样做，因为这可能会降低可读性）。

Inner product solution

内积解决方案

bool is_letter(char c) { return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z'; }

int count_words(const string& s) {
    int i = 0, N = s.size(), count = 0;
    while(i < N) {
        while(i < N && !is_letter(s[i])) i++;
        if(i == N) break;
        while(i < N && is_letter(s[i])) i++;
        count++;
    }
    return count;
}

Answer 8

回答by sunjerry

A very concise O(N) approach:

一个非常简洁的 O(N) 方法：

int DC(const string& A, int low, int high) {
    if(low > high) return 0;
    int mid = low + (high - low) / 2;

    int count_left = DC(A, low, mid-1);
    int count_right = DC(A, mid+1, high);

    if(!is_letter(A[mid])) 
        return count_left + count_right;
    else {
        if(mid == low && mid == high) return 1;
        if(mid-1 < low) {
            if(is_letter(A[mid+1])) return count_right;
            else return count_right+1;
        } else if(mid+1 > high) {
            if(is_letter(A[mid-1])) return count_left;
            else return count_left+1;
        }
        else {
            if(!is_letter(A[mid-1]) && !is_letter(A[mid+1])) 
                return count_left + count_right + 1;
            else if(is_letter(A[mid-1]) && is_letter(A[mid+1]))
                return count_left + count_right - 1;
            else
                return count_left + count_right;
        }
    }
}

int count_words_divide_n_conquer(const string& s) {
    return DC(s, 0, s.size()-1);
}

A divide-and-conquer approach, complexity is also O(N):

一种分而治之的方法，复杂度也是 O(N)：

##代码##

C++函数计算字符串中的所有单词

提问by evilHyman

采纳答案by dash-tom-bang

回答by Martin York

回答by Christopher Hunt

回答by TheArquitect

回答by Potatoswatter

回答by totjammykd

回答by Lakshay Garg

回答by sunjerry

相关推荐

最近更新

标签

C++函数计算字符串中的所有单词

提问by evilHyman

采纳答案by dash-tom-bang

回答by Martin York

回答by Christopher Hunt

回答by TheArquitect

回答by Potatoswatter

回答by totjammykd

回答by Lakshay Garg

回答by sunjerry

相关推荐

用于 C++ 文件 IO 错误的 Try-Catch 块不起作用

C++ 我可以在没有 qmake 或 Qt Creator 的情况下使用 Qt 吗？

C++ 前向声明与包含

C++ pthread 睡眠 linux

相关推荐

最近更新

标签