C++: reading in words from a text file, word by word or character by character

Question

Asked by Matt Swezey

I've been googling around, reading through my book, and trying to write code that reads through a text file and processes the words in it one by one, so I can put them in alphabetical order and keep a count of how many words were used and how often each word was used. I can't seem to get my GetNextWord() function to work properly and it's driving me crazy.

I need to read the words in one by one and convert each uppercase letter to lowercase. I know how to do that, and have done it successfully. It's getting the word character by character and putting it into a string that is holding me up.

This is my most recent try at it. Any help would be amazing, as would a link to a tutorial on how to read from an input file word by word. (A word is made of the alphabetic characters a-z and ' (as in "don't"), and ends at whitespace, a comma, a period, ;, :, etc.)

void GetNextWord()
{
    string word = "";
    char c;

    while(inFile.get(c))
    {
        while( c > 64 && c < 123 || c == 39)
        {
            if((isupper(c)))
            {
                c = (tolower(c));
            }
            word = word + c;
        }
        outFile << word;
    }
}

Answer 1

Accepted answer by sbi

Your logic is wrong. The inner loop runs as long as c doesn't change, and there's nothing in it that would change c.

Why are you having two loops anyway? I think you might be confused about whether that function is supposed to read the next word or all words. Try to separate those concerns, put them into different functions (one of which is calling the other). I find it easiest to approach such problems in a top-down order:

while(inFile.good()) {
  std::string word = GetNextWord(inFile);
  if(!word.empty())
    std::cout << word << std::endl;
}

Now fill in the gaps by defining GetNextWord() to read everything up to the next word boundary.

Answer 2

Answered by Scott Stafford

You can read the file word by word using the >> operator. For example, see this link: http://www.daniweb.com/forums/thread30942.html.

I excerpted their example here:

ifstream in ( "somefile" );
vector<string> words;
string word;

if ( !in )
  return;

while ( in>> word )
  words.push_back ( word );

Answer 3

Answered by Max Lybbert

Personally I like to read in input with std::getline(std::istream&, std::string&) (in the <string> header, but you will of course also need to #include a stream header).

This function breaks on newline, which is whitespace by your problem's definition. But it's not the entire answer to your question. After reading in the line of text, you're going to need to use string operations or standard algorithms to break the string into words. Or you could loop over the string by hand.

The guts would be something like:

std::string buffer;
while (std::getline(std::cin, buffer)) {
    // break each line into words, according to problem spec
}
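
For the "loop over the string by hand" option, the body of that loop might call something like the function below. It is a sketch under the question's word definition (letters and apostrophes, anything else separates words); SplitLine is an illustrative name, and the range-based for loop assumes a C++11 compiler.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Splits one line into lowercase words. A word is a run of letters and
// apostrophes; every other character is treated as a separator.
std::vector<std::string> SplitLine(const std::string& line)
{
    std::vector<std::string> words;
    std::string current;
    for (char c : line) {
        if (std::isalpha(static_cast<unsigned char>(c)) || c == '\'') {
            current += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        } else if (!current.empty()) {
            // Hit a separator: the word collected so far is complete.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())   // the line may end in the middle of a word
        words.push_back(current);
    return words;
}
```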

Answer 4

Answered by IAE

I use

// str holds one line of data read from ifs, the text file.
// res is the vector that the split words are stored in.
while( getline( ifs, str ) )
    split(str, res);


void split(const string& str, vector<string>& vec)
{
    const string::size_type size(str.size());
    string::size_type start(0);
    string::size_type range(0);

 /* Explanation:
  * range - Length of the word to be extracted, without spaces.
  * start - Start of the next word. Initially index 0.
  *
  * Runs until it encounters whitespace, then extracts the word with substr()
  * and lower-cases it (without first checking whether each character is
  * already lower-case, as I feel a char-by-char check for upper-case takes
  * just as much time as lowering them all anyway).
  */
    for( string::size_type i(0); i < size; ++i )
    {
        if( isspace(static_cast<unsigned char>(str[i])) )
        {
            // range counts only the word's characters, so the space is
            // excluded; empty runs (consecutive spaces) are skipped.
            if( range > 0 )
                vec.push_back( toLower(str.substr(start, range)) );
            start = i + 1;
            range = 0;
        } else
            ++range;
    }
    if( range > 0 )
        vec.push_back( toLower(str.substr(start, range)) );
}

I'm not sure this is particularly helpful to you, but I'll try. toLower is a quick helper function that simply uses ::tolower(). split() reads each char until it hits a space, then stuffs the word into a vector. I'm not entirely sure what you mean by char by char.
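
The toLower helper itself is not shown in this answer. A minimal sketch of what it might look like, assuming it takes a string by value and returns the lowercased copy (and assuming a C++11 compiler for the lambda):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Assumed shape of the helper: lowercases a copy of `s` and returns it.
std::string toLower(std::string s)
{
    // Going through unsigned char avoids undefined behaviour in
    // std::tolower for characters with negative values.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return s;
}
```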

Do you want to extract a word one character at a time? Or do you want to check each character as you go along? Or do you mean you want to extract one word, finish, and then come back? If so, I would 1) recommend a vector anyway, and 2) ask you to let me know so I can refactor the code.

Answer 5

Answered by Andrew Bainbridge

What's going to terminate your inner loop if c == 'a'? The ASCII value of 'a' is 97.
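
A related detail in the same condition: in ASCII, the range 65 to 122 covers not only A-Z (65 to 90) and a-z (97 to 122) but also the six punctuation characters in between (91 to 96, such as '[' and '_'). The sketch below puts the question's test next to one based on std::isalpha; both function names are made up for illustration.

```cpp
#include <cctype>

// The condition from the question, with parentheses added only to make
// the existing operator precedence explicit.
bool OriginalTest(char c)
{
    return (c > 64 && c < 123) || c == 39;
}

// Alternative that accepts only letters and the apostrophe.
bool LetterOrApostrophe(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) || c == '\'';
}
```

Either test still has to be applied to a freshly read character on each pass, otherwise the loop cannot end.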
