Original source: http://stackoverflow.com/questions/3714649/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
C++ Reading in words from a text file, word by word or char by char
Asked by Matt Swezey
I've been googling around, reading through my book, and trying to write code that reads through a text file and processes the words in it one by one, so I can put them in alphabetical order and keep a count of how many words were used and how often each word was used. I can't seem to get my GetNextWord() function to work properly and it's driving me crazy.
I need to read the words in one by one and convert each letter to lower case if it is upper case. I know how to do that, and have done it successfully. It's just getting the word character by character and putting it into a string that is holding me up.
Any help would be amazing, or a link to a tutorial on how to read from an input file word by word. (A word being alpha characters a-z and ' (as in don't), ended by whitespace, comma, period, ;, :, etc.) This is my most recent try at it:
void GetNextWord()
{
    string word = "";
    char c;
    while(inFile.get(c))
    {
        while( c > 64 && c < 123 || c == 39)
        {
            if((isupper(c)))
            {
                c = (tolower(c));
            }
            word = word + c;
        }
        outFile << word;
    }
}
Accepted answer by sbi
Your logic is wrong. The inner loop runs as long as c doesn't change, and there's nothing in it that would change c.
Why are you having two loops anyway? I think you might be confused about whether that function is supposed to read the next word or all words. Try to separate those concerns, put them into different functions (one of which is calling the other). I find it easiest to approach such problems in a top-down order:
while(inFile.good()) {
    std::string word = GetNextWord(inFile);
    if(!word.empty())
        std::cout << word << std::endl;
}
Now fill in the gaps by defining GetNextWord() to read everything up to the next word boundary.
Answer by Scott Stafford
You can read the file word by word by using the >> operator. For example, see this link: http://www.daniweb.com/forums/thread30942.html.
I excerpted their example here:
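ifstream in ( "somefile" );
vector<string> words;
string word;

// Give up if the file could not be opened.
if ( !in )
    return;

// >> skips leading whitespace and reads one whitespace-delimited token at a time.
while ( in >> word )
    words.push_back ( word );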
Answer by Max Lybbert
Personally I like to read in input with std::getline(std::istream&, std::string&) (in the <string> header, but you will of course also need to #include a stream header).
This function breaks on newline, which is whitespace by your problem's definition. But it's not the entire answer to your question. After reading in the line of text, you're going to need to use string operations or standard algorithms to break the string into words. Or you could loop over the string by hand.
The guts would be something like:
std::string buffer;
while (std::getline(std::cin, buffer)) {
    // break each line into words, according to problem spec
}
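As an illustration of the "standard algorithms" route, here is a sketch of how the body of that loop might break a line into words with std::find_if and std::find_if_not (C++11). The names inWord and splitLine are hypothetical, and the word definition (letters plus apostrophe) is an assumption based on the question.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Assumption: words consist of letters and apostrophes; everything else separates them.
bool inWord(char c)
{
    return std::isalpha(static_cast<unsigned char>(c)) || c == '\'';
}

// Appends every word found in `line` to `words`, leaving the case unchanged.
void splitLine(const std::string& line, std::vector<std::string>& words)
{
    auto it = line.begin();
    while (true) {
        // Find the first character of the next word.
        auto first = std::find_if(it, line.end(), inWord);
        if (first == line.end())
            break;
        // Find the first character past the end of that word.
        auto last = std::find_if_not(first, line.end(), inWord);
        words.push_back(std::string(first, last));
        it = last;
    }
}
```

Inside the getline loop this would be called as splitLine(buffer, words), with words declared outside the loop so that it accumulates across lines.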
Answer by IAE
I use
// str is a string that holds one line of data from ifs, the text file.
// str holds the words to be split, res the vector to store them in.
while( getline( ifs, str ) )
    split(str, res);

void split(const string& str, vector<string>& vec)
{
    typedef unsigned int uint;
    const string::size_type size(str.size());
    uint start(0);
    uint range(0);
    /* Explanation:
     * range - Length of the word to be extracted, without spaces.
     * start - Start of the next word. During initialization, starts at index 0.
     *
     * Runs until it encounters whitespace, then splits the string with substr(),
     * as well as making sure that all characters are lower-case (without wasting time
     * to check if they already are, as I feel a char-by-char check for upper-case takes
     * just as much time as lowering them all anyway).
     */
    for( uint i(0); i < size; ++i )
    {
        if( isspace(str[i]) )
        {
            // substr(start, range) excludes the whitespace; skip empty words
            // produced by consecutive whitespace characters.
            if( range > 0 )
                vec.push_back( toLower(str.substr(start, range)) );
            start = i + 1;
            range = 0;
        } else
            ++range;
    }
    // The last word on the line, if there is one.
    if( range > 0 )
        vec.push_back( toLower(str.substr(start, range)) );
}
I'm not sure this is particularly helpful to you, but I'll try. The toLower function is a quick helper that simply applies the ::tolower() function to each character. The code reads each char until a space, then stuffs the word into a vector. I'm not entirely sure what you mean by char by char.
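The toLower helper itself is not shown in this answer. A minimal sketch of what it might look like, assuming it takes a string and returns a lower-cased copy (the signature is a guess, and the lambda needs C++11):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical reconstruction of the toLower helper; the answer does not show it.
// Takes the string by value so it can be called on the temporary returned by substr().
std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return s;
}
```

The lambda takes an unsigned char because passing a negative char value to std::tolower is undefined behavior.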
Do you want to extract a word one character at a time? Or do you want to check each character as you go along? Or do you mean you want to extract one word, finish, and then come back? If so, I would 1) recommend a vector anyway, and 2) ask you to let me know so I can refactor the code.
Answer by Andrew Bainbridge
What's going to terminate your inner loop if c == 'a'? ASCII value for 'a' is 97.