Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/19673332/

Time: 2020-08-27 23:02:26. Source: igfitidea

Detecting end of input using std::getline

Tags: c++, while-loop, stdin, getline, gedit

Asked by zalenix

My code contains the following snippet:

std::string input;
while(std::getline(std::cin, input))
{   
    //some read only processing with input
}

When I run the program code, I redirect stdin input through the file in.txt (which was created using gedit), and it contains:

ABCD
DEFG
HIJK

Each of the above lines ends with a single newline in the file in.txt.

The problem I am facing is that after the while loop runs 3 times (once for each line), program control does not move forward and gets stuck. Why is this happening, and what can I do to resolve it?

Some clarification:

I want to be able to run the program from the command line as such:

$ g++ program.cc -o out
$ ./out < in.txt

Additional Information:

I did some debugging and found that the while loop actually runs 4 times (the fourth time with input as an empty string). This causes the program to stall, because the "//some read only processing with input" part is unable to do its work.

So my refined question:

1) Why is the 4th loop running at all?

The rationale behind having std::getline() in the while loop's condition must be that, when getline() cannot read any more input, it returns zero and hence the while loop breaks.

Contrary to that, the while loop instead continues with an empty string! Why then have getline in the while loop condition at all? Isn't that bad design?

2) How do I ensure that the while doesn't run for the 4th time without using break statements?

For now I have used a break statement and string stream as follows:

std::string input;
char temp;
while(std::getline(std::cin, input))
{       
    std::istringstream iss(input);
    if (!(iss >>temp))
    {    
        break;
    } 
    //some read only processing with input
}

But clearly there has to be a more elegant way.

Answered by Keith Thompson

Contrary to DeadMG's answer, I believe the problem is with the contents of your input file, not with your expectation about the behavior of the newline character.

UPDATE: Now that I've had a chance to play with gedit, I think I see what caused the problem. gedit apparently is designed to make it difficult to create a file without a newline on the last line (which is sensible behavior). If you open gedit and type three lines of input, typing Enter at the end of each line, then save the file, it will actually create a 4-line file, with the 4th line empty. The complete contents of the file, using your example, would then be "ABCD\nEFGH\nIJKL\n\n". To avoid creating that extra empty line, just don't type Enter at the end of the last line; gedit will provide the required newline character for you.

(As a special case, if you don't enter anything at all, gedit will create an empty file.)

Note this important distinction: In gedit, typing Enter creates a new line. In a text file stored on disk, a newline character (LF, '\n') denotes the end of the current line.

Text file representations vary from system to system. The most common representations for an end-of-line marker are a single ASCII LF (newline) character (Unix, Linux, and similar systems), and a sequence of two characters, CR and LF (MS Windows). I'll assume the Unix-like representation here. (UPDATE: In a comment, you said you're using Ubuntu 12.04 and gcc 4.6.3, so text files should definitely be in the Unix-style format.)

I just wrote the following program based on the code in your question:

#include <iostream>
#include <string>
int main() {
    std::string input;
    int line_number = 0;
    while(std::getline(std::cin, input))
    {   
        line_number ++;
        std::cout << "line " << line_number
                  << ", input = \"" << input << "\"\n";
    }
}

and I created a 3-line text file in.txt:

ABCD
EFGH
IJHL

In the file in.txt, each line is terminated by a single newline character.

Here's the output I get:

$ cat in.txt
ABCD
EFGH
IJHL
$ g++ c.cpp -o c
$ ./c < in.txt
line 1, input = "ABCD"
line 2, input = "EFGH"
line 3, input = "IJHL"
$

The final newline at the very end of the file does not start a new line; it merely marks the end of the current line. (A text file that doesn't end with a newline character might not even be valid, depending on the system.)

I can get the behavior you describe if I add a second newline character to the end of in.txt:

$ echo '' >> in.txt
$ cat in.txt
ABCD
EFGH
IJHL

$ ./c < in.txt
line 1, input = "ABCD"
line 2, input = "EFGH"
line 3, input = "IJHL"
line 4, input = ""
$

The program sees an empty line at the end of the input file because there's an empty line at the end of the input file.

If you examine the contents of in.txt, you'll find two newline (LF) characters at the very end, one to mark the end of the third line, and one to mark the end of the (empty) fourth line. (Or if it's a Windows-format text file, you'll find a CR-LF-CR-LF sequence at the very end of the file.)

If your code doesn't deal properly with empty lines, then you should either ensure that it doesn't receive any empty lines on its input, or, better, modify it so it handles empty lines correctly. How should it handle empty lines? That depends on what the program is required to do, and it's probably entirely up to you. You can silently skip empty lines:

if (input != "") {
    // process line
}

or you can treat an empty line as an error:

if (input == "") {
    // error handling code
}

or you can treat empty lines as valid data.

In any case, you should decide exactly how you want to handle empty lines.

Answered by Puppy

Why is the 4th loop running at all?

Because the text input contains four lines.

The newline character means just that: "Start a new line". It does not mean "The preceding line is complete", and in this test, the difference between those two semantics is revealed. So we have

1. ABCD
2. DEFG
3. HIJK
4.

The newline character at the end of the third line begins a new line, just like it should and exactly like its name says it will. The fact that that line is empty is why you get back an empty string. If you want to avoid it, trim the newline at the end of the third line, or simply special-case it: if (input == "") break;

The problem has nothing to do with your code, and lies in your faulty expectation of the behaviour of the newline character.

Answered by zalenix

Finale:

Edit: Please read the accepted answer for the correct explanation of the problem and the solution as well.

As a note to people using std::getline() in their while loop condition: remember to check inside the loop whether the string is empty and break accordingly, like this:

std::string input;
while(std::getline(std::cin, input))
{
    if(input == "")   // compare with ==; a single = would assign to input
        break;
    //some read only processing with input
}

My suggestion: don't put std::getline() in the while loop condition at all. Rather, use std::cin like this:

while(std::cin>>a>>b)
{
    //loop body
}

This way, extra checking for an empty string is not required, and the code design is better.

The latter method removes the need to check explicitly for an empty string. (However, it is always better to do as much explicit checking as possible on the format of the input.)
