C: Why is “while (!feof (file))” always wrong?

Note: this page is a translation page for a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/5431941/

Time: 2020-09-02 08:09:47  Source: igfitidea

Why is “while ( !feof (file) )” always wrong?

c file while-loop eof feof

Asked by William Pursell

I've seen people trying to read files like this in a lot of posts lately:

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
    char *path = "stdin";
    FILE *fp = argc > 1 ? fopen(path=argv[1], "r") : stdin;

    if( fp == NULL ) {
        perror(path);
        return EXIT_FAILURE;
    }

    while( !feof(fp) ) {  /* THIS IS WRONG */
        /* Read and process data from file… */
    }
    if( fclose(fp) != 0 ) {
        perror(path);
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}

What is wrong with this loop?

Accepted answer by Kerrek SB

I'd like to provide an abstract, high-level perspective.

Concurrency and simultaneity

I/O operations interact with the environment. The environment is not part of your program, and not under your control. The environment truly exists "concurrently" with your program. As with all things concurrent, questions about the "current state" don't make sense: There is no concept of "simultaneity" across concurrent events. Many properties of state simply don't exist concurrently.

Let me make this more precise: Suppose you want to ask, "do you have more data". You could ask this of a concurrent container, or of your I/O system. But the answer is generally unactionable, and thus meaningless. So what if the container says "yes" – by the time you try reading, it may no longer have data. Similarly, if the answer is "no", by the time you try reading, data may have arrived. The conclusion is that there simply is no property like "I have data", since you cannot act meaningfully in response to any possible answer. (The situation is slightly better with buffered input, where you might conceivably get a "yes, I have data" that constitutes some kind of guarantee, but you would still have to be able to deal with the opposite case. And with output the situation is certainly just as bad as I described: you never know if that disk or that network buffer is full.)

So we conclude that it is impossible, and in fact unreasonable, to ask an I/O system whether it will be able to perform an I/O operation. The only possible way we can interact with it (just as with a concurrent container) is to attempt the operation and check whether it succeeded or failed. At that moment where you interact with the environment, then and only then can you know whether the interaction was actually possible, and at that point you must commit to performing the interaction. (This is a "synchronisation point", if you will.)

EOF

Now we get to EOF. EOF is the response you get from an attempted I/O operation. It means that you were trying to read or write something, but when doing so you failed to read or write any data, and instead the end of the input or output was encountered. This is true for essentially all the I/O APIs, whether it be the C standard library, C++ iostreams, or other libraries. As long as the I/O operations succeed, you simply cannot know whether further, future operations will succeed. You must always first try the operation and then respond to success or failure.
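As a minimal sketch of "try first, then respond", the function below copies one stdio stream to another a byte at a time. The name copy_stream and its return convention are assumptions made for illustration; only fgetc, fputc and ferror come from the C standard library. Both the read and the write are attempted, and each result is inspected before anything else happens.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original answer): copies `in` to `out`
   one byte at a time. Returns 0 on success, -1 if a read or a write failed. */
int copy_stream(FILE *in, FILE *out)
{
    int c;  /* int, not char, so EOF stays distinct from every valid byte */

    while ((c = fgetc(in)) != EOF) {   /* attempt the read, then test it */
        if (fputc(c, out) == EOF) {    /* attempt the write, then test it */
            return -1;
        }
    }
    /* fgetc() returned EOF; only now does it make sense to ask why. */
    return ferror(in) ? -1 : 0;
}
```

A caller might write `if (copy_stream(stdin, stdout) != 0) { perror("copy"); }`. Note that nothing in the loop asks in advance whether more data exists.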

Examples

In each of the examples, note carefully that we first attempt the I/O operation and then consume the result if it is valid. Note further that we always must use the result of the I/O operation, though the result takes different shapes and forms in each example.

  • C stdio, read from a file:

    for (;;) {
        size_t n = fread(buf, 1, bufsize, infile);
        consume(buf, n);
        if (n < bufsize) { break; }
    }
    

    The result we must use is n, the number of elements that were read (which may be as little as zero).

  • C stdio, scanf:

    for (int a, b, c; scanf("%d %d %d", &a, &b, &c) == 3; ) {
        consume(a, b, c);
    }
    

    The result we must use is the return value of scanf, the number of elements converted.

  • C++, iostreams formatted extraction:

    for (int n; std::cin >> n; ) {
        consume(n);
    }
    

    The result we must use is std::cin itself, which can be evaluated in a boolean context and tells us whether the stream is still in the good() state.

  • C++, iostreams getline:

    for (std::string line; std::getline(std::cin, line); ) {
        consume(line);
    }
    

    The result we must use is again std::cin, just as before.

  • POSIX, write(2) to flush a buffer:

    char const * p = buf;
    ssize_t n = bufsize;
    for (ssize_t k = bufsize; (k = write(fd, p, n)) > 0; p += k, n -= k) {}
    if (n != 0) { /* error, failed to write complete buffer */ }
    

    The result we use here is k, the number of bytes written. The point here is that we can only know how many bytes were written after the write operation.

  • POSIX getline()

    char *buffer = NULL;
    size_t bufsiz = 0;
    ssize_t nbytes;
    while ((nbytes = getline(&buffer, &bufsiz, fp)) != -1)
    {
        /* Use nbytes of data in buffer */
    }
    free(buffer);
    

    The result we must use is nbytes, the number of bytes up to and including the newline (or EOF if the file did not end with a newline).

    Note that the function explicitly returns -1 (and not EOF!) when an error occurs or it reaches EOF.

You may notice that we very rarely spell out the actual word "EOF". We usually detect the error condition in some other way that is more immediately interesting to us (e.g. failure to perform as much I/O as we had desired). In every example there is some API feature that could tell us explicitly that the EOF state has been encountered, but this is in fact not a terribly useful piece of information. It is much more of a detail than we often care about. What matters is whether the I/O succeeded, more so than how it failed.

  • A final example that actually queries the EOF state: Suppose you have a string and want to test that it represents an integer in its entirety, with no extra bits at the end except whitespace. Using C++ iostreams, it goes like this:

    std::string input = "   123   ";   // example
    
    std::istringstream iss(input);
    int value;
    if (iss >> value >> std::ws && iss.get() == EOF) {
        consume(value);
    } else {
        // error, "input" is not parsable as an integer
    }
    

    We use two results here. The first is iss, the stream object itself, to check that the formatted extraction to value succeeded. But then, after also consuming whitespace, we perform another I/O operation, iss.get(), and expect it to fail as EOF, which is the case if the entire string has already been consumed by the formatted extraction.

    In the C standard library you can achieve something similar with the strto*l functions by checking that the end pointer has reached the end of the input string.
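The strtol variant mentioned above might look roughly like this. The function name parse_whole_int and its 1/0 return convention are assumptions for illustration; the range check against INT_MIN/INT_MAX is an extra that the iostreams example does not spell out.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 and stores the value in *out if `s` holds
   exactly one integer, optionally surrounded by whitespace; 0 otherwise. */
int parse_whole_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(s, &end, 10);   /* the attempt; skips leading whitespace itself */
    if (end == s) {
        return 0;              /* no digits were converted */
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        return 0;              /* value does not fit in an int */
    }
    while (isspace((unsigned char)*end)) {
        end++;                 /* plays the role of std::ws */
    }
    if (*end != '\0') {
        return 0;              /* trailing characters after the number */
    }
    *out = (int)v;
    return 1;
}
```

With this sketch, "   123   " is accepted with the value 123, while "123abc", "" and "   " are rejected. As in the stream version, the conversion is attempted first and the end pointer is examined afterwards.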

The answer

while(!feof) is wrong because it tests for something that is irrelevant and fails to test for something that you need to know. The result is that you are erroneously executing code that assumes that it is accessing data that was read successfully, when in fact this never happened.

Answer by William Pursell

It's wrong because (in the absence of a read error) it enters the loop one more time than the author expects. If there is a read error, the loop never terminates.

Consider the following code:

/* WARNING: demonstration of bad coding technique!! */

#include <stdio.h>
#include <stdlib.h>

FILE *Fopen(const char *path, const char *mode);

int main(int argc, char **argv)
{
    FILE *in;
    unsigned count;

    in = argc > 1 ? Fopen(argv[1], "r") : stdin;
    count = 0;

    /* WARNING: this is a bug */
    while( !feof(in) ) {  /* This is WRONG! */
        fgetc(in);
        count++;
    }
    printf("Number of characters read: %u\n", count);
    return EXIT_SUCCESS;
}

FILE * Fopen(const char *path, const char *mode)
{
    FILE *f = fopen(path, mode);
    if( f == NULL ) {
        perror(path);
        exit(EXIT_FAILURE);
    }
    return f;
}

This program will consistently print one greater than the number of characters in the input stream (assuming no read errors). Consider the case where the input stream is empty:

$ ./a.out < /dev/null
Number of characters read: 1

In this case, feof() is called before any data has been read, so it returns false. The loop is entered, fgetc() is called (and returns EOF), and count is incremented. Then feof() is called and returns true, causing the loop to abort.

This happens in all such cases. feof() does not return true until after a read on the stream encounters the end of file. The purpose of feof() is NOT to check if the next read will reach the end of file. The purpose of feof() is to distinguish between a read error and having reached the end of the file. If fread() returns 0, you must use feof/ferror to decide whether an error was encountered or if all of the data was consumed. Similarly if fgetc returns EOF. feof() is only useful after fread has returned zero or fgetc has returned EOF. Before that happens, feof() will always return 0.
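For comparison, here is one way the counting loop could be written so that the read itself is the loop condition. This is a sketch, not the author's code: the function name count_chars and the -1 error convention are assumptions for illustration.

```c
#include <stdio.h>

/* Hypothetical rewrite of the counting loop: returns the number of
   characters read, or -1 if the stream reported a read error. */
long count_chars(FILE *in)
{
    long count = 0;

    while (fgetc(in) != EOF) {   /* test the read itself, not feof() */
        count++;                 /* reached only after a successful read */
    }
    /* fgetc() has returned EOF, so feof()/ferror() are meaningful now. */
    if (ferror(in)) {
        return -1;
    }
    return count;                /* feof(in) is true at this point */
}
```

On an empty stream this version returns 0 rather than 1, because count is incremented only after fgetc() has actually delivered a character.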

It is always necessary to check the return value of a read (either an fread(), or an fscanf(), or an fgetc()) before calling feof().

Even worse, consider the case where a read error occurs. In that case, fgetc() returns EOF, feof() returns false, and the loop never terminates. In all cases where while(!feof(p)) is used, there must be at least a check inside the loop for ferror(), or at the very least the while condition should be replaced with while(!feof(p) && !ferror(p)), or there is a very real possibility of an infinite loop, probably spewing all sorts of garbage as invalid data is being processed.

So, in summary, although I cannot state with certainty that there is never a situation in which it may be semantically correct to write "while(!feof(f))" (although there must be another check inside the loop with a break to avoid an infinite loop on a read error), it is the case that it is almost certainly always wrong. And even if a case ever arose where it would be correct, it is so idiomatically wrong that it would not be the right way to write the code. Anyone seeing that code should immediately hesitate and say, "that's a bug". And possibly slap the author (unless the author is your boss, in which case discretion is advised).

Answer by Erik

No, it's not always wrong. If your loop condition is "while we haven't tried to read past end of file" then you use while (!feof(f)). This is however not a common loop condition - usually you want to test for something else (such as "can I read more"). while (!feof(f)) isn't wrong, it's just used wrong.

Answer by AProgrammer

feof() indicates if one has tried to read past the end of file. That means it has little predictive effect: if it is true, you are sure that the next input operation will fail (you aren't sure the previous one failed, BTW), but if it is false, you aren't sure the next input operation will succeed. Moreover, input operations may fail for reasons other than the end of file (a format error for formatted input, a pure IO failure -- disk failure, network timeout -- for all input kinds), so even if you could be predictive about the end of file (and anybody who has tried to implement the Ada one, which is predictive, will tell you it can be complex if you need to skip spaces, and that it has undesirable effects on interactive devices -- sometimes forcing the input of the next line before starting the handling of the previous one), you would have to be able to handle a failure.

So the correct idiom in C is to loop with the IO operation success as loop condition, and then test the cause of the failure. For instance:

while (fgets(line, sizeof(line), file)) {
    /* Note that fgets doesn't strip the terminating \n; checking for its
       presence allows handling lines longer than sizeof(line), not shown here. */
    ...
}
if (ferror(file)) {
   /* IO failure */
} else if (feof(file)) {
   /* format error (not possible with fgets, but would be with fscanf) or end of file */
} else {
   /* format error (not possible with fgets, but would be with fscanf) */
}
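
The comments above mention that fscanf can also fail with a format error. A sketch of the same idiom with fscanf might look like this; the enum read_status and the function sum_ints are hypothetical names introduced only for this example. Here the return value of fscanf, rather than feof(), is what separates a clean end of input from a format error.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch. */
enum read_status { READ_OK, READ_FORMAT_ERROR, READ_IO_ERROR };

/* Hypothetical helper: sums whitespace-separated integers from `file`. */
enum read_status sum_ints(FILE *file, long *sum)
{
    int value;
    int rc;

    *sum = 0;
    while ((rc = fscanf(file, "%d", &value)) == 1) {  /* success is the loop condition */
        *sum += value;
    }
    /* The read failed; only now do we ask why. */
    if (ferror(file)) {
        return READ_IO_ERROR;
    }
    if (rc == EOF) {
        return READ_OK;            /* input ended before any conversion started */
    }
    return READ_FORMAT_ERROR;      /* rc == 0: the next token is not an integer */
}
```

For the input "1 2 3" this yields READ_OK with a sum of 6. For "4 x 5" it stops at the x and yields READ_FORMAT_ERROR with a sum of 4.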