C Programming: How to read the whole file contents into a buffer

Disclaimer: this page is a bilingual (Chinese/English) translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/14002954/

Date: 2020-09-02 04:49:41  Source: igfitidea

c, file-io

Asked by Sunny

I want to read the full contents of a file into a buffer. The file actually contains only a string, which I need to compare with another string.

What would be the most efficient option that is also portable to Linux?

ENV: Windows

Answer

Portability between Linux and Windows is a big headache, since Linux is a POSIX-conformant system with, generally, a proper, high-quality toolchain for C, whereas Windows toolchains (MSVC in particular) have historically lacked many C standard library functions, especially those introduced in C99.

However, if you want to stick to the standard, you can write something like this:

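#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a 0-terminated buffer; returns NULL on error. */
char *read_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    long fsize = -1;
    char *string = NULL;

    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0)
        fsize = ftell(f);                          /* -1 on error */
    if (fsize >= 0 && fseek(f, 0, SEEK_SET) == 0)  /* same as rewind(f); */
        string = malloc((size_t)fsize + 1);
    if (string != NULL && fread(string, 1, (size_t)fsize, f) != (size_t)fsize) {
        free(string);                              /* short read: give up */
        string = NULL;
    }
    fclose(f);

    if (string != NULL)
        string[fsize] = 0;                         /* 0-terminate */
    return string;                                 /* caller must free() */
}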

Here string will contain the contents of the text file as a properly 0-terminated C string. This code is plain standard C and is not POSIX-specific (although that alone does not guarantee it will work or compile on Windows...).
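
Because the question's actual goal is to compare the file's contents with a string, here is a minimal sketch that combines the fseek()/ftell() size query with a length-aware comparison. The helper name file_equals and its return convention (1 equal, 0 different, -1 error) are hypothetical. Two caveats apply. First, the C standard does not require binary streams to meaningfully support fseek() with SEEK_END, and the size query fails on pipes, so this sketch assumes a regular file. Second, a file saved by a text editor often ends with a newline, which makes it differ from a string that has none.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the file at path contains exactly
   the bytes of expected (without its terminator), 0 if it differs,
   -1 on error. Uses the fseek()/ftell() size query (regular files only). */
int file_equals(const char *path, const char *expected)
{
    size_t want = strlen(expected);
    FILE *f = fopen(path, "rb");
    long fsize;
    char *buf;
    int result;

    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0 || (fsize = ftell(f)) < 0 ||
        fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    if ((size_t)fsize != want) {      /* different length: cannot match */
        fclose(f);
        return 0;
    }

    buf = malloc(want + 1);           /* +1 so an empty file still allocates */
    if (buf == NULL || fread(buf, 1, want, f) != want) {
        free(buf);                    /* free(NULL) is a no-op */
        fclose(f);
        return -1;
    }
    fclose(f);

    result = memcmp(buf, expected, want) == 0;
    free(buf);
    return result;
}
```

Checking the length before reading avoids allocating a buffer for a file that cannot match. Using memcmp() instead of strcmp() keeps the result correct even when the file contains a 0 byte.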

Answer by Nominal Animal

Here is what I would recommend.

It should conform to C89 and be completely portable. In particular, it also works on pipes and sockets on POSIX systems.

The idea is that we read the input in large-ish chunks (READALL_CHUNK), dynamically reallocating the buffer as we need it. We only use realloc(), fread(), ferror(), and free():

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

/* Size of each input chunk to be
   read and allocate for. */
#ifndef  READALL_CHUNK
#define  READALL_CHUNK  262144
#endif

#define  READALL_OK          0  /* Success */
#define  READALL_INVALID    -1  /* Invalid parameters */
#define  READALL_ERROR      -2  /* Stream error */
#define  READALL_TOOMUCH    -3  /* Too much input */
#define  READALL_NOMEM      -4  /* Out of memory */

/* This function returns one of the READALL_ constants above.
   If the return value is zero == READALL_OK, then:
     (*dataptr) points to a dynamically allocated buffer, with
     (*sizeptr) chars read from the file.
     The buffer is allocated for one extra char, which is NUL,
     and automatically appended after the data.
   Initial values of (*dataptr) and (*sizeptr) are ignored.
*/
int readall(FILE *in, char **dataptr, size_t *sizeptr)
{
    char  *data = NULL, *temp;
    size_t size = 0;
    size_t used = 0;
    size_t n;

    /* None of the parameters can be NULL. */
    if (in == NULL || dataptr == NULL || sizeptr == NULL)
        return READALL_INVALID;

    /* A read error already occurred? */
    if (ferror(in))
        return READALL_ERROR;

    while (1) {

        if (used + READALL_CHUNK + 1 > size) {
            size = used + READALL_CHUNK + 1;

            /* Overflow check: size_t arithmetic wraps around,
               so an overflowed sum ends up smaller than used. */
            if (size <= used) {
                free(data);
                return READALL_TOOMUCH;
            }

            temp = realloc(data, size);
            if (temp == NULL) {
                free(data);
                return READALL_NOMEM;
            }
            data = temp;
        }

        n = fread(data + used, 1, READALL_CHUNK, in);
        if (n == 0)
            break;

        used += n;
    }

    if (ferror(in)) {
        free(data);
        return READALL_ERROR;
    }

    temp = realloc(data, used + 1);
    if (temp == NULL) {
        free(data);
        return READALL_NOMEM;
    }
    data = temp;
    data[used] = '\0';

    *dataptr = data;
    *sizeptr = used;

    return READALL_OK;
}

Above, I've used a constant chunk size, READALL_CHUNK == 262144 (256*1024). This means that in the worst case up to 262145 chars are wasted (allocated but not used), but only temporarily. At the end, the function reallocates the buffer to the optimal size. It also means that we do four reallocations per megabyte of data read.

The 262144-byte default in the code above is a conservative value; it works well even on old mini-laptops, Raspberry Pis, and most embedded devices with at least a few megabytes of RAM available to the process. Yet it is not so small that it slows down the operation (due to many read calls and many buffer reallocations) on most systems.

For desktop machines at this time (2017), I recommend a much larger READALL_CHUNK, perhaps #define READALL_CHUNK 2097152 (2 MiB).

Because the definition of READALL_CHUNK is guarded (i.e., it is defined only if it is still undefined at that point in the code), you can override the default value at compile time with the -DREADALL_CHUNK=2097152 command-line option, which most C compilers accept. Do check your compiler's documentation for how to define a preprocessor macro on the command line, though.

Answer by md5

A portable solution could use getc.

#include <stdio.h>

#ifndef MAX_FILE_SIZE
#define MAX_FILE_SIZE 4096  /* example limit, including the terminating 0 */
#endif

/* Read at most MAX_FILE_SIZE - 1 chars from fp into buffer and
   0-terminate it; returns the number of chars stored. */
size_t read_with_getc(FILE *fp, char buffer[MAX_FILE_SIZE])
{
    size_t i;

    for (i = 0; i < MAX_FILE_SIZE - 1; ++i)
    {
        int c = getc(fp);

        if (c == EOF)
            break;

        buffer[i] = (char)c;
    }

    buffer[i] = 0x00;  /* terminate even if the limit was reached */
    return i;
}

If you don't want to have a MAX_FILE_SIZE macro, or if it is a big number (such that buffer would be too big to fit on the stack), use dynamic allocation.
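
A minimal sketch of that dynamic-allocation variant follows. It still reads with getc(). The name read_all_getc, the 64-byte initial capacity, and the doubling growth policy are illustrative assumptions and not part of the answer. Doubling keeps the number of realloc() calls logarithmic in the file size.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read all of fp with getc() into a malloc'd,
   0-terminated buffer that grows by doubling. Returns NULL on error
   (out of memory, size overflow, or read error); on success *len
   receives the number of chars read. */
char *read_all_getc(FILE *fp, size_t *len)
{
    size_t cap = 64;              /* assumed initial capacity */
    size_t n = 0;
    char *buf = malloc(cap);
    int c;

    if (buf == NULL)
        return NULL;

    while ((c = getc(fp)) != EOF) {
        if (n + 1 >= cap) {       /* keep room for the terminating 0 */
            char *tmp;
            if (cap > (size_t)-1 / 2) {   /* doubling would overflow */
                free(buf);
                return NULL;
            }
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[n++] = (char)c;
    }

    if (ferror(fp)) {             /* EOF may also mean a read error */
        free(buf);
        return NULL;
    }

    buf[n] = '\0';
    if (len != NULL)
        *len = n;
    return buf;                   /* caller must free() */
}
```

The caller owns the returned buffer and must free() it.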
