C: Correct way to read a text file into a buffer?

Notice: this page is a side-by-side Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2029103/

Time: 2020-09-02 04:06:47  Source: igfitidea

Correct way to read a text file into a buffer in C?

c input buffer

Asked by Gary Willoughby

I'm dealing with small text files that I want to read into a buffer while I process them, so I've come up with the following code:

...
char source[1000000];

FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
    while((symbol = getc(fp)) != EOF)
    {
        strcat(source, &symbol);
    }
    fclose(fp);
}
...

Is this the correct way of putting the contents of the file into the buffer, or am I abusing strcat()?

I then iterate through the buffer thus:

for(int x = 0; (c = source[x]) != '\0'; x++)
{
    //Process chars
}

Answered by Michael

char source[1000000];

FILE *fp = fopen("TheFile.txt", "r");
if(fp != NULL)
{
    while((symbol = getc(fp)) != EOF)
    {
        strcat(source, &symbol);
    }
    fclose(fp);
}

There are quite a few things wrong with this code:

  1. It is very slow (you are reading the file one character at a time).
  2. If the file size exceeds sizeof(source), this is prone to buffer overflows.
  3. Really, when you look at it more closely, this code should not work at all. As stated in the man pages:

The strcat() function appends a copy of the null-terminated string s2 to the end of the null-terminated string s1, then add a terminating `\0'.

You are appending a character (not a NUL-terminated string!) to a string that may or may not be NUL-terminated. The only time I can imagine this working according to the man-page description is if every character in the file is NUL-terminated, in which case this would be rather pointless. So yes, this is most definitely a terrible abuse of strcat().

The following are two alternatives to consider using instead.

If you know the maximum buffer size ahead of time:

#include <stdio.h>
#define MAXBUFLEN 1000000

char source[MAXBUFLEN + 1];
FILE *fp = fopen("foo.txt", "r");
if (fp != NULL) {
    size_t newLen = fread(source, sizeof(char), MAXBUFLEN, fp);
    if ( ferror( fp ) != 0 ) {
        fputs("Error reading file", stderr);
    } else {
        source[newLen++] = '\0'; /* Just to be safe. */
    }

    fclose(fp);
}

Or, if you do not:

#include <stdio.h>
#include <stdlib.h>

char *source = NULL;
FILE *fp = fopen("foo.txt", "r");
if (fp != NULL) {
    /* Go to the end of the file. */
    if (fseek(fp, 0L, SEEK_END) == 0) {
        /* Get the size of the file. */
        long bufsize = ftell(fp);
        if (bufsize == -1) { /* Error */ }

        /* Allocate our buffer to that size. */
        source = malloc(sizeof(char) * (bufsize + 1));

        /* Go back to the start of the file. */
        if (fseek(fp, 0L, SEEK_SET) != 0) { /* Error */ }

        /* Read the entire file into memory. */
        size_t newLen = fread(source, sizeof(char), bufsize, fp);
        if ( ferror( fp ) != 0 ) {
            fputs("Error reading file", stderr);
        } else {
            source[newLen++] = '\0'; /* Just to be safe. */
        }
    }
    fclose(fp);
}

free(source); /* Don't forget to call free() later! */
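
As a rough sketch, the seek-and-allocate approach from this answer could be wrapped in a reusable function with the error paths filled in. The name read_whole_file and its out_len parameter are hypothetical, not part of the original answer, and opening in "rb" mode is an assumption made so that the size reported by ftell() matches the bytes read.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the original answer): reads all of `path`
 * into a freshly allocated, NUL-terminated buffer.
 * Returns NULL on any failure; on success the caller must free() the result.
 * If out_len is non-NULL it receives the number of bytes actually read.
 */
char *read_whole_file(const char *path, size_t *out_len)
{
    /* Assumption: binary mode, so ftell() reports the real byte count. */
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return NULL;

    char *buf = NULL;
    long size = -1;

    /* Seek to the end, record the size, then rewind. */
    if (fseek(fp, 0L, SEEK_END) == 0 &&
        (size = ftell(fp)) >= 0 &&
        fseek(fp, 0L, SEEK_SET) == 0) {
        buf = malloc((size_t)size + 1);   /* +1 for the terminating NUL */
        if (buf != NULL) {
            size_t n = fread(buf, 1, (size_t)size, fp);
            if (ferror(fp)) {
                free(buf);
                buf = NULL;
            } else {
                buf[n] = '\0';            /* terminate at what was really read */
                if (out_len != NULL) *out_len = n;
            }
        }
    }

    fclose(fp);
    return buf;
}
```

A caller would check the result for NULL, process the buffer, and then free() it. Note that this only works on seekable files, not on pipes or terminals.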

Answered by Martin Beckett

Yes - you would probably be arrested for your terrible abuse of strcat!

Take a look at getline(): it reads the data a line at a time but, importantly, it can limit the number of characters you read, so you don't overflow the buffer.

strcat is relatively slow because it has to search the entire string for the end on every character insertion. You would normally keep a pointer to the current end of the string storage and pass that to getline as the position to read the next line into.
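
The "keep a pointer to the current end" idea can be sketched as follows. This is only an illustration: it uses the standard fgets() rather than getline(), because fgets() takes an explicit size limit, and the function name read_lines_into is hypothetical.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: appends lines from fp to buf (capacity cap bytes),
 * tracking the write position so the buffer is never rescanned from the start.
 * Stops at EOF or when the buffer is full.
 * Returns the number of characters stored, excluding the terminating NUL.
 */
size_t read_lines_into(FILE *fp, char *buf, size_t cap)
{
    char *end = buf;       /* current end of the stored text */
    size_t left = cap;     /* bytes still free, including room for the NUL */

    if (cap == 0) return 0;
    *end = '\0';

    while (left > 1) {
        /* fgets() takes an int, so clamp very large remaining sizes. */
        int limit = left > (size_t)INT_MAX ? INT_MAX : (int)left;
        if (fgets(end, limit, fp) == NULL) break;   /* EOF or error */

        size_t n = strlen(end);   /* scans only the line just read */
        end += n;
        left -= n;
    }
    return (size_t)(end - buf);
}
```

Each call to fgets() writes directly at the end pointer, so the cost per line depends on the line length rather than on how much text is already stored.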

Answered by toweleeele

If you're on a Linux system, once you have the file descriptor you can get a lot of information about the file using fstat():

http://linux.die.net/man/2/stat

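So you might have:

#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    struct stat stat;
    int fd;
    //get file descriptor
    fstat(fd, &stat);
    //the size of the file is now in stat.st_size
    return 0;
}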

This avoids seeking to the end of the file and back to the beginning.
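
A fuller sketch of the fstat() idea, assuming a POSIX system, might look like the following. The name read_file_fstat is hypothetical, and the sketch makes simplifying assumptions: it only handles regular files and does not retry reads interrupted by signals.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, POSIX only: sizes the buffer from fstat() instead of
 * seeking. Returns a malloc'd, NUL-terminated buffer, or NULL on failure.
 * If out_len is non-NULL it receives the number of bytes read.
 */
char *read_file_fstat(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return NULL;

    struct stat st;
    char *buf = NULL;

    /* st_size is only meaningful for regular files. */
    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode)) {
        size_t size = (size_t)st.st_size;
        buf = malloc(size + 1);
        if (buf != NULL) {
            size_t total = 0;
            ssize_t n = 0;
            /* read() may return fewer bytes than requested, so loop. */
            while (total < size &&
                   (n = read(fd, buf + total, size - total)) > 0) {
                total += (size_t)n;
            }
            if (n < 0) {              /* read error */
                free(buf);
                buf = NULL;
            } else {
                buf[total] = '\0';
                if (out_len != NULL) *out_len = total;
            }
        }
    }

    close(fd);
    return buf;
}
```

With a FILE * instead of a raw descriptor, fileno(fp) can supply the descriptor for fstat().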

Answered by Martin Wickman

Why don't you just use the array of chars you have? This ought to do it:

   source[i] = getc(fp);
   i++;

Answered by Earlz

Not tested, but should work. And yes, it could be better implemented with fread; I'll leave that as an exercise for the reader.

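#include <stdio.h>
#include <stdlib.h>

#define DEFAULT_SIZE 100
#define STEP_SIZE 100

char *buffer = malloc(DEFAULT_SIZE); /* heap buffer, so realloc() is valid */
size_t buffer_sz = DEFAULT_SIZE;
size_t i = 0;
int c;
if (buffer == NULL) exit(1);
while ((c = fgetc(fp)) != EOF) { /* test the read itself rather than feof() */
  buffer[i] = (char)c;
  i++;
  if (i >= buffer_sz) {
    buffer_sz += STEP_SIZE;
    void *tmp = buffer;
    buffer = realloc(buffer, buffer_sz);
    if (buffer == NULL) { free(tmp); exit(1); } /* ensure we don't have a memory leak */
  }
}
buffer[i] = 0;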

Answered by Mark Ransom

See this article from JoelOnSoftware for why you don't want to use strcat.

Look at fread for an alternative. Use it with 1 for the size when you're reading bytes or characters.
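
To illustrate why a size of 1 is convenient: with an element size of 1, the return value of fread() is directly the number of bytes read. The sketch below reads a stream of unknown length in chunks. The name read_stream and the CHUNK value are assumptions chosen for this example.

```c
#include <stdio.h>
#include <stdlib.h>

#define CHUNK 4096   /* arbitrary initial capacity chosen for this sketch */

/*
 * Hypothetical helper: reads everything remaining on fp into a growing,
 * NUL-terminated heap buffer. Works on non-seekable streams too.
 * Returns NULL on allocation or read failure; caller must free() the result.
 */
char *read_stream(FILE *fp, size_t *out_len)
{
    size_t cap = CHUNK;
    size_t len = 0;
    char *buf = malloc(cap + 1);          /* +1 for the terminating NUL */
    if (buf == NULL) return NULL;

    for (;;) {
        /* Element size 1: the return value is a byte count. */
        size_t n = fread(buf + len, 1, cap - len, fp);
        len += n;
        if (len < cap) break;             /* short read: EOF or error */

        cap *= 2;                         /* buffer full, grow it */
        char *tmp = realloc(buf, cap + 1);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (out_len != NULL) *out_len = len;
    return buf;
}
```

Doubling the capacity keeps the number of realloc() calls logarithmic in the file size, which is a design choice of this sketch rather than something the answer prescribes.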

Answered by Ioan

Have you considered mmap()? You can read from the file directly as if it were already in memory.

http://beej.us/guide/bgipc/output/html/multipage/mmap.html
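
A minimal sketch of the mmap() approach, assuming a POSIX system, is shown below. The name map_file is hypothetical. Two caveats apply: the mapped bytes are not NUL-terminated, so the length must be carried alongside the pointer, and an empty file cannot be mapped.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, POSIX only: maps `path` read-only into memory.
 * out_len must be non-NULL and receives the mapping length.
 * Returns NULL on failure (including for empty files).
 * Release the mapping with munmap((void *)ptr, len).
 */
const char *map_file(const char *path, size_t *out_len)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) return NULL;

    struct stat st;
    void *p = MAP_FAILED;

    /* mmap() rejects a zero length, so skip empty files. */
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    }

    close(fd);   /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return NULL;

    *out_len = (size_t)st.st_size;
    return p;
}
```

The returned pointer can be iterated like an array of len characters, but string functions that expect a terminating NUL should not be used on it.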
