Reading a text file of unknown size in C

Question

Asked by Amir

I am trying to read a text file of unknown size into an array of characters. This is what I have so far:

#include<stdio.h>
#include<string.h>

int main()
{
    FILE *ptr_file;
    char buf[1000];
    char output[];
    ptr_file = fopen("CodeSV.txt", "r");
    if (!ptr_file)
        return 1;

    while (fgets(buf, 1000, ptr_file) != NULL)
        strcat(output, buf);
    printf("%s", output);

    fclose(ptr_file);

    printf("%s", output);
    return 0;
}

But I do not know how to choose a size for the output array when I am reading a file of unknown size. Also, when I give output a fixed size, say n = 1000, I get a segmentation fault. I am a very inexperienced programmer, so any guidance is appreciated :)

The text file is technically a .csv file, so its contents look like the following: "0,0,0,1,0,1,0,1,1,0,1..."

Answer 1

Accepted answer by Steve Summit

The standard way to do this is to use malloc to allocate an array of some size and start reading into it. If you run out of array before you run out of characters (that is, if you don't reach EOF before filling up the array), pick a bigger size for the array and use realloc to make it bigger.

Here's how the read-and-allocate loop might look. I've chosen to read input a character at a time using getchar (rather than a line at a time using fgets).

int c;
int nch = 0;
int size = 10;
char *buf = malloc(size);
if(buf == NULL)
    {
    fprintf(stderr, "out of memory\n");
    exit(1);
    }

while((c = getchar()) != EOF)
    {
    if(nch >= size-1)
        {
        /* time to make it bigger */
        size += 10;
        buf = realloc(buf, size);
        if(buf == NULL)
            {
            fprintf(stderr, "out of memory\n");
            exit(1);
            }
        }

    buf[nch++] = c;
    }

buf[nch++] = '\0';

printf("\"%s\"", buf);

Two notes about this code:

The number 10 used for the initial size and the increment is much too small; in real code you'd want to use something considerably bigger.
It's easy to forget to ensure that there's room for the trailing '\0'; in this code I've tried to do that with the -1 in if(nch >= size-1).

Answer 2

Answered by David C. Rankin

I would be remiss if I didn't add to the answers probably one of the most standard ways of reading an unknown number of lines of unknown length from a text file. In C you have two primary methods of character input: (1) character-oriented input (e.g. getchar, getc) and (2) line-oriented input (e.g. fgets, getline).

From that mix of functions, the POSIX function getline by default will allocate sufficient space to read a line of any length (up to the exhaustion of system memory). Further, when reading lines of input, line-oriented input is generally the proper choice.

To read an unknown number of lines, the general approach is to allocate an anticipated number of pointers (in an array of pointers-to-char) and then reallocate as necessary if you end up needing more. If you want to work with the complexities of stringing pointers-to-struct together in a linked list, that's fine, but it is far simpler to handle an array of strings. (A linked list is more appropriate when you have a struct with multiple members, rather than a single line.)

The process is straightforward. (1) Allocate memory for some initial number of pointers (LMAX below, at 255), and then, as each line is read, (2) allocate memory to hold the line and copy the line to the array. strdup is used below, which both (a) allocates memory to hold the string and (b) copies the string to the new memory block, returning a pointer to its address. You assign the returned pointer to your array of strings as array[x].

As with any dynamic allocation of memory, you are responsible for keeping track of the memory allocated, preserving a pointer to the start of each allocated block of memory (so you can free it later), and then freeing the memory when it is no longer needed. (Use valgrind or some similar memory checker to confirm you have no memory errors and have freed all memory you have created.)

Below is an example of the approach which simply reads any text file and prints its lines back to stdout before freeing the memory allocated to hold the file. Once you have read all lines (or while you are reading all lines), you can easily parse your csv input into individual values.

Note: below, when LMAX lines have been read, the array is reallocated to hold twice as many as before and the read continues. (You can set LMAX to 1 if you want to allocate a new pointer for each line, but that is a very inefficient way to handle memory allocation.) Choosing some reasonable anticipated starting value, and then reallocating to 2X the current size, is a standard reallocation approach, but you are free to allocate additional blocks in any size you choose.

Look over the code and let me know if you have any questions.

#define _POSIX_C_SOURCE 200809L /* expose getline and strdup (POSIX.1-2008) */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>          /* ssize_t */

#define LMAX 255

int main (int argc, char **argv) {

    if (argc < 2 ) {
        fprintf (stderr, "error: insufficient input, usage: %s <filename>\n",
                 argv[0]);
        return 1;
    }

    char **array = NULL;        /* array of pointers to char        */
    char *ln = NULL;            /* NULL forces getline to allocate  */
    size_t n = 0;               /* buf size, 0 use getline default  */
    ssize_t nchr = 0;           /* number of chars actually read    */
    size_t idx = 0;             /* array index for number of lines  */
    size_t it = 0;              /* general iterator variable        */
    size_t lmax = LMAX;         /* current array pointer allocation */
    FILE *fp = NULL;            /* file pointer                     */

    if (!(fp = fopen (argv[1], "r"))) { /* open file for reading    */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    /* allocate LMAX pointers and set to NULL. Each of the 255 pointers will
       point to (hold the address of) the beginning of each string read from
       the file below. This will allow access to each string with array[x].
    */
    if (!(array = calloc (LMAX, sizeof *array))) {
        fprintf (stderr, "error: memory allocation failed.\n");
        return 1;
    }

    /* prototype - ssize_t getline (char **ln, size_t *n, FILE *fp)
       above we declared: char *ln and size_t n. Why don't they match? Simple,
       we will be passing the address of each to getline, so we simply precede
       the variable with the unary '&' (address-of) operator, which adds a
       level of indirection, making char* into char** and size_t into
       size_t*. Now the arguments match the prototype.
    */
    while ((nchr = getline (&ln, &n, fp)) != -1)    /* read line    */
    {
        while (nchr > 0 && (ln[nchr-1] == '\n' || ln[nchr-1] == '\r'))
            ln[--nchr] = 0;     /* strip newline or carriage rtn    */

        /* allocate & copy ln to array - this will create a block of memory
           to hold each character in ln and copy the characters in ln to that
           memory address. The address will then be stored in array[idx].
           (idx++ just increases idx by 1 so it is ready for the next address)
           There is a lot going on in that simple: array[idx++] = strdup (ln);
        */
        if (!(array[idx++] = strdup (ln))) {
            fprintf (stderr, "error: memory allocation failed.\n");
            return 1;
        }

        if (idx == lmax) {      /* if lmax lines reached, realloc   */
            char **tmp = realloc (array, lmax * 2 * sizeof *array);
            if (!tmp) {
                fprintf (stderr, "error: memory reallocation failed.\n");
                return 1;
            }
            array = tmp;
            lmax *= 2;
        }
    }

    if (fp) fclose (fp);        /* close file */
    if (ln) free (ln);          /* free memory allocated to ln  */

    /*
        process/use lines in array as needed
        (simple print all lines example below)
    */

    printf ("\nLines in file:\n\n");    /* print lines in file  */
    for (it = 0; it < idx; it++)
        printf ("  array [%3zu]  %s\n", it, array[it]);
    printf ("\n");

    for (it = 0; it < idx; it++)        /* free array memory    */
        free (array[it]);
    free (array);

    return 0;
}

Use/Output

$ ./bin/getline_rdfile dat/damages.txt

Lines in file:

  array [  0]  Personal injury damage awards are unliquidated
  array [  1]  and are not capable of certain measurement; thus, the
  array [  2]  jury has broad discretion in assessing the amount of
  array [  3]  damages in a personal injury case. Yet, at the same
  array [  4]  time, a factual sufficiency review insures that the
  array [  5]  evidence supports the jury's award; and, although
  array [  6]  difficult, the law requires appellate courts to conduct
  array [  7]  factual sufficiency reviews on damage awards in
  array [  8]  personal injury cases. Thus, while a jury has latitude in
  array [  9]  assessing intangible damages in personal injury cases,
  array [ 10]  a jury's damage award does not escape the scrutiny of
  array [ 11]  appellate review.
  array [ 12]
  array [ 13]  Because Texas law applies no physical manifestation
  array [ 14]  rule to restrict wrongful death recoveries, a
  array [ 15]  trial court in a death case is prudent when it chooses
  array [ 16]  to submit the issues of mental anguish and loss of
  array [ 17]  society and companionship. While there is a
  array [ 18]  presumption of mental anguish for the wrongful death
  array [ 19]  beneficiary, the Texas Supreme Court has not indicated
  array [ 20]  that reviewing courts should presume that the mental
  array [ 21]  anguish is sufficient to support a large award. Testimony
  array [ 22]  that proves the beneficiary suffered severe mental
  array [ 23]  anguish or severe grief should be a significant and
  array [ 24]  sometimes determining factor in a factual sufficiency
  array [ 25]  analysis of large non-pecuniary damage awards.

Memory Check

$ valgrind ./bin/getline_rdfile dat/damages.txt
==14321== Memcheck, a memory error detector
==14321== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==14321== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==14321== Command: ./bin/getline_rdfile dat/damages.txt
==14321==

Lines in file:

  array [  0]  Personal injury damage awards are unliquidated
  <snip>
  ...
  array [ 25]  analysis of large non-pecuniary damage awards.

==14321==
==14321== HEAP SUMMARY:
==14321==     in use at exit: 0 bytes in 0 blocks
==14321==   total heap usage: 29 allocs, 29 frees, 3,997 bytes allocated
==14321==
==14321== All heap blocks were freed -- no leaks are possible
==14321==
==14321== For counts of detected and suppressed errors, rerun with: -v
==14321== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
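Since the file in the question holds comma-separated values such as 0,0,0,1,0,1, each stored line can then be split into numbers. Here is a minimal sketch of one way to do that with strtol. The helper name parse_csv_ints and its signature are made up for this example, and it assumes plain integer fields with no quoting and no spaces before the commas.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not from the answers above): parse up to `max`
   comma-separated integers from `line` into `out`.
   Returns the number of values stored. Parsing stops at the first
   field that does not start with a number. */
size_t parse_csv_ints(const char *line, long *out, size_t max)
{
    size_t count = 0;
    const char *p = line;

    assert(line != NULL && out != NULL);

    while (count < max) {
        char *end;
        long value = strtol(p, &end, 10);

        if (end == p)           /* no digits consumed: stop */
            break;
        out[count++] = value;

        if (*end != ',')        /* end of line or unexpected character */
            break;
        p = end + 1;            /* skip the comma */
    }
    return count;
}
```

For example, calling parse_csv_ints(array[0], values, 100) on the first line stored by the program above would fill a caller-supplied long values[100] with that line's numbers and return how many were found.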

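Answer 3

Answered by abhi312

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
   FILE* fpInputFile = NULL;
   long lPos = 0;             // value returned by ftell
   unsigned long ulSize = 0;  // Input File size
   unsigned long ulRead = 0;  // bytes actually read
   unsigned char* ucBuffer;   // Buffer data

  if(argc != 2)
  {
   printf("Enter the file name \n");
   return -1;
  }
  fpInputFile = fopen(argv[1],"r"); // file open

  if(!fpInputFile){
    fprintf(stderr,"File opening failed\n");
    return -1;
  }
  fseek(fpInputFile,0,SEEK_END);
  lPos = ftell(fpInputFile); // current file position = file size
  fseek(fpInputFile,0,SEEK_SET);
  if(lPos < 0){
    fprintf(stderr,"Could not determine file size\n");
    fclose(fpInputFile);
    return -1;
  }
  ulSize = (unsigned long)lPos;
  ucBuffer = malloc(ulSize + 1); // memory allocation for ucBuffer, +1 for '\0'
  if(!ucBuffer){
    fprintf(stderr,"Memory allocation failed\n");
    fclose(fpInputFile);
    return -1;
  }
  ulRead = fread(ucBuffer,1,ulSize,fpInputFile); // Read file
  ucBuffer[ulRead] = '\0'; // terminate so the data can be used as a string
  fclose(fpInputFile); // close the file

  /* use ucBuffer here */

  free(ucBuffer);
  return 0;
 }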

Use fseek and ftell to get the size of the text file.

Answer 4

Answered by rakeb.mazharul

I wrote the following code for reading a file of unknown size and taking every character into a buffer (it works perfectly for me). Please read the following references to get a good grip on file handling:

fseek,
ftell,
rewind,
fread,
fwrite (out of the scope of this question).

Try something like this:

FILE *pFile;
char* buffer;
size_t result;
long lSize;

/* "rb": in text mode some platforms translate line endings, so fread
   may return fewer bytes than ftell reported */
pFile = fopen("CodeSV.txt","rb");
if (pFile==NULL) {fputs ("File error",stderr); exit (1);}

// obtain file size:
fseek (pFile , 0 , SEEK_END);
lSize = ftell (pFile);
rewind (pFile);
if (lSize < 0) {fputs ("File size error",stderr); exit (1);}

// allocate memory to contain the whole file, plus a terminating '\0':
buffer = malloc(lSize + 1);
if (buffer == NULL) {fputs ("Memory error",stderr); exit (2);}

// copy the file into the buffer:
result = fread (buffer,1,lSize,pFile);
if (result != (size_t)lSize) {fputs ("Reading error",stderr); exit (3);}
buffer[result] = '\0';  // make the buffer usable as a C string

/* the whole file is now loaded in the memory buffer. */
fclose (pFile);
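The fseek/ftell technique needs a seekable file, so it does not work on pipes or standard input. As a rough alternative, the sketch below reads the stream in chunks with fread and doubles the buffer with realloc whenever it fills up, which is the accepted answer's idea with a larger growth step. The function name read_stream, the 4096-byte starting size, and the doubling factor are all assumptions made for illustration, not part of any answer here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read everything from `fp` into a heap buffer that
   is doubled whenever it fills up. Returns a NUL-terminated string the
   caller must free(), or NULL on allocation or read failure. */
char *read_stream(FILE *fp, size_t *len_out)
{
    size_t cap = 4096;          /* initial capacity; an arbitrary choice */
    size_t len = 0;
    char *buf;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* keep one byte in reserve for the terminating '\0' */
        size_t got = fread(buf + len, 1, cap - 1 - len, fp);
        len += got;
        if (len < cap - 1)      /* short read: end of file or error */
            break;

        /* buffer is full: double it, keeping the old pointer on failure */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    if (len_out != NULL)
        *len_out = len;
    return buf;
}
```

Assigning the realloc result to a temporary pointer first means the original buffer can still be freed if the reallocation fails. Doubling keeps the number of reallocations proportional to the logarithm of the file size rather than to the size itself.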

Answer 5

Answered by Milan Patel

If the file you are reading is not very large, you can try this:

#include <stdio.h>

int main(void)
{
    FILE *ptr_file;
    char output[10000];
    size_t bytes_read;

    ptr_file = fopen("lol_temp.txt", "r");
    if (!ptr_file)
        return 1;
    /* leave room for the terminator; fread does not add one */
    bytes_read = fread(output, 1, sizeof output - 1, ptr_file);
    output[bytes_read] = '\0';
    fclose(ptr_file);
    printf("%s", output);
    return 0;
}

Answer 6

Answered by Lukas

This is better done using a dynamic linked list than an array. Here I have a simple linked list to store every char you read from the file. Since you said "ultimately I want to read the file into a string, and manipulate the string and output that modified string as a new text file", I finally created a string from the file. I tested it, so I guess it should work fine :) You can separate the interface and implementation of the list into separate files, or even use the object file of the implementation.

#include <stdio.h>
#include <stdlib.h>

typedef char Titem; //just to identify it
// Interface of list
typedef struct node *Tpointer;
typedef struct node {
    Titem item;
    Tpointer next;
} Tnode;
typedef Tpointer Tlist;

void initialize_list(Tlist *list);
void insert_to_list_end(Tlist *list, Titem data);
void cleanup_list(Tlist *list);

// Implementation of list (only the object file is needed in your application)
void initialize_list(Tlist *list) {
    *list = NULL;
}
void insert_to_list_end(Tlist *list, Titem data) {
    Tpointer newnode, last = *list;
    newnode = malloc(sizeof(Tnode));
    if (newnode == NULL) {
        fprintf(stderr, "out of memory\n");
        exit(1);
    }
    newnode->item = data;
    newnode->next = NULL;
    if (last == NULL){
        *list = newnode;
    }//first node
    else{
        while (last->next != NULL) //walk to the current last node
            last = last->next;
        last->next = newnode;
    }
}
void cleanup_list(Tlist *list) {
    Tpointer aux1, aux2;
    aux1 = *list;
    while (aux1 != NULL) {
        aux2 = aux1->next;
        free(aux1);
        printf("\nDeleted"); //for testing purposes
        aux1 = aux2;
    }
    initialize_list(list);
}

#define file_dir "CodeSV.txt"
int main(void){
    FILE *fp;
    fp = fopen(file_dir, "r");
    int counter = 0; //number of characters read
    Tlist list;
    if (fp) {
        initialize_list(&list);
        int c;
        while ((c = getc(fp)) != EOF){
            insert_to_list_end(&list, (char)c);
            counter++;
        }
        fclose(fp);
    }
    else{ printf("file not found"); return 1; }

    //creating a string with what you read (+1 for the terminating '\0')
    char stringFromFile[counter + 1];
    Tlist currentNode = list;
    int i;
    for (i = 0; i < counter && currentNode != NULL; i++) {
        stringFromFile[i] = currentNode->item;
        currentNode = currentNode->next;
    }
    stringFromFile[i] = '\0';
    printf("WHAT YOU JUST READ: %s", stringFromFile);

    /*here you can manipulate the string as you wish. But remember to free the linked list (call cleanup_list) when you're done*/
    cleanup_list(&list);
    return 0;
}

Answer 7

Answered by chux - Reinstate Monica

Should OP want to do text processing and manipulate lines, instead of reading the entire file into one string, make a linked list of lines.

#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define LINE_MAXSIZE 65536
typedef struct line_T {
  struct line_T *next;
  char *data;
  size_t length;
} line_T;

line_T *ReadFile(FILE *istream) {
  line_T head;
  head.next = NULL;  // so an empty file yields an empty list
  line_T *p = &head;
  char *buf = malloc(LINE_MAXSIZE);
  assert(buf);
  while (fgets(buf, LINE_MAXSIZE, istream)) {
    p->next = malloc(sizeof *(p->next));
    assert(p->next);
    p = p->next;
    p->next = NULL;
    p->length = strlen(buf);
    assert(p->length < LINE_MAXSIZE - 1);  // TBD: cope with long lines
    p->data = malloc(p->length + 1);
    assert(p->data);
    memcpy(p->data, buf, p->length + 1);
  }
  free(buf);
  return head.next;
}

unsigned long long CountConsumeData(line_T *p) {
  unsigned long long sum = 0;
  while (p) {
    sum += p->length;
    free(p->data);
    line_T *next = p->next;
    free(p);
    p = next;
  }
  return sum;
}

int main(void) {
  const char *fname = "CodeSV.txt";
  FILE *istream = fopen(fname, "r");
  if (istream == NULL) {
    perror(fname);
    return 1;
  }
  line_T *p = ReadFile(istream);
  fclose(istream);
  printf("Length : %llu\n", CountConsumeData(p));
  return 0;
}
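The assert on p->length in ReadFile stops the program when a line does not fit in LINE_MAXSIZE (the "TBD: cope with long lines" comment). One possible way to lift that limit using only standard C is to keep calling fgets into a buffer that grows until the newline shows up. The sketch below is an illustration under that assumption; read_long_line and the starting capacity of 128 are hypothetical choices, not part of the answer's code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read one line of any length from `fp` using only
   standard C. Returns a malloc'd string (including the '\n', if any)
   that the caller must free(). Returns NULL when no characters could be
   read or when an allocation fails. */
char *read_long_line(FILE *fp)
{
    size_t cap = 128;           /* starting capacity; arbitrary */
    size_t len = 0;
    char *line;

    assert(fp != NULL);
    line = malloc(cap);
    if (line == NULL)
        return NULL;

    /* fgets fills at most the free space; loop until a newline arrives */
    while (fgets(line + len, (int)(cap - len), fp) != NULL) {
        len += strlen(line + len);
        if (len > 0 && line[len - 1] == '\n')
            return line;        /* complete line */

        /* no newline yet: the buffer was too small, or the file ended */
        if (len + 1 < cap)
            return line;        /* last line without trailing newline */

        char *tmp = realloc(line, cap * 2);
        if (tmp == NULL) {
            free(line);
            return NULL;
        }
        line = tmp;
        cap *= 2;
    }

    if (len > 0)
        return line;            /* input ended right after a full buffer */
    free(line);
    return NULL;
}
```

In ReadFile, a helper like this could take the place of the fixed buf and the fgets call, with each returned string stored directly in p->data. On POSIX systems, getline from Answer 2 already provides the same behaviour.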
