C: Reading a line from a file without knowing the line length

Question

Asked by ryyst

I want to read in a file line by line, without knowing the line length in advance. Here's what I have so far:

int ch = getc(file);    // file is an already opened FILE *
int length = 0;
char buffer[4095];

// Stop at the end of the line, at end of file, or when buffer is full.
while (ch != '\n' && ch != EOF && length < (int)sizeof buffer) {
    buffer[length] = ch;    // store the character just read
    length++;
    ch = getc(file);        // then fetch the next one
}

printf("Line length: %d characters.\n", length);

char newbuffer[length + 1];

for (int i = 0; i < length; i++)
    newbuffer[i] = buffer[i];

newbuffer[length] = '\0';    // newbuffer now contains the line.

I can now figure out the line length, but only for lines shorter than 4095 characters, and the two char arrays seem like an awkward way of doing this. Is there a better way? (I already used fgets() but was told it wasn't the best way.)

--Ry

Answer 1

Accepted answer by codaddict

You can start with a suitable size of your choice and then use realloc midway if you need more space:

size_t cur_max = 4095;
char *buffer = malloc(cur_max);   // allocate the initial buffer
size_t length = 0;
int ch;

// file is an already opened FILE *
while ((ch = getc(file)) != '\n' && ch != EOF) {   // read from the stream
    if (length + 1 == cur_max) {   // time to expand? (keep room for '\0')
        cur_max *= 2;              // double the current size, or anything similar
        buffer = realloc(buffer, cur_max);   // reallocate memory
    }
    buffer[length] = (char)ch;     // store the character in the buffer
    length++;
}
buffer[length] = '\0';             // buffer now contains the line
// ...
free(buffer);

You'll have to check for allocation errors after the calls to malloc and realloc.
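
As a rough illustration of those allocation checks, here is a minimal sketch of the same grow-by-doubling idea wrapped in a function. The name read_line_checked and the starting capacity of 16 are assumptions made for this example, not part of the answer. The result of realloc goes into a temporary pointer so the old buffer can still be freed if the call fails.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads one line from fp with getc, doubling the buffer
   as needed. Returns a malloc'd string without the newline, or NULL at end
   of input or when an allocation fails. The caller frees the result. */
char *read_line_checked(FILE *fp)
{
    size_t cap = 16;              /* assumed starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    int ch;
    while ((ch = getc(fp)) != '\n' && ch != EOF) {
        if (len + 1 == cap) {     /* keep one byte free for '\0' */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {  /* realloc failed: buf is still valid */
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }

    if (ch == EOF && len == 0) {  /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A caller would typically loop until the function returns NULL, freeing each returned line after use.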

Answer 2

Answered by jamesdlin

You might want to look into Chuck B. Falconer's public domain ggets library. If you're on a system with glibc, you probably have a getline function available to you (it is not part of standard C, although POSIX.1-2008 specifies it).
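
For illustration, a minimal sketch of how getline is typically used, assuming a POSIX.1-2008 system. The function longest_line is a made-up example, not something from the answer. getline allocates the buffer, grows it as needed, and reuses it across calls, so the caller frees it once at the end.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical example: returns the length of the longest line in fp,
   not counting the trailing newline. */
size_t longest_line(FILE *fp)
{
    char *line = NULL;    /* getline allocates and grows this buffer */
    size_t cap = 0;       /* current buffer capacity, managed by getline */
    ssize_t n;
    size_t longest = 0;

    while ((n = getline(&line, &cap, fp)) != -1) {
        if (n > 0 && line[n - 1] == '\n')
            line[--n] = '\0';         /* strip the newline */
        if ((size_t)n > longest)
            longest = (size_t)n;
    }

    free(line);           /* one free covers every iteration */
    return longest;
}
```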

Answer 3

Answered by dunst0

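This is how I did it for stdin. If you call it with *line set to NULL and *length set to 0, the function allocates a 1024-byte buffer for you and lets it grow in steps of 1024. If you call it with *length set to 10, you get a buffer that grows in steps of 10. If you already have a buffer, you can pass it in along with its size.

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>

char *readLine(char **line, size_t *length)
{
    assert(line != NULL);
    assert(length != NULL);

    size_t count = 0;

    /* A size of 0 selects the default; the size is also the growth step. */
    *length = *length > 0 ? *length : 1024;
    size_t step = *length;

    if (!*line)
    {
        *line = calloc(*length, sizeof(**line));
        if (!*line)
        {
            return NULL;
        }
    }
    else
    {
        memset(*line, 0, *length);
    }

    for (int ch = getc(stdin); ch != '\n' && ch != EOF; ch = getc(stdin))
    {
        /* Keep one byte free for the terminating '\0'. */
        if (count + 1 == *length)
        {
            char *grown = realloc(*line, *length + step);
            if (!grown)
            {
                return NULL;    /* *line is still valid; the caller frees it */
            }
            *line = grown;
            *length += step;
        }

        (*line)[count] = (char)ch;

        ++count;
    }

    (*line)[count] = '\0';

    return *line;
}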

Answer 4

Answered by Blindy

You're close. Basically, you want to read chunks of data and check them for \n characters. If you find one, good: you have an end of line. If you don't, you have to enlarge your buffer (i.e. allocate a new buffer twice the size of the first one, copy the data from the first one into the new one, then delete the old buffer and rename your new buffer as the old, or just realloc if you're in C), then read some more until you do find an ending.

Once you have your ending, the text from the beginning of the buffer to the \n character is your line. Copy it to another buffer or work on it in place, up to you.

When you're ready for the next line, you can copy the "rest" of the input over the current line (basically a left shift) and fill the rest of the buffer with data from the input. You then repeat until you run out of data.

This can of course be optimized, with a circular buffer for example, but it should be more than sufficient for any reasonable I/O-bound algorithm.
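
The chunk-and-grow idea can be sketched roughly as follows. This is a simplified variant rather than the exact scheme described: it reads each chunk with fgets, which never reads past a newline, so the left-shift step is not needed. The name read_line_chunked and the starting capacity of 128 are assumptions for the example, and lines containing embedded NUL bytes are not handled.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one line from fp in chunks. Returns a malloc'd
   string without the trailing newline, or NULL at end of input, on a read
   error, or when an allocation fails. The caller frees the result. */
char *read_line_chunked(FILE *fp)
{
    size_t cap = 128;     /* assumed starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Each fgets call appends one chunk and stops right after a '\n'. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';      /* found the end of the line */
            return buf;
        }
        if (len + 1 == cap) {         /* buffer full, no newline yet */
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                return NULL;
            }
            buf = grown;
            cap *= 2;
        }
    }

    if (len == 0 || ferror(fp)) {     /* nothing read, or a read error */
        free(buf);
        return NULL;
    }
    return buf;           /* last line had no trailing newline */
}
```

Doubling keeps the number of realloc calls logarithmic in the line length, which is why it is usually preferred over growing by a fixed amount.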

Answer 5

Answered by dash-o

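Consider the scanf '%m' format conversion modifier (POSIX):

char *arr = NULL;
// Read a string of unlimited length, terminated by a newline. Similar to a dynamically sized fgets.
if (fscanf(stdin, "%m[^\n]", &arr) == 1) {
    // Do something with arr
    free(arr);
}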

Quoting from the scanf man page:

An optional 'm' character. This is used with string conversions (%s, %c, %[), and relieves the caller of the need to allocate a corresponding buffer to hold the input: instead, scanf() allocates a buffer of sufficient size, and assigns the address of this buffer to the corresponding pointer argument, which should be a pointer to a char * variable (this variable does not need to be initialized before the call). The caller should subsequently free(3) this buffer when it is no longer required.
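
Two details matter when this conversion is used to read a whole file: "%m[^\n]" leaves the newline unread, and it reports a matching failure (return value 0) on an empty line because the scanset needs at least one character. The sketch below shows one way to handle both, assuming a POSIX.1-2008 C library such as glibc. The name scan_line is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the next line from fp as a malloc'd string
   without the newline, or NULL at end of input or on error. */
char *scan_line(FILE *fp)
{
    char *line = NULL;
    int rc = fscanf(fp, "%m[^\n]", &line);

    if (rc == EOF)
        return NULL;              /* end of input or read error */

    if (rc == 0) {
        /* Matching failure: the next character is '\n', an empty line. */
        line = malloc(1);
        if (line == NULL)
            return NULL;
        line[0] = '\0';
    }

    /* Consume the newline that ended the line, if there is one. */
    int ch = getc(fp);
    if (ch != '\n' && ch != EOF)
        ungetc(ch, fp);

    return line;
}
```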
