Notice: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18972879/

Date: 2020-09-02 07:32:34  Source: igfitidea

Using fread() to read text file into a buffer - why are the values in the buffer not each character's respective ASCII value?

c

Asked by user2809475

First off, this isn't homework. Just trying to understand why I'm seeing what I'm seeing on my screen.

The stuff below (my own work) currently takes an input file and reads it as a binary file. I want it to store each byte read in an array (for later use). For the sake of brevity the input file (Hello.txt) just contains 'Hello World', without the apostrophes.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {

    FILE *input;
    int i, size;
    int *array;

    input = fopen("Hello.txt", "rb");
    if (input == NULL) {
        perror("Invalid file specified.");
        exit(-1);
    }

    fseek(input, 0, SEEK_END);
    size = ftell(input);
    fseek(input, 0, SEEK_SET);

    array = (int*) malloc(size * sizeof(int));
    if (array == NULL) {
        perror("Could not allocate array.");
        exit(-1);
    }
    else {
        input = fopen("Hello.txt", "rb");
        fread(array, sizeof(int), size, input);
        // some check on return value of fread?
        fclose(input);
    }

    for (i = 0; i < size; i++) {
        printf("array[%d] == %d\n", i, array[i]);
    }

    free(array);
    return 0;
}

Why is it that having the print statement in the for loop as it is above causes the output to look like this:

array[0] == 1819043144
array[1] == 1867980911
array[2] == 6581362
array[3] == 0
array[4] == 0
array[5] == 0
array[6] == 0
array[7] == 0
array[8] == 0
array[9] == 0
array[10] == 0

while having it like this:

printf("array[%d] == %d\n", i, ((char *)array)[i]);

makes the output look like this (the decimal ASCII value of each character)?

array[0] == 72
array[1] == 101
array[2] == 108
array[3] == 108
array[4] == 111
array[5] == 32
array[6] == 87
array[7] == 111
array[8] == 114
array[9] == 108
array[10] == 100

If I'm reading it as a binary file and want to read byte by byte, why don't I get the right ASCII value using the first print statement?

On a related note, what happens if the input file I send in isn't a text document (e.g., jpeg)?

Sorry if this is an entirely trivial matter, but I can't seem to figure out why.

Answered by ChrisWue

The behaviour is not surprising:

  • You have a file containing 11 characters. sizeof(char) is 1.
  • Now you allocate an array of 11 ints. sizeof(int) is very likely to be 4 on your machine.
  • You instruct fread to read up to 11 ints (up to 44 bytes). So the first 4 characters will be read as one int and stored in array[0], and the next 4 in array[1].
    • If you had checked the return value of fread, it would have told you that it actually read only 2 elements (the content is 11 bytes, so only 2 complete ints can be read, and the 3 remaining bytes cannot be successfully read as an int).
  • Now you loop over the array and print each number, which is the int built up from 4 characters at a time (array[0] from the first 4 characters).
  • In your alternative solution you pretend to point to a sequence of chars, so the array index only increments in 1-byte offsets.
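
The point about fread's return value can be checked with a small sketch. The helper name count_whole_elements and the use of tmpfile() are assumptions made for illustration, not part of the original question; the idea is only to show that fread counts complete elements, not bytes.

```c
#include <stdio.h>

/* Hypothetical helper: writes `len` bytes to a temporary file, reads them
   back in units of `elem_size` bytes, and returns what fread() reports,
   i.e. the number of COMPLETE elements it could read. Returns 0 on error. */
size_t count_whole_elements(const char *bytes, size_t len, size_t elem_size)
{
    unsigned char buf[64];   /* large enough for this sketch only */
    size_t got = 0;
    FILE *f;

    if (elem_size == 0 || elem_size > sizeof buf || len > sizeof buf)
        return 0;

    f = tmpfile();
    if (f == NULL)
        return 0;

    if (fwrite(bytes, 1, len, f) == len) {
        rewind(f);   /* reposition before switching from writing to reading */
        got = fread(buf, elem_size, sizeof buf / elem_size, f);
    }
    fclose(f);
    return got;
}
```

With the 11 bytes of "Hello World", an element size of 1 should report 11 elements, while an element size of 4 (a typical sizeof(int)) should report only 2, because the last 3 bytes do not make up a whole element.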

The memory layout basically looks like this:

array[0]
|       array[1]
|       |
1 2 3 4 5 6 7 8 9 10 11
| |
| ((char *)array)[1]
((char *)array)[0]
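
To see where a number like 1819043144 comes from, the sketch below combines four bytes by hand. It assumes a little-endian machine with 4-byte ints, which is consistent with the output in the question but is not stated there; pack_le32 is a hypothetical name used only for this illustration.

```c
#include <stdint.h>

/* Combine four consecutive bytes the way a little-endian machine does when
   it loads them as one 32-bit integer: the first byte in memory becomes the
   least significant byte of the result. (Assumption: little-endian layout.) */
uint32_t pack_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Under that assumption, the bytes 'H', 'e', 'l', 'l' (72, 101, 108, 108) give 72 + 101*256 + 108*65536 + 108*16777216 = 1819043144, which is the array[0] value from the question. The bytes of "o Wo" give 1867980911, and "rld" followed by a zero byte gives 6581362.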

Answered by simpletron

Your ftell returns the current value of the position indicator of the stream.

Because you call it after seeking to the end, it returns the number of bytes the file has. But you are reading the file as a sequence of 4-byte ints, so the later elements are never filled in (in your output they show up as 0). In more detail, you are trying to read 4 x size bytes from a file that has only size bytes.

Your array should be of type char.

Something like

char* array = malloc(sizeOfFile * sizeof(char));
if(array == NULL) {
  ...
}

fread(array, sizeof(char), sizeOfFile, filePointer);
// ..

Just the idea, not the code. Hope this helps.
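
One way to flesh out that idea is sketched below. The function name read_all_bytes and its out_size parameter are assumptions made for illustration and do not come from either answer; it reads the file one byte per element and checks fread's return value.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'd byte buffer.
   On success returns the buffer and stores the byte count in *out_size;
   on any failure returns NULL. The caller must free() the buffer. */
unsigned char *read_all_bytes(const char *path, size_t *out_size)
{
    unsigned char *buf = NULL;
    long size;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return NULL;

    /* Measure the file: seek to the end, ask for the position, seek back. */
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0
        || fseek(f, 0, SEEK_SET) != 0) {
        fclose(f);
        return NULL;
    }

    /* One byte per element. Allocate at least 1 byte so that an empty file
       is not confused with an allocation failure. */
    buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf != NULL && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);   /* short read: treat it as a failure */
        buf = NULL;
    }
    fclose(f);

    if (buf != NULL)
        *out_size = (size_t)size;
    return buf;
}
```

For the Hello.txt from the question, printing each buf[i] with %d should then give 72, 101, 108 and so on, matching the second output, with no cast to char * needed.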