C: What really is EOF for binary files? Condition? Character?
Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/16677632/
Asked by Thokchom
I have managed this far with the knowledge that EOF is a special character inserted automatically at the end of a text file to indicate its end. But I now feel the need for some more clarification on this. I checked on Google and the Wikipedia page for EOF, but they couldn't answer the following, and there are no exact Stack Overflow links for this either. So please help me on this:
My book says that binary mode files keep track of the end of file from the number of characters present in the directory entry of the file (in contrast to text files, which have a special EOF character to mark the end). So what is the story of EOF in the context of binary files? I am confused because in the following program I successfully use the != EOF comparison while reading from an .exe file in binary mode:

    #include <stdio.h>
    #include <stdlib.h>

    int main()
    {
        int ch;                 /* int, so it can hold every byte value and EOF */
        FILE *fp1, *fp2;

        fp1 = fopen("source.exe", "rb");
        fp2 = fopen("dest.exe", "wb");

        if (fp1 == NULL || fp2 == NULL)
        {
            printf("Error opening files");
            exit(-1);
        }

        /* Copy byte by byte until getc() reports EOF. */
        while ((ch = getc(fp1)) != EOF)
            putc(ch, fp2);

        fclose(fp1);
        fclose(fp2);
    }

Is EOF a special "character" at all? Or is it a condition, as Wikipedia says, a condition where the computer knows when to return a particular value like -1 (EOF on my computer)? An example of such a "condition" being when a character-reading function finishes reading all characters present, or when character/string I/O functions encounter an error in reading/writing?

Interestingly, the Stack Overflow tag for EOF blended both of those definitions of EOF. The tag for EOF said "In programming realm, EOF is a sequence of byte (or a chacracter) which indicates that there are no more contents after this.", while it also said in the "about" section that "End of file (commonly abbreviated EOF) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream."
But I have a strong feeling EOF won't be a character, as every other function seems to be returning it when it encounters an error during I/O.
It will be really nice of you if you can clear the matter for me.
Answered by Eric Postpischil
The various EOF indicators that C provides to you do not necessarily have anything to do with how the file system marks the end of a file.
Most modern file systems know the length of a file because they record it somewhere, separately from the contents of the file. The routines that read the file keep track of where you are reading and they stop when you reach the end. The C library routines generate an EOF value to return to you; they are not returning a value that is actually in the file.
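To make the point about recorded lengths concrete, here is a minimal sketch that obtains a file's size in two ways: by reading until getc() reports EOF, and by seeking to the end without looking at the contents. The function names count_bytes_by_reading and size_by_seeking are made up for this illustration. It assumes a platform where fseek() with SEEK_END works on binary streams, which the C standard does not strictly guarantee.

```c
#include <stdio.h>

/* Counts bytes by reading until getc() reports EOF. Returns -1 on error. */
long count_bytes_by_reading(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long n = 0;
    while (getc(fp) != EOF)
        n++;

    int failed = ferror(fp);   /* EOF may also mean a read error */
    fclose(fp);
    return failed ? -1 : n;
}

/* Asks for the length without reading any content:
   seek to the end and report the resulting offset. Returns -1 on error. */
long size_by_seeking(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);

    fclose(fp);
    return size;
}
```

On common desktop systems both functions should agree for an ordinary file, which suggests the end is found from bookkeeping rather than from a marker byte stored in the data.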
Note that the EOF returned by C library routines is not actually a character. The C library routines generally return an int, and that int is either a character value or an EOF. E.g., in one implementation, the characters might have values from 0 to 255, and EOF might have the value -1. When the library routine encountered the end of the file, it did not actually see a -1 character, because there is no such character. Instead, it was told by the underlying system routine that the end of file had been reached, and it responded by returning -1 to you.
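A short sketch of why the int return type matters. The helper names count_until_eof and narrowed_byte_looks_like_eof are invented for this example. The second function assumes a typical platform where EOF is -1 and converting 255 to signed char yields -1; that conversion is implementation-defined, so treat the result as illustrative rather than guaranteed.

```c
#include <stdio.h>

/* Counts the bytes left in fp, keeping getc()'s result in an int. */
long count_until_eof(FILE *fp)
{
    int ch;        /* int, not char: must hold 0..UCHAR_MAX and also EOF */
    long n = 0;
    while ((ch = getc(fp)) != EOF)
        n++;
    return n;
}

/* Returns nonzero if byte b, once narrowed to signed char, compares
   equal to EOF. This mimics `char ch = getc(fp);` where char is signed. */
int narrowed_byte_looks_like_eof(unsigned char b)
{
    signed char narrowed = (signed char)b;
    return narrowed == EOF;
}
```

With the int version, a data byte of 0xFF is returned as 255 and never equals EOF. With the narrowed version, that same byte can become indistinguishable from EOF, so a copy loop written with a char variable may stop early on binary data.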
Old and crude file systems might have a value in the file that marks the end of file. For various reasons, this is usually undesirable. In its simplest implementation, it makes it impossible to store arbitrary data in the file, because you cannot store the end-of-file marker as data. One could, however, have an implementation in which the raw data in the file contains something that indicates the end of file, but data is transformed when reading or writing so that arbitrary data can be stored. (E.g., by “quoting” the end-of-file marker.)
In certain cases, things like end-of-file markers also appear in streams. This is common when reading from the terminal (or a pseudo-terminal or terminal-like device). On Windows, pressing control-Z is an indication that the user is done entering input, and it is treated similarly to reaching end-of-file. This does not mean that control-Z is an EOF. The software reading from the terminal sees control-Z, treats it as end-of-file, and returns end-of-file indications, which are likely different from control-Z. On Unix, control-D is commonly a similar sentinel marking the end of input.
Answered by Christopher Neylan
This should clear it up nicely for you.
Basically, EOF is just a macro with a pre-defined value representing the error code from I/O functions indicating that there is no more data to be read.
Answered by Christopher Neylan
The file doesn't actually contain an EOF. EOF isn't a character of sorts: remember a byte can be between 0 and 255, so it wouldn't make sense if a file could contain a -1. The EOF is a signal from the operating system that you're using, which indicates the end of the file has been reached. Notice how getc() returns an int: that is so it can return that -1 to tell you the stream has reached the end of the file.
The EOF signal is treated the same for binary and text files - the actual definition of binary and text stream varies between the OSes (for example on *nix binary and text mode are the same thing.) Either way, as stated above, it is not part of the file itself. The OS passes it to getc() to tell the program that the end of the stream has been reached.
From the GNU C Library:
This macro is an integer value that is returned by a number of narrow stream functions to indicate an end-of-file condition, or some other error situation. With the GNU C Library, EOF is -1. In other libraries, its value may be some other negative number.
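Since EOF can mean either end-of-file or an error, a caller usually has to ask the stream which one happened. The sketch below uses the standard feof()/ferror() mechanism for that; the copy_stream function and the copy_status enum are names made up for this example.

```c
#include <stdio.h>

enum copy_status { COPY_OK, COPY_READ_ERROR, COPY_WRITE_ERROR };

/* Copies everything from in to out, one byte at a time, and reports
   why the loop ended. */
enum copy_status copy_stream(FILE *in, FILE *out)
{
    int ch;
    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF)
            return COPY_WRITE_ERROR;
    }

    /* getc() returned EOF. That alone is ambiguous, so check the
       stream's error indicator before assuming the data ran out. */
    if (ferror(in))
        return COPY_READ_ERROR;

    return COPY_OK;   /* the end-of-file indicator is set at this point */
}
```

A caller would open both files in binary mode ("rb" and "wb"), call copy_stream, and treat anything other than COPY_OK as a failed copy.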
Answered by Eric
EOF is not a character. In this context, it's -1, which, technically, isn't a character (if you wanted to be extremely precise, it could be argued that it could be a character, but that's irrelevant in this discussion). EOF, just to be clear, is "End of File". While you're reading a file, you need to know when to stop; otherwise a number of things could happen, depending on the environment, if you try to read past the end of the file.
So, a macro, EOF, was devised to signal that End of File has been reached in the course of reading a file. For getc this works because it returns an int rather than a char, so there's extra room to return something other than a char to signal EOF. Other I/O calls may signal EOF differently, such as by throwing an exception.
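Within C itself, fread() is an example of a call that signals the end differently: it never returns EOF, it simply returns fewer items than requested. A minimal sketch follows; the name count_bytes_blockwise and the 4096-byte buffer size are arbitrary choices for this illustration.

```c
#include <stdio.h>

/* Counts the remaining bytes in fp by reading in blocks.
   fread() reports the end of data through a short (or zero) count.
   Returns -1 if the stream's error indicator is set afterwards. */
long count_bytes_blockwise(FILE *fp)
{
    unsigned char buf[4096];
    long total = 0;
    size_t got;

    while ((got = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long)got;

    return ferror(fp) ? -1 : total;
}
```

After the loop, feof(fp) and ferror(fp) tell the two cases apart, just as they do after getc() returns EOF.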
As a point of interest, in DOS (and maybe still on Windows?) an actual, physical character ^Z was placed at the end of a file to signal its end. So, on DOS, there actually was an EOF character. Unix never had such a thing.
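One way to check that ^Z (byte value 0x1A) is ordinary data in binary mode is to write it into the middle of a file and read the file back. This is only a sketch; the function name roundtrip_ctrl_z is made up. Some Windows C runtimes are reported to treat 0x1A as end of input in text mode, which is one reason the example opens the file with "wb" and "rb".

```c
#include <stdio.h>

/* Writes 'A' 'B' 0x1A 'C' 'D' in binary mode, then counts how many
   bytes a binary-mode read returns. Returns -1 on failure. */
long roundtrip_ctrl_z(const char *path)
{
    static const unsigned char data[] = { 'A', 'B', 0x1A, 'C', 'D' };

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(data, 1, sizeof data, fp);
    if (fclose(fp) != 0 || written != sizeof data)
        return -1;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;
    long n = 0;
    while (getc(fp) != EOF)   /* 0x1A comes back as 26, not as EOF */
        n++;
    fclose(fp);
    return n;
}
```

In binary mode the function should report all five bytes, because the read only stops when the library is told there is no more data.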
Answered by Project Zero
Well, it is pretty much possible to find the EOF of a binary file if you study its structure.
No, you don't need the OS to know the EOF of an executable.
Almost every type of executable has a Page Zero which describes the basic information that the OS might need while loading the code into the memory and is stored as the first page of that executable.
Let's take the example of an MZ executable. https://wiki.osdev.org/MZ
Here at offset 2 we have the number of bytes used in the last page, and right after that, at offset 4, we have the total number of complete/partial pages (a page is 512 bytes). This information is generally used by the OS to safely load the code into memory, but you can use it to calculate the EOF of your binary file.
Algorithm:
1. Start
2. Parse the parameter and open the file as per your requirement.
3. Load the first page (page zero) into a (char) buffer of one page and print it.
4. Get the value at *((short int*)(buffer+2)) and store it in a variable called (short int) l. This is the number of bytes in the last page; zero means the last page is full.
5. Get the value at *((short int*)(buffer+4)) and store it in a loop variable called (short int) i. This is the number of pages.
6. i-- (page zero has already been loaded in step 3)
7. While i is greater than one, load 'size of page' characters into the buffer, print them (or do whatever you wanted to do), and decrement i.
8. Once the loop has finished executing, load the remaining `l` bytes of the last page (a full page if l is zero) into that buffer and again do whatever you wanted to do.
9. Stop
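The arithmetic behind the steps above can be sketched as a small function. This is an illustration under assumptions, not a full MZ parser: the names mz_image_size and read_u16le are made up, the field layout (bytes in last page at offset 2, page count at offset 4, 512-byte pages, little-endian values) follows the OSDev page linked above, and the bytes are read one at a time instead of through a short int pointer cast to avoid alignment and byte-order problems.

```c
#include <stdio.h>

#define MZ_PAGE_SIZE 512L

/* Reads a little-endian 16-bit value starting at p. */
static unsigned read_u16le(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Given at least the first 6 bytes of a file, returns the image size
   in bytes that the MZ header describes, or -1 if the signature is
   not "MZ" or the page count is zero. */
long mz_image_size(const unsigned char *hdr)
{
    if (hdr[0] != 'M' || hdr[1] != 'Z')
        return -1;

    unsigned last_page_bytes = read_u16le(hdr + 2);  /* offset 2 */
    unsigned pages = read_u16le(hdr + 4);            /* offset 4 */

    if (pages == 0)
        return -1;
    if (last_page_bytes == 0)                        /* last page is full */
        return (long)pages * MZ_PAGE_SIZE;

    /* All pages but the last are full; the last holds last_page_bytes. */
    return (long)(pages - 1) * MZ_PAGE_SIZE + (long)last_page_bytes;
}
```

For a header with 3 pages and 144 bytes in the last page, this gives 2 * 512 + 144 = 1168 bytes. Note that for newer executable formats that merely begin with an MZ stub, this value describes the stub and can be smaller than the file on disk.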
If you're designing your own binary file format then consider adding some sort of meta data at the start of that file or a special character or word that denotes the end of that file.
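As one possible shape for such metadata, the sketch below stores a 4-byte little-endian payload length at the start of the file, so a reader knows where the data ends without reserving any byte value as a marker. The format and the names write_with_length and read_length are hypothetical, invented for this example.

```c
#include <stdio.h>
#include <stdint.h>

/* Writes a 4-byte little-endian length followed by the payload.
   Returns 0 on success, -1 on a write failure. */
int write_with_length(FILE *fp, const unsigned char *payload, uint32_t len)
{
    unsigned char hdr[4] = {
        (unsigned char)(len & 0xFF),
        (unsigned char)((len >> 8) & 0xFF),
        (unsigned char)((len >> 16) & 0xFF),
        (unsigned char)((len >> 24) & 0xFF)
    };
    if (fwrite(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return -1;
    if (len > 0 && fwrite(payload, 1, len, fp) != len)
        return -1;
    return 0;
}

/* Reads the length header into *len. Returns 0 on success, -1 if the
   header could not be read completely. */
int read_length(FILE *fp, uint32_t *len)
{
    unsigned char hdr[4];
    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return -1;
    *len = (uint32_t)hdr[0]
         | ((uint32_t)hdr[1] << 8)
         | ((uint32_t)hdr[2] << 16)
         | ((uint32_t)hdr[3] << 24);
    return 0;
}
```

A length header keeps every byte value, including 0xFF and 0x1A, available as data, whereas a sentinel character would need some form of quoting.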
It is also quite possible that the OS derives the size of the file from this metadata with some simple arithmetic, even though it might seem that the OS has stored the size somewhere along with the other information it is expected to keep (an abstraction to reduce redundancy).

