Difference between binary and text I/O in Python on Windows

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3257869/

Date: 2020-09-15 14:50:36  Source: igfitidea

Tags: python, windows, file, file-io

Asked by Mad Scientist

I know that I should open a binary file using "rb" instead of "r" because Windows behaves differently for binary and non-binary files.

But I don't understand what exactly happens if I open a file the wrong way and why this distinction is even necessary. Other operating systems seem to do fine by treating both kinds of files the same.

Answered by Nas Banov

Well, this is for historical (or, as I like to say, hysterical) reasons. The file open modes are inherited from the C stdio library, and hence Python follows them.

For Windows, there is no difference between text and binary files, just like in any of the Unix clones. No, I mean it! - there are (were) file systems/OSes in which a text file is a completely different beast from an object file, and so on. In some you had to specify the maximum length of lines in advance, and fixed-size records were used... fossils from the times of 80-column paper punch cards and such. Luckily, not so in Unices, Windows and Mac.

However - all other things equal - Unix, Windows and Mac hystorically differ in what characters they use in the output stream to mark the end of a line (or, same thing, as the separator between lines). In Unix, \x0A (\n) is used. In Windows, a sequence of two characters, \x0D\x0A (\r\n), is used; on the (classic) Mac - just \x0D (\r).

Here are some clues on the origin of those two symbols. ASCII code 10 is called Line Feed (LF), and when sent to a teletype it would cause it to move down one line (Y++) without changing its horizontal (X) position. Carriage Return (CR) - ASCII 13 - on the other hand, would cause the printing carriage to return to the beginning of the line (X=0) without scrolling one line down. So when sending output to the printer, both \r and \n had to be sent so that the carriage would move to the beginning of a new line. Now, when typing on a terminal keyboard, operators are naturally expected to press one key and not two for end of line. On the Apple ][ that was the 'Return' key (\r).
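To make the three conventions concrete, here is a small Python 3 sketch. The sample strings and the `samples` name are made up for illustration. It only relies on `bytes.splitlines()`, which breaks at `\n`, `\r` and `\r\n` alike.

```python
# Hypothetical sample data: the same two lines under each historical convention.
samples = {
    "Unix": b"one\ntwo\n",            # LF only
    "Windows": b"one\r\ntwo\r\n",     # CR followed by LF
    "classic Mac": b"one\rtwo\r",     # CR only
}

# The two control characters behind all of this.
print(hex(ord("\n")), hex(ord("\r")))

for name, raw in samples.items():
    # bytes.splitlines() recognises all three terminators,
    # so every variant yields the same list of lines.
    print(name, raw, raw.splitlines())
```

The raw bytes differ per platform, but the logical lines are the same, which is exactly the gap that text mode tries to paper over.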

At any rate, this is how things settled. C's creators were concerned about portability - much of Unix was written in C, unlike before, when OSes were written in assembler. They did not want to deal with each platform's quirks about text representation, so they added this evil hack to their I/O library: depending on the platform, the input and output to a file is "patched" on the fly so that the program sees new lines the righteous, Unix way - as '\n' - no matter whether it was '\r\n' from Windows or '\r' from Mac. So the developer need not worry about what OS the program runs on; it can still read and write text files in the native format.

There was a problem, however - not all files are text; there are other formats, and they are very sensitive to replacing one character with another. So they thought: we will call those "binary files" and indicate that to fopen() by including 'b' in the mode - and this will flag the library not to do any behind-the-scenes conversion. And that's how it came to be the way it is :)

So to recap, if a file is opened with 'b' (binary mode), no conversions will take place. If it is opened in text mode, then depending on the platform, some conversion of the newline character(s) may occur - towards the Unix point of view. Naturally, on a Unix platform there is no difference between reading/writing a "text" or a "binary" file.
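A minimal sketch of that recap in Python 3. To make it behave the same on any OS, it passes `newline="\r\n"` to `open()` as a stand-in for what Windows text mode does by default; that substitution, the file name and the variable names are assumptions of this example, not part of the answer. Note also that in Python 3 text mode additionally decodes bytes to `str`, which the answer (written with C stdio in mind) does not cover.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # Text mode: every "\n" written is translated on the way out.
    # newline="\r\n" mimics the Windows default on any platform.
    with open(path, "w", newline="\r\n") as f:
        f.write("line one\nline two\n")

    # Binary mode: no conversion, so we see the bytes actually on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Binary write followed by binary read: bytes round-trip untouched.
    with open(path, "wb") as f:
        f.write(b"\x00\r\n\x01\n")
    with open(path, "rb") as f:
        echoed = f.read()

print(raw)     # b'line one\r\nline two\r\n'
print(echoed)  # b'\x00\r\n\x01\n'
```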

Answered by Thomas

This mode is about conversion of line endings.

When reading in text mode, the platform's native line endings (\r\n on Windows) are converted to Python's Unix-style \n line endings. When writing in text mode, the reverse happens.
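The read side can be sketched as follows, assuming Python 3, where `open()` in text mode defaults to `newline=None` (universal newlines) and therefore performs this translation on every platform, not only on Windows. The file name and variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "crlf.txt")  # hypothetical file name

    # Create a file with Windows-style line endings, byte for byte.
    with open(path, "wb") as f:
        f.write(b"first\r\nsecond\r\n")

    # Default text mode: "\r\n" is translated to "\n" while reading.
    with open(path, "r", encoding="ascii") as f:
        translated = f.read()

    # newline="" keeps text mode but returns line endings untranslated.
    with open(path, "r", encoding="ascii", newline="") as f:
        untranslated = f.read()

    # Binary mode: raw bytes, no decoding and no translation.
    with open(path, "rb") as f:
        raw_bytes = f.read()

print(repr(translated))    # 'first\nsecond\n'
print(repr(untranslated))  # 'first\r\nsecond\r\n'
print(raw_bytes)           # b'first\r\nsecond\r\n'
```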

In binary mode, no such conversion is done.

Other platforms usually do fine without the conversion, because they store line endings natively as \n. (An exception is Mac OS, which used to use \r in the old days.) Code relying on this, however, is not portable.

Answered by Ben Hoffstein

In Windows, text mode will convert the newline \n to a carriage return followed by a newline \r\n.

If you read text in binary mode, there are no problems. If you read binary data in text mode, it will likely be corrupted.
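As an illustration of that corruption, the sketch below pushes the 8-byte PNG file signature through a text-mode read in Python 3. The `latin-1` encoding is chosen only because it maps every byte to exactly one character, so any damage comes from newline translation alone; that choice, the file name and the variable names are assumptions of this example.

```python
import os
import tempfile

# The PNG signature deliberately contains both "\r\n" and a lone "\n".
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blob.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(PNG_SIGNATURE)

    # Correct: binary mode returns the bytes unchanged.
    with open(path, "rb") as f:
        good = f.read()

    # Wrong: text mode translates "\r\n" to "\n", silently dropping a byte.
    with open(path, "r", encoding="latin-1") as f:
        damaged = f.read().encode("latin-1")

print(good == PNG_SIGNATURE)     # True
print(damaged)                   # b'\x89PNG\n\x1a\n'
print(len(good), len(damaged))   # 8 7
```

One byte has vanished, which is enough to make an image decoder reject the file.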

Answered by mdm

For reading files there should be no difference. When writing to text files, Windows will automatically mess up your line breaks (it will add \r's before the \n's). That's why you should use "wb".
