Python: the difference between `open` and `io.BytesIO` for binary streams

Question

Asked by Luke Whyte

I'm learning about working with streams in Python, and I noticed that the io docs say the following:

The easiest way to create a binary stream is with open() with 'b' in the mode string:
f = open("myfile.jpg", "rb")
In-memory binary streams are also available as BytesIO objects:
f = io.BytesIO(b"some initial binary data: \x00\x01")

What is the difference between f as defined by open() and f as defined by BytesIO? In other words, what makes something an "in-memory binary stream", and how is that different from what open() does?

Answer 1

Answered by vallentin

For simplicity's sake, let's consider writing instead of reading for now.

So when you use open(), for example:

with open("test.dat", "wb") as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")

After executing that, a file called test.dat will be created, containing Hello World three times. The data won't be kept in memory after it's written to the file (unless a name still refers to it).

Now consider io.BytesIO() instead:

import io

with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    f.write(b"Hello World")

Here, instead of being written to a file, the contents are written to an in-memory buffer, in other words a chunk of RAM. Writing the following would be roughly equivalent:

buffer = b""
buffer += b"Hello World"
buffer += b"Hello World"
buffer += b"Hello World"

To match the example with the with statement, there would also be a del buffer at the end, since the BytesIO object is closed and its buffer discarded when the with block exits.
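
A minimal sketch of what the with block implies, assuming nothing beyond the standard io module (the variable names are illustrative): the contents have to be copied out with getvalue() before the block exits, because the stream is closed at that point.

```python
import io

with io.BytesIO() as f:
    f.write(b"Hello World")
    f.write(b"Hello World")
    data = f.getvalue()   # copy the contents out while the stream is still open

was_closed = f.closed     # the with block has closed the stream

try:
    f.getvalue()
except ValueError:
    # Operations on a closed BytesIO raise ValueError.
    raised = True
else:
    raised = False
```

If the data is needed after the block, keeping the bytes returned by getvalue() (as data does here) is one straightforward option.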

The key difference here is optimization and performance. io.BytesIO is able to do some optimizations that make it faster than simply concatenating all the b"Hello World" one by one. Because bytes objects are immutable, each += generally builds a new object and copies everything accumulated so far.

Just to prove it, here's a small benchmark:

Concat: 1.3529 seconds
BytesIO: 0.0090 seconds

import io
import time

begin = time.time()
buffer = b""
for i in range(0, 50000):
    buffer += b"Hello World"
end = time.time()
seconds = end - begin
print("Concat:", seconds)

begin = time.time()
buffer = io.BytesIO()
for i in range(0, 50000):
    buffer.write(b"Hello World")
end = time.time()
seconds = end - begin
print("BytesIO:", seconds)

Besides the performance gain, using BytesIO instead of concatenating has the advantage that BytesIO can be used in place of a file object. So say you have a function that expects a file object to write to; you can give it that in-memory buffer instead of a file.
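
As a rough sketch of that idea, the function below writes to whatever binary file-like object it is given. The name write_greeting is hypothetical, and the temporary directory is only there so the example cleans up after itself.

```python
import io
import os
import tempfile


def write_greeting(stream):
    """Hypothetical helper: writes to any binary file-like object."""
    stream.write(b"Hello World")
    stream.write(b"Hello World")


# Target 1: an in-memory buffer.
buffer = io.BytesIO()
write_greeting(buffer)
in_memory = buffer.getvalue()

# Target 2: a real file (in a temporary directory, for the example only).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.dat")
    with open(path, "wb") as f:
        write_greeting(f)
    with open(path, "rb") as f:
        on_disk = f.read()
```

The function never needs to know which kind of object it received; both end up holding the same bytes.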

The difference is that open("myfile.jpg", "rb") opens myfile.jpg on disk and returns a file object from which its contents can be read, whereas BytesIO, again, is just a buffer containing some data.
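
A small illustration of how the two objects behave when read; the file name and contents are made up for the example. open() hands back a file object and data is returned as read() is called, while BytesIO answers the same calls from memory.

```python
import io
import os
import tempfile

payload = b"0123456789"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(payload)

    # A file object backed by the file on disk.
    with open(path, "rb") as f:
        file_first = f.read(4)   # only the first four bytes
        file_pos = f.tell()      # current position in the file
        file_rest = f.read()     # everything that remains

# An in-memory stream holding the same bytes.
mem = io.BytesIO(payload)
mem_first = mem.read(4)
mem_pos = mem.tell()
mem_rest = mem.read()
```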

Since BytesIO is just a buffer, if you wanted to write the contents to a file later, you'd have to do:

import io

buffer = io.BytesIO()
# ...
with open("test.dat", "wb") as f:
    f.write(buffer.getvalue())

Also, you didn't mention a version; I'm using Python 3. Regarding the examples: I'm using the with statement instead of calling f.close() explicitly.

Answer 2

Answered by Blckknght

Using open opens a file on your hard drive. Depending on what mode you use, you can read from or write to the disk (or both).

A BytesIO object isn't associated with any real file on the disk. It's just a chunk of memory that behaves like a file does. It has the same API as a file object returned from open (with mode r+b, allowing reading and writing of binary data).
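
A short sketch of that read-and-write behaviour, using only the standard io module (the data is arbitrary). One caveat worth noting: because no operating-system file is involved, fileno() is not available on a BytesIO.

```python
import io

stream = io.BytesIO(b"abcdef")

stream.seek(2)
stream.write(b"XY")      # overwrite two bytes in place, as with mode r+b
stream.seek(0)
contents = stream.read()

try:
    stream.fileno()
except io.UnsupportedOperation:
    # There is no underlying file descriptor for an in-memory stream.
    has_fileno = False
else:
    has_fileno = True
```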

BytesIO (and its close sibling StringIO, which is always in text mode) can be useful when you need to pass data to or from an API that expects to be given a file object, but where you'd prefer to pass the data directly. You can load your input data into the BytesIO before giving it to the library. After it returns, you can get any data the library wrote to the file from the BytesIO using the getvalue() method. (Usually you'd only need to do one of those, of course.)
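
As one possible illustration, the standard zipfile module accepts a file object in place of a path. The sketch below is an example chosen here, not something from the answer: it builds an archive entirely in memory, then reads it back from a second BytesIO. The member name hello.txt is arbitrary.

```python
import io
import zipfile

# Direction 1: let the library write into an in-memory buffer.
out_buffer = io.BytesIO()
with zipfile.ZipFile(out_buffer, "w") as zf:
    zf.writestr("hello.txt", "Hello World")
archive_bytes = out_buffer.getvalue()   # the finished archive as bytes

# Direction 2: hand the library data we already have in memory.
in_buffer = io.BytesIO(archive_bytes)
with zipfile.ZipFile(in_buffer) as zf:
    names = zf.namelist()
    text = zf.read("hello.txt")
```

No file is created on disk at any point; the archive exists only as archive_bytes.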

Answer 3

Answered by dkrynicki

f = open("myfile.jpg", "rb")

opens the file on disk for reading in binary mode and binds the resulting file object to the name f. Python keeps that object in memory; the file's bytes are read from disk when you call f.read().

f = io.BytesIO(b"some initial binary data: \x00\x01")

creates an in-memory byte stream holding the given bytes and binds it to the name f. Both the object and its data are kept by Python in memory.
