Python: Wrap an open stream with io.TextIOWrapper

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/34447623/

Date: 2020-08-19 14:59:46, Source: igfitidea

Tags: python, unit-testing, python-3.x, character-encoding, python-2.x

Question by bignose

How can I wrap an open binary stream – a Python 2 file, a Python 3 io.BufferedReader, an io.BytesIO – in an io.TextIOWrapper?

I'm trying to write code that will work unchanged:

  • Running on Python 2.
  • Running on Python 3.
  • With binary streams generated from the standard library (i.e. I can't control what type they are)
  • With binary streams made to be test doubles (i.e. no file handle, can't re-open).
  • Producing an io.TextIOWrapper that wraps the specified stream.

The io.TextIOWrapper is needed because its API is expected by other parts of the standard library. Other file-like types exist, but don't provide the right API.

Example

Wrapping the binary stream presented as the subprocess.Popen.stdout attribute:

import subprocess
import io

gnupg_subprocess = subprocess.Popen(
        ["gpg", "--version"], stdout=subprocess.PIPE)
gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")

In unit tests, the stream is replaced with an io.BytesIO instance to control its content without touching any subprocesses or filesystems.

gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
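On Python 3 the test double can be handed straight to io.TextIOWrapper, since io.BytesIO already implements the buffered binary-stream interface. A minimal Python 3 sketch, where `wrap_stdout` is a hypothetical stand-in for the code under test:

```python
import io


def wrap_stdout(binary_stream):
    """Hypothetical code under test: decode a binary stream as UTF-8 text."""
    return io.TextIOWrapper(binary_stream, encoding="utf-8")


# Test double: in-memory bytes instead of a real subprocess pipe.
fake_stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
text_stream = wrap_stdout(fake_stdout)

print(type(text_stream).__name__)  # TextIOWrapper
print(text_stream.read())          # Lorem ipsum
```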

That works fine on the streams created by Python 3's standard library. The same code, though, fails on streams generated by Python 2:

[Python 2]
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = io.TextIOWrapper(gnupg_subprocess.stdout, encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'file' object has no attribute 'readable'

Not a solution: Special treatment for file

An obvious response is to have a branch in the code which tests whether the stream actually is a Python 2 file object, and handle that differently from io.* objects.

That's not an option for well-tested code, because it makes a branch that unit tests – which, in order to run as fast as possible, must not create any real filesystem objects – can't exercise.

The unit tests will provide test doubles, not real file objects. So creating a branch that those test doubles never exercise defeats the purpose of the test suite.

Not a solution: io.open

Some respondents suggest re-opening (e.g. with io.open) the underlying file handle:

gnupg_stdout = io.open(
        gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")

That works on both Python 3 and Python 2:

[Python 3]
>>> type(gnupg_subprocess.stdout)
<class '_io.BufferedReader'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
>>> type(gnupg_stdout)
<class '_io.TextIOWrapper'>
[Python 2]
>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
>>> type(gnupg_stdout)
<type '_io.TextIOWrapper'>
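One detail of this approach is easy to miss: io.open(fd, ...) defaults to closefd=True, so closing the new text wrapper also closes the descriptor that the original stream (here gnupg_subprocess.stdout) still owns. A minimal Python 3 sketch of the variant with closefd=False, using a temporary file as an assumed stand-in for the subprocess pipe:

```python
import io
import os
import tempfile

# Stand-in for a real stream: a temporary file with known contents.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as setup:
    setup.write("Lorem ipsum".encode("utf-8"))

try:
    with open(path, "rb") as binary_stream:
        # Re-open the same descriptor as text. closefd=False means that
        # closing text_stream leaves the descriptor open for binary_stream.
        text_stream = io.open(binary_stream.fileno(), mode="r",
                              encoding="utf-8", closefd=False)
        content = text_stream.read()
        text_stream.close()

        # Both objects share one descriptor, hence one file offset:
        # seek back before reading through the original stream again.
        binary_stream.seek(0)
        again = binary_stream.read()
finally:
    os.remove(path)

print(content)  # Lorem ipsum
print(again)    # b'Lorem ipsum'
```

With the default closefd=True, the binary_stream.seek(0) call would instead fail, because the shared descriptor would already be closed.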

But of course it relies on re-opening a real file from its file handle. So it fails in unit tests when the test double is an io.BytesIO instance:

>>> gnupg_subprocess.stdout = io.BytesIO("Lorem ipsum".encode("utf-8"))
>>> type(gnupg_subprocess.stdout)
<type '_io.BytesIO'>
>>> gnupg_stdout = io.open(gnupg_subprocess.stdout.fileno(), mode='r', encoding="utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: fileno
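The exception raised by the test double is io.UnsupportedOperation, which on Python 3 subclasses both OSError and ValueError. A probe like the hypothetical `has_real_fileno` below can tell the two kinds of stream apart, though using it to choose a code path reintroduces exactly the kind of branch the question wants the unit tests to be able to exercise:

```python
import io
import tempfile


def has_real_fileno(stream):
    """Hypothetical helper: True if `stream` is backed by an OS descriptor."""
    try:
        stream.fileno()
    except (io.UnsupportedOperation, AttributeError):
        # io.BytesIO raises io.UnsupportedOperation; objects without any
        # fileno() method raise AttributeError.
        return False
    return True


in_memory = has_real_fileno(io.BytesIO(b"Lorem ipsum"))
with tempfile.TemporaryFile() as real_file:
    on_disk = has_real_fileno(real_file)

print(in_memory)  # False
print(on_disk)    # True
```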

Not a solution: codecs.getreader

The standard library also has the codecs module, which provides wrapper features:

import codecs

gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)

That's good because it doesn't attempt to re-open the stream. But it fails to provide the io.TextIOWrapper API. Specifically, it doesn't inherit io.IOBase and doesn't have the encoding attribute:

>>> type(gnupg_subprocess.stdout)
<type 'file'>
>>> gnupg_stdout = codecs.getreader("utf-8")(gnupg_subprocess.stdout)
>>> type(gnupg_stdout)
<type 'instance'>
>>> isinstance(gnupg_stdout, io.IOBase)
False
>>> gnupg_stdout.encoding
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/codecs.py", line 643, in __getattr__
    return getattr(self.stream, name)
AttributeError: '_io.BytesIO' object has no attribute 'encoding'

So codecs doesn't provide objects which substitute for io.TextIOWrapper.
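The same limitation shows up on Python 3. In this Python 3 sketch, the object returned by codecs.getreader decodes correctly but is a codecs.StreamReader rather than an io.IOBase, and because it forwards unknown attribute lookups to the wrapped io.BytesIO, asking for `encoding` fails:

```python
import codecs
import io

bytes_stream = io.BytesIO("Lorem ipsum".encode("utf-8"))
reader = codecs.getreader("utf-8")(bytes_stream)

# The reader decodes correctly...
decoded = reader.read()
print(decoded)                         # Lorem ipsum

# ...but it is a codecs.StreamReader, outside the io class hierarchy.
print(isinstance(reader, io.IOBase))   # False

# Unknown attributes are forwarded to the wrapped stream, and io.BytesIO
# has no 'encoding' attribute, so the lookup raises AttributeError.
print(hasattr(reader, "encoding"))     # False
```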

What to do?

So how can I write code that works for both Python 2 and Python 3, with both the test doubles and the real objects, which wraps an io.TextIOWrapper around the already-open byte stream?

Accepted answer by bignose

Based on multiple suggestions in various forums, and experimenting with the standard library to meet the criteria, my current conclusion is that this can't be done with the library and types as we currently have them.

Answer by jbg

Use codecs.getreader to produce a wrapper object:

text_stream = codecs.getreader("utf-8")(bytes_stream)

Works on Python 2 and Python 3.

Answer by jbg

It turns out you just need to wrap your io.BytesIO in io.BufferedReader, which exists on both Python 2 and Python 3.

import io

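reader = io.BufferedReader(io.BytesIO("Lorem ipsum".encode("utf-8")))
# Name the encoding explicitly; otherwise the locale's default is used.
wrapper = io.TextIOWrapper(reader, encoding="utf-8")
wrapper.read()  # returns the text "Lorem ipsum"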

This answer originally suggested using os.pipe, but the read-side of the pipe would have to be wrapped in io.BufferedReader on Python 2 anyway to work, so this solution is simpler and avoids allocating a pipe.
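A minimal Python 3 sketch of a reusable test helper built on this idea; `make_text_double` is a hypothetical name. The result is a genuine io.TextIOWrapper, so it has the `encoding` attribute and the line-oriented reads that the question requires:

```python
import io


def make_text_double(text, encoding="utf-8"):
    """Hypothetical test helper: an io.TextIOWrapper over in-memory bytes."""
    raw = io.BytesIO(text.encode(encoding))
    return io.TextIOWrapper(io.BufferedReader(raw), encoding=encoding)


double = make_text_double("first line\nsecond line\n")
print(isinstance(double, io.TextIOWrapper))  # True
print(double.encoding)                       # utf-8
print(repr(double.readline()))               # 'first line\n'
print(double.readlines())                    # ['second line\n']
```

io.BufferedReader still expects its argument to implement the raw-stream methods such as readable() and readinto(), so this covers the test-double side; it does not by itself make a Python 2 file object wrappable.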

Answer by jbg

Okay, this seems to be a complete solution, for all cases mentioned in the question, tested with Python 2.7 and Python 3.5. The general solution ended up being re-opening the file descriptor, but instead of io.BytesIO you need to use a pipe for your test double so that you have a file descriptor.

import io
import subprocess
import os

# Example function, re-opens a file descriptor for UTF-8 decoding,
# reads until EOF and prints what is read.
def read_as_utf8(fileno):
    fp = io.open(fileno, mode="r", encoding="utf-8", closefd=False)
    print(fp.read())
    fp.close()

# Subprocess
gpg = subprocess.Popen(["gpg", "--version"], stdout=subprocess.PIPE)
read_as_utf8(gpg.stdout.fileno())

# Normal file (contains "Lorem ipsum." as UTF-8 bytes)
with open("loremipsum.txt", "rb") as normal_file:
    read_as_utf8(normal_file.fileno())  # prints "Lorem ipsum."

# Pipe (for test harness - write whatever you want into the pipe)
pipe_r, pipe_w = os.pipe()
os.write(pipe_w, "Lorem ipsum.".encode("utf-8"))
os.close(pipe_w)
read_as_utf8(pipe_r)  # prints "Lorem ipsum."
os.close(pipe_r)
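The pipe setup in the test harness above can be factored into a helper. This is a minimal sketch; `make_pipe_double` is a hypothetical name, and it assumes the payload is small:

```python
import io
import os


def make_pipe_double(data):
    """Hypothetical test helper: return a read-end descriptor whose pipe
    already holds `data` followed by EOF.

    All data is written before anything reads it, so `data` must fit in
    the OS pipe buffer (platform-dependent size) or os.write() blocks.
    """
    read_fd, write_fd = os.pipe()
    try:
        os.write(write_fd, data)
    finally:
        os.close(write_fd)  # closing the write end lets the reader hit EOF
    return read_fd


fd = make_pipe_double("Lorem ipsum.".encode("utf-8"))
with io.open(fd, mode="r", encoding="utf-8", closefd=False) as fp:
    text = fp.read()
os.close(fd)

print(text)  # Lorem ipsum.
```

Larger fixtures would need a writer thread or a temporary file instead of a pre-filled pipe.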

Answer by Vek

I needed this as well, but based on the thread here, I determined that it was not possible using just Python 2's io module. While this breaks your "Special treatment for file" rule, the technique I went with was to create an extremely thin wrapper for file (code below) that could then be wrapped in an io.BufferedReader, which can in turn be passed to the io.TextIOWrapper constructor. It will be a pain to unit test, as obviously the new code path can't be tested on Python 3.

Incidentally, the result of open() can be passed directly to io.TextIOWrapper in Python 3 because a binary-mode open() actually returns an io.BufferedReader instance to begin with (at least on Python 3.4, which is where I was testing at the time).
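The remark about open() is easy to check on Python 3. A minimal sketch using a temporary file (the file and its contents are assumptions for illustration); note that this holds for the default buffering, while open(path, "rb", buffering=0) returns an unbuffered io.FileIO instead:

```python
import io
import os
import tempfile

# Assumption for illustration: a temporary file holding UTF-8 bytes.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as setup:
    setup.write("Lorem ipsum".encode("utf-8"))

try:
    # With default buffering, "rb" mode yields an io.BufferedReader...
    with open(path, "rb") as binary_file:
        is_buffered_reader = isinstance(binary_file, io.BufferedReader)
        # ...which io.TextIOWrapper accepts directly. Closing the wrapper
        # also closes binary_file.
        with io.TextIOWrapper(binary_file, encoding="utf-8") as text:
            content = text.read()
finally:
    os.remove(path)

print(is_buffered_reader)  # True
print(content)             # Lorem ipsum
```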

import io
import six  # for six.PY2

if six.PY2:
    class _ReadableWrapper(object):
        def __init__(self, raw):
            self._raw = raw

        def readable(self):
            return True

        def writable(self):
            return False

        def seekable(self):
            return True

        def __getattr__(self, name):
            return getattr(self._raw, name)

def wrap_text(stream, *args, **kwargs):
    # Note: order important here, as 'file' doesn't exist in Python 3
    if six.PY2 and isinstance(stream, file):
        stream = io.BufferedReader(_ReadableWrapper(stream))

    # Forward the caller's arguments (e.g. encoding="utf-8") to the wrapper.
    return io.TextIOWrapper(stream, *args, **kwargs)

At least this is small, so hopefully it minimizes the exposure for parts that cannot easily be unit tested.
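The wrapper above only runs on Python 2. A related variant that can run, and be unit-tested, on Python 3 subclasses io.RawIOBase so that any object with a read() method gains the readable()/readinto() interface io.BufferedReader relies on. This is a swapped-in sketch, not Vek's code: ReadOnlyRawAdapter, MinimalReader and wrap_as_text are hypothetical names, and it has not been checked against Python 2's file type:

```python
import io


class ReadOnlyRawAdapter(io.RawIOBase):
    """Hypothetical adapter: expose any object with a read(size) method
    that returns bytes as a raw stream accepted by io.BufferedReader."""

    def __init__(self, source):
        super().__init__()
        self._source = source

    def readable(self):
        return True

    def readinto(self, buffer):
        # Copy up to len(buffer) bytes into the caller's buffer;
        # returning 0 signals end of file.
        data = self._source.read(len(buffer))
        size = len(data)
        buffer[:size] = data
        return size


class MinimalReader(object):
    """Hypothetical test double offering nothing but read()."""

    def __init__(self, data):
        self._data = data
        self._pos = 0

    def read(self, size=-1):
        if size is None or size < 0:
            size = len(self._data) - self._pos
        chunk = self._data[self._pos:self._pos + size]
        self._pos += len(chunk)
        return chunk


def wrap_as_text(stream, encoding="utf-8"):
    """Hypothetical helper: one code path for any read()-able stream."""
    return io.TextIOWrapper(io.BufferedReader(ReadOnlyRawAdapter(stream)),
                            encoding=encoding)


text = wrap_as_text(MinimalReader("Lorem ipsum".encode("utf-8")))
print(text.read())  # Lorem ipsum
```

Because real streams and test doubles both go through the adapter, there is no type-specific branch for the tests to miss; the costs are an extra copy per read and a stream that always reports seekable() as False.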

Answer by Grey Christoforo

Here's some code that I've tested in both Python 2.7 and Python 3.6.

The key here is that you need to call detach() on your previous stream first. This does not close the underlying file; it just separates the raw stream object from the old wrapper so that it can be reused. detach() returns an object that can be wrapped with TextIOWrapper.

As an example, I open a file in binary read mode, read some bytes from it, and then switch to a UTF-8 decoded text stream via io.TextIOWrapper.

I saved this example as this-file.py:

import io

fileName = 'this-file.py'
fp = io.open(fileName,'rb')
fp.seek(20)
someBytes = fp.read(10)
print(type(someBytes), len(someBytes))

# now let's do some wrapping to get a new text (non-binary) stream
pos = fp.tell() # we're about to lose our position, so let's save it
newStream = io.TextIOWrapper(fp.detach(),'utf-8') # FYI -- fp is now unusable
newStream.seek(pos)
theRest = newStream.read()
print(type(theRest), len(theRest))

Here's what I get when I run it with both python2 and python3.

$ python2.7 this-file.py 
(<type 'str'>, 10)
(<type 'unicode'>, 406)
$ python3.6 this-file.py 
<class 'bytes'> 10
<class 'str'> 406

The printed output looks different (Python 2 prints a tuple) and, as expected, the types differ between Python versions, but the code works as it should in both cases.
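The detach() technique can also be exercised without touching the filesystem, which matters for the question's unit-test requirement. In this Python 3 sketch an io.BufferedReader over an io.BytesIO stands in for the file opened with io.open(..., 'rb'); the data is made up for illustration:

```python
import io

# Stand-in for a file opened with io.open(..., "rb"): a buffered reader
# over in-memory bytes.
fp = io.BufferedReader(io.BytesIO("0123456789 Lorem ipsum".encode("utf-8")))

some_bytes = fp.read(10)  # binary read
pos = fp.tell()           # logical position; the raw stream has read ahead

# detach() hands back the raw stream without closing it; fp is unusable now.
new_stream = io.TextIOWrapper(fp.detach(), encoding="utf-8")
new_stream.seek(pos)      # restore the position lost with fp's buffer
the_rest = new_stream.read()

try:
    fp.read()
    fp_usable = True
except ValueError:        # raised because the raw stream was detached
    fp_usable = False

print(some_bytes)      # b'0123456789'
print(repr(the_rest))  # ' Lorem ipsum'
print(fp_usable)       # False
```

detach() belongs to the io classes, so this path presumes the binary stream came from the io module (as with io.open in the example above); a Python 2 built-in file object, such as subprocess produces, does not appear to offer it.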
