How to set sys.stdout encoding in Python 3?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4374455/

Time: 2020-08-18 15:28:46  Source: igfitidea

Tags: python, unicode, python-3.x, stdout

Asked by Greg Hewgill

Setting the default output encoding in Python 2 is a well-known idiom:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout)

This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.

However, this technique does not work in Python 3 because sys.stdout.write() expects a str, but the result of encoding is bytes, and an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.

What is the correct way to do this in Python 3?

Accepted answer by sth

Since Python 3.7 you can change the encoding of standard streams with reconfigure():

sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
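
As a rough illustration of the errors parameter, the sketch below reconfigures a stand-in text stream rather than the real sys.stdout. The helper name encode_with_handler and the in-memory io.BytesIO target are assumptions made for the demo, and ASCII is used as the target encoding only because it makes the effect of each error handler easy to see.

```python
import io


def encode_with_handler(text, errors):
    """Write text through a reconfigured stream and return the raw bytes."""
    raw = io.BytesIO()  # stand-in for the binary buffer behind sys.stdout
    stream = io.TextIOWrapper(raw, encoding="utf-8")
    # Same method the answer calls on sys.stdout (requires Python 3.7+).
    stream.reconfigure(encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


if __name__ == "__main__":
    for handler in ("replace", "backslashreplace", "xmlcharrefreplace"):
        print(handler, encode_with_handler("caf\u00e9", handler))
```

On the real stream the equivalent call would look like sys.stdout.reconfigure(encoding='utf-8', errors='backslashreplace'), assuming sys.stdout is still the io.TextIOWrapper that Python created at startup.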

Answer by Greg Hewgill

Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach(), streams can be made binary by default. This function sets stdin and stdout to binary:

def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

Therefore, the corresponding idiom for Python 3.1 and later is:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
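
To try the idiom without touching the real sys.stdout, here is a small sketch that applies the same two steps to a stand-in stream. The function name rewrap_as_utf8 and the io.BytesIO target are assumptions for the demo, not part of the documented idiom.

```python
import codecs
import io


def rewrap_as_utf8(text_stream):
    """Detach the binary buffer from a text stream and wrap it in a UTF-8 codec writer."""
    binary = text_stream.detach()  # text_stream is unusable after this call
    return codecs.getwriter("utf-8")(binary)


if __name__ == "__main__":
    raw = io.BytesIO()  # stand-in for the buffer behind sys.stdout
    ascii_stream = io.TextIOWrapper(raw, encoding="ascii")
    writer = rewrap_as_utf8(ascii_stream)
    writer.write("\u00fbnic\u00f6de")  # would fail on the ASCII text stream
    print(raw.getvalue())
```

The codec writer accepts str and hands bytes to the detached binary buffer, which is why the TypeError described in the question no longer occurs.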

Answer by Lennart Regebro

sys.stdout is in text mode in Python 3. Hence you write unicode to it directly, and the idiom for Python 2 is no longer needed.

Where this would fail in Python 2:

>>> import sys
>>> sys.stdout.write(u"ûnicöde")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfb' in position 0: ordinal not in range(128)

However, it works just dandy in Python 3:

>>> import sys
>>> sys.stdout.write("ûnicöde")
ûnicöde7

Now if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely in the build of the Python.

Answer by bobince

"Setting the default output encoding in Python 2 is a well-known idiom"

Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.

It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.

CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
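
A minimal sketch of what that could look like for a textual response, assuming a hypothetical helper named write_cgi_response and a text/html content type chosen only for illustration. The text is encoded once, at the application level, and only bytes reach the output stream.

```python
import io


def write_cgi_response(binary_out, text, charset="utf-8"):
    """Encode text to match the declared charset and write only bytes."""
    body = text.encode(charset)
    headers = (
        f"Content-Type: text/html; charset={charset}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    binary_out.write(headers + body)
    binary_out.flush()


if __name__ == "__main__":
    demo_out = io.BytesIO()  # a real CGI script would pass sys.stdout.buffer
    write_cgi_response(demo_out, "<p>\u00fbnic\u00f6de</p>")
    print(demo_out.getvalue())
```

In an actual CGI script the first argument would be sys.stdout.buffer, so the bytes bypass the text layer entirely.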

(To add to the confusion, wsgiref's CGIHandler was broken in py3k until very recently, making it impossible to deploy WSGI to CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)

Answer by ideasman42

I found this thread while searching for solutions to the same error.

An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use, this is less trouble than swapping sys.stdout after Python is initialized:

PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py

This has the advantage of not having to go and edit the Python code.
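
If you want to check what the variable actually does, one way is to launch a child interpreter with it set and ask the child for its stdout settings. This is only a sketch; the helper name child_stdout_settings is made up for the example.

```python
import os
import subprocess
import sys


def child_stdout_settings(io_encoding):
    """Start a child Python with PYTHONIOENCODING set; return its (encoding, errors)."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    code = "import sys; print(sys.stdout.encoding, sys.stdout.errors)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,  # requires Python 3.7+
        text=True,
        check=True,
    )
    encoding, errors = result.stdout.split()
    return encoding, errors


if __name__ == "__main__":
    print(child_stdout_settings("utf-8:surrogateescape"))
```

The part after the colon is the error handler, so the child should report surrogateescape as its stdout error handler.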

Answer by ptomato

Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits:

Exception ignored in: <_io.TextIOWrapper mode='w' encoding='UTF-8'>
ValueError: underlying buffer has been detached

Instead, this worked fine for me:

default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(And, of course, writing to default_out instead of stdout.)
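
For reference, a slightly fuller sketch of the same idea. The helper name utf8_text_writer and the line_buffering=True setting are my own assumptions, and the demo writes to an in-memory buffer so it can be run anywhere.

```python
import io


def utf8_text_writer(binary_stream):
    """Return a separate UTF-8 text wrapper around an existing binary stream."""
    # line_buffering=True flushes whenever a write contains a newline.
    return io.TextIOWrapper(binary_stream, encoding="utf-8", line_buffering=True)


if __name__ == "__main__":
    raw = io.BytesIO()  # the one-liner above passes sys.stdout.buffer instead
    out = utf8_text_writer(raw)
    print("\u65e5\u672c\u8a9e", file=out)
    print(raw.getvalue())
```

Passing file=out to print is one way of writing to the wrapper instead of stdout.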

Answer by Hyman O'Connor

Other answers seem to recommend using codecs, but open works for me:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())

This works even when I run it with PYTHONIOENCODING="ascii".
