How to print utf-8 to console with Python 3.4 (Windows 8)?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors: StackOverflow
Original question: http://stackoverflow.com/questions/25127673/
Asked by Austin A
I've never fully wrapped my head around encoding and decoding Unicode to other formats (utf-8, utf-16, ascii, etc.), but I've reached a wall that is both confusing and frustrating. What I'm trying to do is print utf-8 card symbols (♠,♥,♦,♣) from a python module to a windows console. The console that I'm using is git bash and I'm using console2 as a front-end. I've tried/read a number of approaches below and nothing has worked so far. Let me know if what I'm doing is possible and the right way to do it.
- Made sure the console can handle utf-8 characters. These two tests make me believe that the console isn't the problem.


- Attempt the same thing from the python module. When I execute the .py, this is the result:

    print(u'♠')
    UnicodeEncodeError: 'charmap' codec can't encode character '\u2660' in position 0: character maps to <undefined>

- Attempt to encode ♠. This gives me back the Unicode character encoded in utf-8, but still no spade symbol:

    text = '♠'
    print(text.encode('utf-8'))
    b'\xe2\x99\xa0'
I feel like I'm missing a step or not understanding the whole encode/decode process. I've read this, this, and this. The last of the pages suggests wrapping sys.stdout in the code, but this article says using stdout is unnecessary and points to another page using the codecs module.
I'm so confused! I feel as though quality documentation on this subject is hard to find, and hopefully someone can clear this up. Any help is always appreciated!
Austin
Answered by Kasramvd
By default, the console in Microsoft Windows only displays 256 characters (cp437, or "Code page 437", the original IBM-PC 1981 extended ASCII character set), as you say in the comments.
On the other hand, PYTHONIOENCODING is set to UTF-8 by default. So I think that when you want to print Unicode on Windows, you have to align sys.stdout.encoding and PYTHONIOENCODING with each other.
Also note that when you specify an encoding for your .py file, it is only used for that source code and doesn't change the default system encoding.
So do something like this:
import codecs
my_str = '♠'  # or something like my_str = '\u05dd'
my_str.encode().decode('cp437')
Answered by bobince
What I'm trying to do is print utf-8 card symbols (♠,♥,♦,♣) from a python module to a windows console
UTF-8 is a byte encoding of Unicode characters. ♠♥♦♣ are Unicode characters which can be reproduced in a variety of encodings, and UTF-8 is one of those encodings. As a UTF, UTF-8 can reproduce any Unicode character. But there is nothing specifically "UTF-8" about those characters.
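To make the character/encoding distinction concrete, here is a minimal sketch that encodes the same spade character with a few different UTFs. The variable names are just for illustration; the UTF-8 bytes match the b'\xe2\x99\xa0' shown in the question.

```python
# One Unicode character, several byte encodings.
spade = "\u2660"  # BLACK SPADE SUIT, written as an escape so the source stays ASCII

encodings = ["utf-8", "utf-16-le", "utf-32-le"]
encoded = {name: spade.encode(name) for name in encodings}

for name, data in encoded.items():
    # Decoding with the same encoding recovers the original character.
    assert data.decode(name) == spade
    # Printing bytes shows their ASCII-safe literal form, so this works on any console.
    print(name, data)
```

The character is the same in every case; only the bytes that represent it differ.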
Other encodings that can reproduce the characters ♠♥♦♣ are Windows code page 850 and 437, which your console is likely to be using under a Western European install of Windows. You can print ♠ in these encodings, but you are not using UTF-8 to do so, and you won't be able to use other Unicode characters that are available in UTF-8 but outside the scope of these code pages.
print(u'♠')
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660'
In Python 3 this is the same as the print('♠') test you did above, so there is something different about how you are invoking the script containing this print, compared to your py -3.4. What does sys.stdout.encoding give you from the script?
To get print working correctly you would have to make sure Python picks up the right encoding. If it is not doing that adequately from the terminal settings, you would indeed have to set PYTHONIOENCODING to cp437.
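One way to see what setting PYTHONIOENCODING does is to launch a child interpreter with the variable set and ask it for sys.stdout.encoding. This is only a sketch: the helper name child_stdout_encoding is made up for illustration, and subprocess.run assumes Python 3.5 or later (newer than the 3.4 in the question).

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and return its sys.stdout.encoding."""
    # Copy the current environment and override only the one variable.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # Encoding names are plain ASCII, so decoding the child's output this way is safe.
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding("cp437"))
```

On a real console you would normally set the variable in the shell before starting Python; the subprocess is used here only so the effect can be observed from a single script.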
>>> text = '♠'
>>> print(text.encode('utf-8'))
b'\xe2\x99\xa0'
print can only print Unicode strings. For other types, including the bytes string that results from the encode() method, it gets the literal representation (repr) of the object. b'\xe2\x99\xa0' is how you would write a Python 3 bytes literal containing a UTF-8 encoded ♠.
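A small sketch of that behaviour, with illustrative variable names: what print shows for a bytes object is its repr, and you only get the character back by decoding.

```python
text = "\u2660"              # the spade character as a str
data = text.encode("utf-8")  # a bytes object, three bytes long

# print(data) displays the literal representation of the bytes object,
# which is the same text that repr() returns.
shown = repr(data)
print(shown)

# Decoding with the matching encoding gives back the one-character string.
round_trip = data.decode("utf-8")
```

Whether round_trip can then be printed still depends on sys.stdout.encoding, which is the actual problem in the question.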
If what you want to do is bypass print's implicit encoding to PYTHONIOENCODING and substitute your own, you can do that explicitly:
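>>> import sys
>>> # Caveat: encode() raises UnicodeEncodeError if Python's cp437 codec has no mapping for the character.
>>> sys.stdout.buffer.write('♠'.encode('cp437'))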
This will of course generate wrong output for any consoles not running code page 437 (e.g. non-Western-European installs). Generally, for apps using the C stdio, like Python does, getting non-ASCII characters to the Windows console is just too unreliable to bother with.
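If you do go the explicit route, a hedged sketch of a helper might look like the following. The name write_encoded and its parameters are hypothetical; the only real APIs used are str.encode with an errors handler and the binary buffer behind sys.stdout.

```python
import io
import sys


def write_encoded(text, encoding, stream=None, errors="replace"):
    """Encode text yourself and write the raw bytes, bypassing print's implicit encoding."""
    # errors="replace" substitutes "?" for characters the encoding cannot
    # represent instead of raising UnicodeEncodeError.
    data = text.encode(encoding, errors=errors)
    target = stream if stream is not None else sys.stdout.buffer
    target.write(data)
    return data


if __name__ == "__main__":
    # Demonstrate with an in-memory stream so the result does not depend on the console.
    buf = io.BytesIO()
    write_encoded("A\u2660", "ascii", stream=buf)
    print(buf.getvalue())
```

The replacement behaviour trades correctness for robustness: unsupported characters are lost, but the script keeps running.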
Answered by user87690
You can look at it this way. A string is a sequence of characters, not a sequence of bytes. Characters are Unicode code points. Bytes are just numbers in the range 0–255. At the low level, computers work just with sequences of bytes. If you want to print a string, you just call print(a_string) in Python. But to communicate with the OS environment, the string has to be encoded to a sequence of bytes. This is done automatically somewhere under the hood of the print function. The encoding used is sys.stdout.encoding. If you get a UnicodeEncodeError, it means that your characters cannot be encoded using the current encoding.
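You can check this yourself before printing. A minimal sketch, assuming a hypothetical helper called can_encode, that tests whether a string survives a given encoding:

```python
import sys


def can_encode(text, encoding):
    """Return True if every character of text can be encoded with the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    spade = "\u2660"
    # sys.stdout.encoding can be None when stdout is not a normal text stream,
    # so fall back to ASCII in that case.
    console_encoding = sys.stdout.encoding or "ascii"
    print(console_encoding, can_encode(spade, console_encoding))
```

If this prints False for your console encoding, print(spade) is exactly the call that raises UnicodeEncodeError.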
As far as I know, it is currently not possible to run Python on Windows in such a way that the encoding used is capable of encoding every character (as UTF-8 or UTF-16 are) and is both assumed by Python and really used by the OS environment for both input and output. There is a workaround: you can use the win_unicode_console package, which aims to solve this issue. Just install it with pip install win_unicode_console, and in your sitecustomize import it and call win_unicode_console.enable(). This will serve as an external patch to your Python installation regarding this issue. See the documentation for more information: https://github.com/Drekin/win-unicode-console.
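A sketch of what such a sitecustomize.py could look like. The import guard and the function name enable_unicode_console are my own additions for illustration, so that the file stays harmless where the package is missing or the platform is not Windows; only win_unicode_console.enable() comes from the package itself.

```python
# sitecustomize.py (sketch)
import sys

try:
    import win_unicode_console  # third-party: pip install win_unicode_console
except ImportError:
    win_unicode_console = None


def enable_unicode_console():
    """Enable the console patch when it is available; return whether it was applied."""
    if sys.platform == "win32" and win_unicode_console is not None:
        win_unicode_console.enable()
        return True
    return False


enabled = enable_unicode_console()
```

Python imports sitecustomize automatically at startup if it is on the import path, so nothing else needs to call this.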
Answered by jfs
Do not encode to utf-8; print Unicode directly instead:
print(u'♠')
Answered by Bensuperpc
Since Python 3.7, you can reconfigure stdout:
import sys
sys.stdout.reconfigure(encoding='utf-8')
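If the same script also has to run where sys.stdout has no reconfigure method (Python older than 3.7, or stdout replaced by some other object), a guarded version is safer. A minimal sketch; the function name force_utf8_stdout and the choice of errors="replace" are assumptions, not part of the answer above.

```python
import sys


def force_utf8_stdout():
    """Switch sys.stdout to UTF-8 when supported; return True if it was reconfigured."""
    reconfigure = getattr(sys.stdout, "reconfigure", None)
    if reconfigure is None:
        # Older Python, or stdout is not a regular text wrapper.
        return False
    # errors="replace" keeps print from raising on anything unencodable.
    reconfigure(encoding="utf-8", errors="replace")
    return True


changed = force_utf8_stdout()
```

Note that this only changes the bytes Python emits; the console still has to interpret them as UTF-8 for the symbols to appear.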

