Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/6725249/

How to print a Unicode string in Python in the Windows console

Tags: python, windows, unicode, encoding

Asked by yonix

I'm working on a Python application that prints text in multiple languages to the console on multiple platforms. The program works well on all UNIX platforms, but on Windows I get errors when printing Unicode strings at the command line.

There's already a relevant thread about this (Windows cmd encoding change causes Python crash), but I couldn't find an answer to my specific problem there.

For example, for the following Asian text, in Linux, I can run:

>>> print u"\u5f15\u8d77\u7684\u6216".encode("utf-8")
引起的或

But in Windows I get:

>>> print u"\u5f15\u8d77\u7684\u6216".encode("utf-8")
σ╝?Φ╡╖τ??μ??
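
The garbled output is consistent with the console decoding the UTF-8 bytes through a legacy OEM code page. Here is a minimal Python 3 sketch of that effect. It assumes code page 437; the asker's actual code page is not stated and may differ:

```python
# Assumption: the console interprets raw bytes as cp437.
text = "\u5f15\u8d77\u7684\u6216"
utf8_bytes = text.encode("utf-8")      # 3 bytes per character here

# What a cp437 console would display if handed those bytes unchanged.
garbled = utf8_bytes.decode("cp437")

# ascii() keeps the example printable on any console, whatever its encoding.
print(len(utf8_bytes))
print(ascii(garbled))
```

Each of the four characters turns into three unrelated glyphs, which is why the garbled line is three times longer than the intended text.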

I succeeded in displaying the correct text in a message box by doing something like this:

>>> import os
>>> file("bla.vbs", "w").write(u'MsgBox "\u5f15\u8d77\u7684\u6216", 4, "MyTitle"'.encode("utf-16"))
>>> os.system("cscript //U //NoLogo bla.vbs")

However, I want to be able to do it in the Windows console, preferably without requiring much configuration outside my Python code (because my application will be distributed to many hosts).

Is this possible?

Edit: If it's not possible, I would be happy to accept other suggestions for writing a console application on Windows that displays Unicode, e.g. a Python implementation of an alternative Windows console.

Answered by Kevin Edwards

There's a WriteConsoleW solution that provides a Unicode argv and stdout (print), but not stdin: Windows cmd encoding change causes Python crash

The only thing I modified is sys.argv, to keep it Unicode. The original version encoded it as UTF-8 for some reason.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

""" https://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash#answer-3259271
"""

import sys

if sys.platform == "win32":
    import codecs
    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import BOOL, HANDLE, DWORD, LPWSTR, LPCWSTR, LPVOID

    original_stderr = sys.stderr

    # If any exception occurs in this code, we'll probably try to print it on stderr,
    # which makes for frustrating debugging if stderr is directed to our wrapper.
    # So be paranoid about catching errors and reporting them to original_stderr,
    # so that we can at least see them.
    def _complain(message):
        print >>original_stderr, message if isinstance(message, str) else repr(message)

    # Work around <http://bugs.python.org/issue6058>.
    codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)

    # Make Unicode console output work independently of the current code page.
    # This also fixes <http://bugs.python.org/issue1602>.
    # Credit to Michael Kaplan <http://www.siao2.com/2010/04/07/9989346.aspx>
    # and TZOmegaTZIOY
    # <https://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462>.
    try:
        # <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
        # HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
        # returns INVALID_HANDLE_VALUE, NULL, or a valid handle
        #
        # <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
        # DWORD WINAPI GetFileType(DWORD hFile);
        #
        # <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
        # BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode);

        GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
        STD_OUTPUT_HANDLE = DWORD(-11)
        STD_ERROR_HANDLE = DWORD(-12)
        GetFileType = WINFUNCTYPE(DWORD, DWORD)(("GetFileType", windll.kernel32))
        FILE_TYPE_CHAR = 0x0002
        FILE_TYPE_REMOTE = 0x8000
        GetConsoleMode = WINFUNCTYPE(BOOL, HANDLE, POINTER(DWORD))(("GetConsoleMode", windll.kernel32))
        INVALID_HANDLE_VALUE = DWORD(-1).value

        def not_a_console(handle):
            if handle == INVALID_HANDLE_VALUE or handle is None:
                return True
            return ((GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR
                    or GetConsoleMode(handle, byref(DWORD())) == 0)

        old_stdout_fileno = None
        old_stderr_fileno = None
        if hasattr(sys.stdout, 'fileno'):
            old_stdout_fileno = sys.stdout.fileno()
        if hasattr(sys.stderr, 'fileno'):
            old_stderr_fileno = sys.stderr.fileno()

        STDOUT_FILENO = 1
        STDERR_FILENO = 2
        real_stdout = (old_stdout_fileno == STDOUT_FILENO)
        real_stderr = (old_stderr_fileno == STDERR_FILENO)

        if real_stdout:
            hStdout = GetStdHandle(STD_OUTPUT_HANDLE)
            if not_a_console(hStdout):
                real_stdout = False

        if real_stderr:
            hStderr = GetStdHandle(STD_ERROR_HANDLE)
            if not_a_console(hStderr):
                real_stderr = False

        if real_stdout or real_stderr:
            # BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
            #                           LPDWORD lpCharsWritten, LPVOID lpReserved);

            WriteConsoleW = WINFUNCTYPE(BOOL, HANDLE, LPWSTR, DWORD, POINTER(DWORD), LPVOID)(("WriteConsoleW", windll.kernel32))

            class UnicodeOutput:
                def __init__(self, hConsole, stream, fileno, name):
                    self._hConsole = hConsole
                    self._stream = stream
                    self._fileno = fileno
                    self.closed = False
                    self.softspace = False
                    self.mode = 'w'
                    self.encoding = 'utf-8'
                    self.name = name
                    self.flush()

                def isatty(self):
                    return False

                def close(self):
                    # don't really close the handle, that would only cause problems
                    self.closed = True

                def fileno(self):
                    return self._fileno

                def flush(self):
                    if self._hConsole is None:
                        try:
                            self._stream.flush()
                        except Exception as e:
                            _complain("%s.flush: %r from %r" % (self.name, e, self._stream))
                            raise

                def write(self, text):
                    try:
                        if self._hConsole is None:
                            if isinstance(text, unicode):
                                text = text.encode('utf-8')
                            self._stream.write(text)
                        else:
                            if not isinstance(text, unicode):
                                text = str(text).decode('utf-8')
                            remaining = len(text)
                            while remaining:
                                n = DWORD(0)
                                # There is a shorter-than-documented limitation on the
                                # length of the string passed to WriteConsoleW (see
                                # <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232>).
                                retval = WriteConsoleW(self._hConsole, text, min(remaining, 10000), byref(n), None)
                                if retval == 0 or n.value == 0:
                                    raise IOError("WriteConsoleW returned %r, n.value = %r" % (retval, n.value))
                                remaining -= n.value
                                if not remaining:
                                    break
                                text = text[n.value:]
                    except Exception as e:
                        _complain("%s.write: %r" % (self.name, e))
                        raise

                def writelines(self, lines):
                    try:
                        for line in lines:
                            self.write(line)
                    except Exception as e:
                        _complain("%s.writelines: %r" % (self.name, e))
                        raise

            if real_stdout:
                sys.stdout = UnicodeOutput(hStdout, None, STDOUT_FILENO, '<Unicode console stdout>')
            else:
                sys.stdout = UnicodeOutput(None, sys.stdout, old_stdout_fileno, '<Unicode redirected stdout>')

            if real_stderr:
                sys.stderr = UnicodeOutput(hStderr, None, STDERR_FILENO, '<Unicode console stderr>')
            else:
                sys.stderr = UnicodeOutput(None, sys.stderr, old_stderr_fileno, '<Unicode redirected stderr>')
    except Exception as e:
        _complain("exception %r while fixing up sys.stdout and sys.stderr" % (e,))


    # While we're at it, let's unmangle the command-line arguments:

    # This works around <http://bugs.python.org/issue2128>.
    GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
    CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int))(("CommandLineToArgvW", windll.shell32))

    argc = c_int(0)
    argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

    argv = [argv_unicode[i] for i in xrange(0, argc.value)]

#    argv = [argv_unicode[i].encode('utf-8') for i in xrange(0, argc.value)]

    if not hasattr(sys, 'frozen'):
        # If this is an executable produced by py2exe or bbfreeze, then it will
        # have been invoked directly. Otherwise, argv[0] is the Python
        # interpreter, so skip that.
        argv = argv[1:]

        # Also skip option arguments to the Python interpreter.
        while len(argv) > 0:
            arg = argv[0]
            if not arg.startswith(u"-") or arg == u"-":
                break
            argv = argv[1:]
            if arg == u'-m':
                # sys.argv[0] should really be the absolute path of the module source,
                # but never mind
                break
            if arg == u'-c':
                argv[0] = u'-c'
                break

    # if you like:
    sys.argv = argv
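
The loop in `write()` above handles partial writes: it keeps calling WriteConsoleW until every character has been accepted, offering at most 10000 characters per call. Here is a simplified, platform-independent Python 3 sketch of the same pattern. `write_all` and `fake_writer` are hypothetical names for illustration, and the sketch slices the text instead of passing a character count:

```python
def write_all(text, write_chunk, limit=10000):
    """Call write_chunk repeatedly until all of text has been accepted.

    write_chunk(piece) is a hypothetical callable that returns how many
    characters it consumed, much as WriteConsoleW reports through n.value.
    Returns the number of calls that were needed.
    """
    calls = 0
    while text:
        written = write_chunk(text[:limit])
        if written <= 0:
            # Mirrors the IOError raised when the console makes no progress.
            raise IOError("writer made no progress")
        text = text[written:]
        calls += 1
    return calls


# Stand-in writer that accepts at most 4 characters per call.
received = []

def fake_writer(piece):
    accepted = piece[:4]
    received.append(accepted)
    return len(accepted)


calls = write_all("abcdefghij", fake_writer, limit=6)
print(calls, received)
```

Treating a zero-length write as an error matters here: without it, a console that stops accepting output would make the loop spin forever.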

Answered by Pete Forman

Use a different console program. The following works in mintty, the default terminal emulator in Cygwin.

>>> print u"\u5f15\u8d77\u7684\u6216"
引起的或

There are other console alternatives available for Windows but I have not assessed their Unicode support.

Answered by Mat M

The problem is simply that the cmd and PowerShell consoles do not support variable-width fonts, and the fixed-width fonts do not include Chinese script. Cygwin is in the same situation.
PuTTY is more advanced: it supports variable-width fonts with Cyrillic, Vietnamese, and Arabic scripts, but not Chinese so far.

HTH

Answered by John Zwinck

Can you try using the program iconv on Windows, and piping your Python output through it? It would go something like this:

python foo.py | iconv -f utf-8 -t utf-16

You might have to do a little work to get iconv on Windows. It's part of Cygwin, but you may be able to build it separately if needed.
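
For reference, the transcoding step that the pipe performs can be sketched in Python 3. `utf8_to_utf16` is a hypothetical helper name. Python's `utf-16` codec prepends a byte order mark; whether a given iconv build does the same is not something this sketch checks:

```python
import codecs


def utf8_to_utf16(data):
    """Transcode UTF-8 bytes to UTF-16 bytes (Python adds a BOM)."""
    return data.decode("utf-8").encode("utf-16")


source = "\u5f15\u8d77\u7684\u6216".encode("utf-8")
converted = utf8_to_utf16(source)

# The BOM tells the reader which byte order follows.
has_bom = converted[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)
print(len(source), len(converted), has_bom)
```

This only changes the byte representation. Whether the console then renders the result correctly still depends on the console itself.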

Answered by Basilevs

The question is answered in the PrintFails article.

By default, the console in Microsoft Windows displays only 256 characters (cp437, i.e. code page 437, the extended ASCII character set of the original 1981 IBM PC).

For Russia this means CP866; other countries use their own code pages too. So to read Python output correctly in the Windows console, Windows must be configured with a native code page that can display the symbols being printed.
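
To see why the code page matters, here is a small Python 3 sketch (the byte values are chosen only for illustration) that decodes the same bytes under two OEM code pages:

```python
# The same three bytes mean different characters under different code pages.
raw = b"\x80\x81\x82"

as_cp437 = raw.decode("cp437")   # accented Latin letters
as_cp866 = raw.decode("cp866")   # Cyrillic letters

# ascii() keeps the example printable on any console, whatever its encoding.
print(ascii(as_cp437))
print(ascii(as_cp866))
```

Neither code page has room for Chinese characters, so no choice between these two would make the asker's example displayable.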

I suggest you always print Unicode text without encoding it yourself, to ensure maximum compatibility across platforms.

If you try to print a character that the console encoding cannot represent, you will get a UnicodeEncodeError or see distorted text.
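
Both outcomes can be reproduced without a Windows console. This is a Python 3 sketch that assumes cp437 as the target encoding:

```python
text = "\u5f15\u8d77\u7684\u6216"

# Strict encoding (the default) refuses characters the code page lacks.
try:
    text.encode("cp437")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With errors="replace", each unsupported character becomes "?".
replaced = text.encode("cp437", errors="replace")

print(strict_failed, replaced)
```

The first case corresponds to the exception, the second to the distorted text.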

In some cases, if Python fails to determine the output encoding correctly, you can try setting the PYTHONIOENCODING environment variable. Note, however, that this probably won't work for your example, because your console cannot display Asian text in its current configuration.
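
Here is a hedged Python 3 sketch of what PYTHONIOENCODING changes. It launches a child interpreter with the variable set and inspects the raw bytes the child writes to stdout. It says nothing about whether a particular console can render those bytes:

```python
import os
import subprocess
import sys

# The child program contains only ASCII escapes, so the command line is safe.
child_code = r"print('\u5f15\u8d77\u7684\u6216')"

# Copy the current environment and force the child's stdio encoding.
env = dict(os.environ, PYTHONIOENCODING="utf-8")
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)

# Strip the trailing newline, which differs between platforms.
output = result.stdout.strip()
print(output == "\u5f15\u8d77\u7684\u6216".encode("utf-8"))
```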

To reconfigure the console, use Control Panel -> Language and Regional settings -> Advanced (tab) -> Non Unicode programs language (section). Note that I translated the menu names from Russian.

See also the answers to the very similar question.