Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/918294/

Time: 2020-10-21 06:11:14  Source: igfitidea

Python unicode in Mac OS X terminal

python macos unicode terminal

Asked by disc0dancer

Can someone explain to me this odd thing:

When I type the following Cyrillic string in the Python shell:

>>> print 'абвгд'
абвгд

but when I type:

>>> print u'абвгд'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-9: ordinal not in range(128)

Since the first string came out correctly, I reckon my OS X terminal can represent unicode, but it turns out it can't in the second case. Why?

Answered by sth

>>> print 'абвгд'
абвгд

When you type in some characters, your terminal decides how these characters are represented to the application. Your terminal might give the characters to the application encoded as UTF-8, ISO-8859-5 or even something that only your terminal understands. Python gets these characters as some sequence of bytes. Then Python prints out these bytes as they are, and your terminal interprets them in some way to display characters. Since your terminal usually interprets the bytes the same way as it encoded them before, everything is displayed the way you typed it in.
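As a rough illustration of why the byte sequence depends on the terminal's encoding, here is a minimal sketch. It assumes Python 3 (where plain strings are already unicode and `bytes` plays the role of the Python 2 `str`), and the two encodings are simply the ones named above. The traceback in the question reports positions 0-9, i.e. ten items for five letters, which would be consistent with a UTF-8 terminal; that reading is an inference, not something the source states.

```python
# -*- coding: utf-8 -*-
text = u"абвгд"

utf8_bytes = text.encode("utf-8")       # two bytes per Cyrillic letter
iso_bytes = text.encode("iso-8859-5")   # one byte per Cyrillic letter
print(len(utf8_bytes), len(iso_bytes))

# Decoding with the encoding that produced the bytes restores the text.
roundtrip = utf8_bytes.decode("utf-8")

# Decoding with a different encoding does not fail here, but it yields
# different characters than the ones that were typed.
mismatch = utf8_bytes.decode("iso-8859-5")
print(roundtrip == text, mismatch == text)
```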

>>> u'абвгд'

Here you type in some characters that arrive at the Python interpreter as a sequence of bytes, possibly encoded in some way by the terminal. With the u prefix, Python tries to convert this data to unicode. To do this correctly, Python has to know what encoding your terminal uses. In your case it looks like Python guesses that your terminal's encoding is ASCII, but the received data doesn't match that, so you get an encoding error.

The straightforward way to create unicode strings in an interactive session would therefore be something like this:

>>> us = 'абвгд'.decode('my-terminal-encoding')
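If you would rather not hard-code 'my-terminal-encoding', one possible approach is to ask Python what it detected for standard input. The sketch below assumes Python 3; `decode_terminal_bytes` and its `fallback` parameter are hypothetical names made up for this illustration, and the sample bytes stand in for what a UTF-8 terminal would send.

```python
import sys

def decode_terminal_bytes(raw, encoding=None, fallback="utf-8"):
    """Decode bytes typed at a terminal into unicode text.

    `encoding` overrides detection; otherwise sys.stdin.encoding is used,
    falling back to `fallback` when Python could not determine it.
    """
    chosen = encoding or getattr(sys.stdin, "encoding", None) or fallback
    return raw.decode(chosen)

typed = u"абвгд".encode("utf-8")  # stand-in for bytes sent by a UTF-8 terminal
us = decode_terminal_bytes(typed, encoding="utf-8")
print(us == u"абвгд")
```

Passing the encoding explicitly, as in the last lines, keeps the result independent of how the interpreter was started.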

In files you can also specify the encoding of the file with a special mode line:

# -*- encoding: ISO-8859-5 -*-
us = u'абвгд'
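To see the declaration at work without creating a file, the following sketch (assuming Python 3) compiles source bytes held in memory. It uses the conventional spelling `coding:` for the declaration, and the file name passed to `compile` is a placeholder chosen for this example.

```python
# Source bytes as an editor configured for ISO-8859-5 would save them.
source = u"# -*- coding: iso-8859-5 -*-\nus = u'абвгд'\n".encode("iso-8859-5")

namespace = {}
# compile() reads the declaration on the first line to decode the bytes.
exec(compile(source, "<declared-encoding-demo>", "exec"), namespace)
print(namespace["us"] == u"абвгд")
```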

For other ways to set the default input encoding you can look at sys.setdefaultencoding(...) or sys.stdin.encoding.

Answered by Ingmar Hupp

As of Python 2.6, you can use the environment variable PYTHONIOENCODING to tell Python that your terminal is UTF-8 capable. The easiest way to make this permanent is to add the following line to your ~/.bash_profile:

export PYTHONIOENCODING=utf-8
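One way to check that the variable has the intended effect is to start a child interpreter with it set and ask which encoding it uses for standard output. This is only a sketch, assuming Python 3 and that `sys.executable` points at a usable interpreter; `child_stdout_encoding` is a hypothetical helper name.

```python
import os
import subprocess
import sys

def child_stdout_encoding(io_encoding):
    """Start a child interpreter with PYTHONIOENCODING set and report
    the encoding it ends up using for sys.stdout."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()

print(child_stdout_encoding("utf-8"))
```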

[Image: Terminal.app showing unicode output from Python]

Answered by Jarret Hardie

In addition to ensuring your OS X terminal is set to UTF-8, you may wish to set Python's sys default encoding to UTF-8. Create a file called sitecustomize.py in /Library/Python/2.5/site-packages. In this file put:

import sys
sys.setdefaultencoding('utf-8')

The setdefaultencoding method is available only to the site module, and is removed from the sys namespace once startup has completed. As such, you'll need to start a new Python interpreter for the change to take effect. You can verify the current default encoding at any time after startup with sys.getdefaultencoding().
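The check described above can be sketched as follows. The snippet only reads state and changes nothing; it assumes it is run in a normally started interpreter, and the variable names are illustrative. Under Python 3 the setter does not exist at all, so the second value is expected to be false there as well.

```python
import sys

# The encoding Python falls back to for implicit str/unicode conversions.
current_default = sys.getdefaultencoding()

# After startup the setter is normally gone from the sys namespace.
setter_visible = hasattr(sys, "setdefaultencoding")
print(current_default, setter_visible)
```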

If the characters aren't already unicode and you need to convert them, use the decode method on a string to decode the text from some other charset into unicode. It is best to specify which charset:

s = 'абвгд'.decode('some_cyrillic_charset') # makes the string unicode
print s.encode('utf-8') # transform the unicode into utf-8, then print it
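For a version of the same decode-then-encode step that can be run as is, here is a minimal sketch assuming Python 3. ISO-8859-5 is used as a stand-in for 'some_cyrillic_charset', and `transcode` is a hypothetical helper name.

```python
def transcode(raw, source_charset, target_charset="utf-8"):
    """Decode bytes from source_charset, then re-encode them as target_charset."""
    return raw.decode(source_charset).encode(target_charset)

iso_bytes = u"абвгд".encode("iso-8859-5")   # sample input in a Cyrillic charset
utf8_bytes = transcode(iso_bytes, "iso-8859-5")
print(utf8_bytes.decode("utf-8") == u"абвгд")
```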

Answered by cdonner

Also, make sure the terminal encoding is set to Unicode/UTF-8 (and not ascii, which seems to be your setting):

http://www.rift.dk/news.php?item.7.6

Answered by workmad3

A unicode object needs to be encoded before it can be displayed on some consoles. Try

u'абвгд'.encode()

instead, to encode the unicode object into a byte string. With no argument, encode() uses Python's default encoding; that is ASCII in a stock Python 2 (it depends on your Python configuration), in which case pass the encoding explicitly, e.g. u'абвгд'.encode('utf-8').
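The difference between an ASCII target and an explicit UTF-8 target can be sketched like this, assuming Python 3. The first call reproduces the kind of UnicodeEncodeError shown in the question; the variable names are illustrative only.

```python
text = u"абвгд"

try:
    text.encode("ascii")            # what an ASCII-configured output attempts
    ascii_failure = None
except UnicodeEncodeError as exc:
    # Record which codec failed and the range of offending characters.
    ascii_failure = (exc.encoding, exc.start, exc.end)

utf8_bytes = text.encode("utf-8")                    # explicit and lossless
escaped = text.encode("ascii", "backslashreplace")   # ASCII-only fallback
print(ascii_failure, len(utf8_bytes), escaped)
```

The "backslashreplace" error handler is one option when the output really must stay ASCII; it keeps the information but not the readable characters.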

Answered by hekevintran

'абвгд' is not a unicode string

u'абвгд' is a unicode string

You cannot print unicode strings without encoding them. When you are dealing with strings in your application, you want to make sure that any input is decoded and any output is encoded. This way your application deals only with unicode strings internally and outputs strings in UTF-8.
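The decode-on-input, encode-on-output pattern might look roughly like this. It is a sketch assuming Python 3; `shout` is a hypothetical function, and upper-casing merely stands in for whatever the application does with the text internally.

```python
def shout(raw, input_encoding):
    text = raw.decode(input_encoding)   # decode at the input boundary
    result = text.upper()               # internal work happens on unicode only
    return result.encode("utf-8")       # encode at the output boundary

incoming = u"абвгд".encode("iso-8859-5")   # pretend input in a legacy charset
outgoing = shout(incoming, "iso-8859-5")
print(outgoing.decode("utf-8") == u"АБВГД")
```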

For reference:

>>> 'абвгд'.decode('utf8') == u'абвгд'
True