What does the Python print() function actually do?
Source: http://stackoverflow.com/questions/1979234/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Asked by Kimvais
I was looking at this question and started wondering what print actually does.
I have never found out how to use string.decode() and string.encode() to get a unicode string "out" in the Python interactive shell in the same format as print does. No matter what I do, I get either
- UnicodeEncodeError or
- the escaped string with "\x##" notation...
This is Python 2.x, but I'm already trying to mend my ways and actually call print() :)
Example:
>>> import sys
>>> a = '\xAA\xBB\xCC'
>>> print(a)
ª»Ì
>>> a.encode(sys.stdout.encoding)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xaa in position 0: ordinal not in range(128)
>>> a.decode(sys.stdout.encoding)
u'\xaa\xbb\xcc'
EDIT:
Why am I asking this? I am sick and tired of encode() errors and realized that, since print can do it (at least in the interactive shell), there MUST BE A WAY to magically do the encoding PROPERLY, by digging up the information about which encoding to use from somewhere...
ADDITIONAL INFO: I'm running Python 2.4.3 (#1, Sep 3 2009, 15:37:12) [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
>>> sys.stdin.encoding
'ISO-8859-1'
>>> sys.stdout.encoding
'ISO-8859-1'
However, the results are the same with Python 2.6.2 (r262:71600, Sep 8 2009, 13:06:43) on the same linux box.
Accepted answer by Michał Marczyk
EDIT: (Major changes between this edit and the previous one... Note: I'm using Python 2.6.4 on an Ubuntu box.)
Firstly, in my first attempt at an answer, I provided some general information on print and str, which I'm going to leave below for the benefit of anybody having simpler issues with print and chancing upon this question. As for a new attempt at dealing with the issue experienced by the OP... Basically, I'm inclined to say that there's no silver bullet here, and if print somehow manages to make sense of a weird string literal, then that's not reproducible behaviour. I'm led to this conclusion by the following funny interaction with Python in my terminal window:
>>> print '\xaa\xbb\xcc'
??
Have you tried to input ª»Ì directly from the terminal? At a Linux terminal using utf-8 as the encoding, this is actually read in as six bytes, which can then be made to look like three unicode chars with the help of the decode method:
>>> 'ª»Ì'
'\xc2\xaa\xc2\xbb\xc3\x8c'
>>> 'ª»Ì'.decode(sys.stdin.encoding)
u'\xaa\xbb\xcc'
So, the '\xaa\xbb\xcc' literal only makes sense if you decode it as a latin-1 literal (well, actually you could use a different encoding which agrees with latin-1 on the relevant characters). As for print 'just working' in your case, it certainly doesn't for me -- as mentioned above.
This is explained by the fact that when you use a string literal not prefixed with a u -- i.e. "asdf" rather than u"asdf" -- you might think the resulting string uses some non-unicode encoding. It doesn't; as a matter of fact, the string object itself is going to be encoding-unaware, and you're going to have to treat it as if it was encoded with encoding x, for the correct value of x. This basic idea leads me to the following:
a = '\xAA\xBB\xCC'
a.decode('latin1')
# result: u'\xaa\xbb\xcc'
print(a.decode('latin1'))
# output: ª»Ì
Note the lack of decoding errors and the proper output (which I expect to stay proper on any other box). Apparently your string literal can be made sense of by Python, but not without some help.
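For readers on Python 3, here is a minimal sketch of the same idea. It is an assumption-laden translation of the Python 2 snippet above, not part of the original answer: the byte string is written with a b prefix, and the names raw, text, code_points and utf8_bytes are made up for illustration.

```python
# Python 3 sketch: a bytes object carries no encoding of its own,
# so decoding is where we tell Python which encoding to assume.
raw = b'\xaa\xbb\xcc'

# Treat the bytes as latin-1; each byte maps to one character.
text = raw.decode('latin-1')

# Inspect the resulting code points without printing non-ASCII output.
code_points = [hex(ord(ch)) for ch in text]
print(code_points)   # ['0xaa', '0xbb', '0xcc']

# Encoding the same three characters as UTF-8 yields six bytes.
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)    # b'\xc2\xaa\xc2\xbb\xc3\x8c'
```

The six UTF-8 bytes match what the terminal session above showed when the characters were typed in directly.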
Does this help? (At least in understanding how things work, if not in making the handling of encodings any easier...)
Now for some funny bits with some explanatory value (hopefully)! This works fine for me:
sys.stdout.write("\xAA\xBB\xCC".decode('latin1').encode(sys.stdout.encoding))
Skipping either the decode or the encode part results in a unicode-related exception. Theoretically speaking, this makes sense, as the first decode is needed to decide what characters there are in the given string (the only thing obvious at first sight is what bytes there are -- the Python 3 idea of having (unicode) strings for characters and bytes for, well, bytes, suddenly seems superbly reasonable), while the encode is needed so that the output respects the encoding of the output stream. Now this
sys.stdout.write("???\n".decode(sys.stdin.encoding).encode(sys.stdout.encoding))
also works as expected, but the characters are actually coming from the keyboard and so are actually encoded with the stdin encoding... Also,
ord('ą'.decode('utf-8').encode('latin2'))
returns the correct 177 (my input encoding is utf-8), but '\xc4\x85'.encode('latin2') makes no sense to Python, as it has no clue as to how to make sense of '\xc4\x85' and figures that trying the 'ascii' codec is the best it can do.
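The decode-then-encode chain above can be sketched in Python 3 roughly as follows. This is only an illustration under my own assumptions: Python 3 does not perform the implicit 'ascii' decode that Python 2 does, so that step is written out explicitly here, and the variable names are hypothetical.

```python
# Python 3 sketch of "decode first, then encode".
utf8_input = b'\xc4\x85'             # the two bytes from the text above

# Step 1: decode, i.e. decide which character the bytes stand for.
char = utf8_input.decode('utf-8')    # U+0105

# Step 2: encode that character for the target encoding.
latin2 = char.encode('latin2')
print(latin2[0])                     # 177

# Approximation of Python 2's implicit fallback: decoding the same
# bytes as ASCII fails, which is why skipping step 1 raised an error.
try:
    utf8_input.decode('ascii')
    error_name = None
except UnicodeDecodeError as exc:
    error_name = type(exc).__name__
print(error_name)                    # UnicodeDecodeError
```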
The original answer:
The relevant bit of the Python docs (for version 2.6.4) says that print(obj) is meant to print out the string given by str(obj). I suppose you could then wrap it in a call to unicode (as in unicode(str(obj))) to get a unicode string out -- or you could just use Python 3 and exchange this particular nuisance for a couple of different ones. ;-)
Incidentally, this shows that you can manipulate the result of printing an object just like you can manipulate the result of calling str on an object, that is, by messing with the __str__ method. Example:
class Foo(object):
    def __str__(self):
        return "I'm a Foo!"

print Foo()
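A Python 3 sketch of the same point, in case it helps: it captures what print writes into an in-memory buffer and compares it with str(obj). The buffer and the captured variable are my own additions for illustration.

```python
import io

class Foo:
    def __str__(self):
        return "I'm a Foo!"

# Redirect print's output into a text buffer instead of the terminal.
buffer = io.StringIO()
print(Foo(), file=buffer)
captured = buffer.getvalue()

# print wrote exactly str(obj) followed by a newline.
print(captured == str(Foo()) + "\n")  # True
```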
As for the actual implementation of print, I expect this won't be useful at all, but if you really want to know what's going on... It's in the file Python/bltinmodule.c in the Python sources (I'm looking at version 2.6.4). Search for a line beginning with builtin_print. It's actually entirely straightforward, no magic going on there. :-)
Hopefully this answers your question... But if you do have a more arcane problem which I'm missing entirely, do comment, I'll make a second attempt. Also, I'm assuming we're dealing with Python 2.x; otherwise I guess I wouldn't have a useful comment.
Answer by Aaron Digulla
print() uses sys.stdout.encoding to determine what the output console can understand and then uses this encoding in the call to str.encode().
[EDIT] If you look at the source, it gets sys.stdout and then calls:
PyFile_WriteObject(PyTuple_GetItem(args, i), file,
                   Py_PRINT_RAW);
I guess the magic is in Py_PRINT_RAW but the source just says:
if (flags & Py_PRINT_RAW) {
    value = PyObject_Str(v);
}
So no magic here. A loop over the arguments with sys.stdout.write(str(item)) should do the trick.
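That loop might look something like the sketch below, written for Python 3. To be clear, simple_print is a hypothetical stand-in I made up, not the real implementation, and it ignores details such as flushing.

```python
import io
import sys

def simple_print(*args, sep=" ", end="\n", file=None):
    """Hypothetical stand-in for print(): str() each item and write it."""
    stream = sys.stdout if file is None else file
    for index, item in enumerate(args):
        if index:
            stream.write(sep)      # separator between items, not before the first
        stream.write(str(item))    # the only conversion applied is str()
    stream.write(end)

# Usage: write into a buffer so the result can be inspected.
buffer = io.StringIO()
simple_print("answer:", 42, file=buffer)
print(repr(buffer.getvalue()))     # 'answer: 42\n'
```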
Answer by Jason Orendorff
>>> import sys
>>> a = '\xAA\xBB\xCC'
>>> print(a)
ª»Ì
All print is doing here is writing raw bytes to sys.stdout. The string a is a string of bytes, not Unicode characters.
Why am I asking this? I am sick and tired of encode() errors and realized that, since print can do it (at least in the interactive shell), there MUST BE A WAY to magically do the encoding PROPERLY, by digging up the information about which encoding to use from somewhere...
Alas no, print is doing nothing at all magical here. You hand it some bytes, it dumps the bytes to stdout.
To use .encode() and .decode() properly, you need to understand the difference between bytes and characters, and I'm afraid you do have to figure out the correct encoding to use.
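To make the bytes-versus-characters point concrete, here is a small Python 3 sketch (my own illustration, not from the original answer). The same three bytes come out as different characters depending on which encoding the reader assumes, which is roughly what happens when a terminal receives them.

```python
# Python 3 sketch: identical bytes, two different interpretations.
raw = b'\xaa\xbb\xcc'

# A latin-1 reader sees three valid characters.
as_latin1 = raw.decode('latin-1')

# A UTF-8 reader sees invalid input; errors='replace' substitutes
# U+FFFD (the replacement character) instead of raising.
as_utf8 = raw.decode('utf-8', errors='replace')

# ascii() keeps the output printable on any terminal.
print(ascii(as_latin1))   # '\xaa\xbb\xcc'
print(ascii(as_utf8))
```

Which interpretation is "right" depends entirely on where the bytes came from, so the encoding has to be known or found out rather than guessed.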
Answer by jfs
import sys
source_file_encoding = 'latin-1' # if there is no -*- coding: ... -*- line
a = '\xaa\xbb\xcc' # raw bytes that represent string in source_file_encoding
# print bytes, my terminal tries to interpret it as 'utf-8'
sys.stdout.write(a+'\n')
# -> ??
ua = a.decode(source_file_encoding)
sys.stdout.write(ua.encode(sys.stdout.encoding)+'\n')
# -> ª»Ì