Python's string.format() and Unicode

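Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13674663/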

Date: 2020-08-18 09:22:17  Source: igfitidea

Question by mpounsett

I'm having a problem with Python's string.format() and passing Unicode strings to it. This is similar to an older question, except that in my case the test code explodes on the print, not on the logging.info() call. Passing the same Unicode string object to a logging handler works fine.

This fails with the older % formatting as well as with string.format(). Just to make sure the string object was the problem, and not print interacting badly with my terminal, I tried assigning the formatted string to a variable before printing.

def unicode_test():
    byte_string = '\xc3\xb4'
    unicode_string = unicode(byte_string, "utf-8")
    print "unicode object type: {}".format(type(unicode_string))
    output_string = "printed unicode object: {}".format(unicode_string)
    print output_string

if __name__ == '__main__':
    unicode_test()

The string object seems to assume it's getting ASCII.

% python -V
Python 2.7.2

% python ./unicodetest.py
unicode object type: <type 'unicode'>
Traceback (most recent call last):
  File "./unicodetest.py", line 10, in <module>
    unicode_test()
  File "./unicodetest.py", line 6, in unicode_test
    output_string = "printed unicode object: {}".format(unicode_string)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 0: ordinal not in range(128)

Trying to make output_string a Unicode string doesn't make any difference:

output_string = u"printed unicode object: {}".format(unicode_string)

Am I missing something here? The documentation for the string object seems pretty clear that this should work as I'm attempting to use it.

Accepted answer by lqc

No, this should not work (can you cite the part of the documentation that says so?), but it should work if the formatting pattern is unicode, or with the old % formatting, which 'promotes' the pattern to unicode instead of trying to 'demote' the arguments.
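
As a point of comparison (this is not part of the original answer): in Python 3 every plain string literal is already a Unicode str, so the formatting pattern is always Unicode and the question's test works once the UTF-8 bytes are decoded. A minimal Python 3 sketch, reusing the question's function name unicode_test but returning the result (an assumption made so it can be checked) instead of printing inside the function:

```python
def unicode_test() -> str:
    # In Python 3, b'...' is a bytes object; decode it to get text (str).
    byte_string = b'\xc3\xb4'
    unicode_string = byte_string.decode('utf-8')
    # Template and argument are both str, so no implicit ASCII conversion happens.
    output_string = "printed unicode object: {}".format(unicode_string)
    # Old-style % formatting produces the same text.
    old_style = "printed unicode object: %s" % unicode_string
    assert output_string == old_style
    return output_string


if __name__ == '__main__':
    # Printing can still fail if the console encoding cannot represent the text.
    print(unicode_test())
```

The rule of thumb this illustrates is to decode bytes to text as early as possible and format only text.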

>>> x = "\xc3\xb4".decode('utf-8')
>>> x
u'\xf4'
>>> x + 'a'
u'\xf4a'
>>> 'a' + x
u'a\xf4'
>>> 'a %s' % x
u'a \xf4'
>>> 'a {}'.format(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 0: ordinal not in range(128)
>>> u'a {}'.format(x)
u'a \xf4'
>>> print u"Foo bar {}".format(x)
Foo bar ô

Edit: The print line may not work for you if the unicode string can't be encoded using your console's encoding. For example, on my Windows console:

>>> import sys
>>> sys.stdout.encoding
'cp852'
>>> u'\xf4'.encode('cp852')
'\x93'
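
Whether print succeeds comes down to whether every character can be encoded in the console's encoding. A small Python 3 sketch of that check, with a fallback that substitutes ? for characters the encoding cannot represent. The helper names can_encode and fit_to_encoding are hypothetical; the errors='replace' handler of str.encode is standard. In real use you would pass something like sys.stdout.encoding as the encoding name.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def fit_to_encoding(text: str, encoding: str) -> str:
    """Replace characters the encoding cannot represent with '?'."""
    return text.encode(encoding, errors='replace').decode(encoding)
```

For example, fit_to_encoding('Foo bar ô', 'ascii') gives 'Foo bar ?', while with 'cp852' the ô survives because it maps to byte 0x93, as in the console session above.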

On a UNIX console this may be related to your locale settings. It will also fail if you redirect output (e.g. when piping with | in a shell). Most of these issues have been fixed in Python 3.
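
In Python 3, a redirected stdout takes its encoding from the locale, or from the PYTHONIOENCODING environment variable if set. One way to make the encoding explicit is to wrap a binary stream in io.TextIOWrapper. The sketch below writes to an in-memory io.BytesIO as a stand-in for a pipe so it can be checked without a terminal; the function name write_utf8 is hypothetical. For the real sys.stdout, Python 3.7+ also offers sys.stdout.reconfigure(encoding='utf-8').

```python
import io


def write_utf8(text: str) -> bytes:
    # io.BytesIO stands in for a redirected stdout (e.g. a shell pipe).
    raw = io.BytesIO()
    # Encode as UTF-8 explicitly instead of relying on the locale default;
    # newline='\n' prevents newline translation on Windows.
    stream = io.TextIOWrapper(raw, encoding='utf-8', newline='\n')
    print(text, file=stream)
    stream.flush()
    data = raw.getvalue()
    # Detach so the wrapper does not close the underlying buffer when collected.
    stream.detach()
    return data
```

Calling write_utf8('Foo bar ô') yields the UTF-8 bytes b'Foo bar \xc3\xb4\n' regardless of the locale.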
