Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5109970/

Time: 2020-08-05 03:01:10  Source: igfitidea

Linux/Python: encoding a unicode string for print

python linux unicode encoding locale

Asked by Mats Ekberg

I have a fairly large python 2.6 application with lots of print statements sprinkled about. I'm using unicode strings throughout, and it usually works great. However, if I redirect the output of the application (like "myapp.py >output.txt"), then I occasionally get errors such as this:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

I guess the same issue comes up if someone has set their LOCALE to ASCII. Now, I understand perfectly well the reason for this error. There are characters in my Unicode strings that are not possible to encode in ASCII. Fair enough. But I'd like my python program to make a best effort to try to print something understandable, maybe skipping the suspicious characters or replacing them with their Unicode ids.

This problem must be common... What is the best practice for handling this problem? I'd prefer a solution that allows me to keep using plain old "print", but I can modify all occurrences if necessary.

PS: I have now solved this problem. Neither of the answers given was the solution. I used the method described at http://wiki.python.org/moin/PrintFails, which ChrisJ pointed out in one of the comments. That is, I replace sys.stdout with a wrapper that encodes unicode with the correct arguments. Works very well.

Accepted answer by Mats Ekberg

I have now solved this problem. Neither of the answers given was the solution. I used the method described at http://wiki.python.org/moin/PrintFails, which ChrisJ pointed out in one of the comments. That is, I replace sys.stdout with a wrapper that encodes unicode with the correct arguments. Works very well.
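
A wrapper of this kind might look roughly like the sketch below. It is not copied from the PrintFails page: the helper name make_encoding_writer, the default encoding, and the 'backslashreplace' error handler are assumptions chosen for illustration. The demonstration writes to an in-memory byte stream rather than the real stdout.

```python
import codecs
import io


def make_encoding_writer(byte_stream, encoding="utf-8", errors="backslashreplace"):
    """Wrap a byte stream so that unicode text written to it is encoded first.

    Hypothetical helper for illustration, not taken from the PrintFails page.
    """
    writer_class = codecs.getwriter(encoding)  # StreamWriter class for the codec
    return writer_class(byte_stream, errors=errors)


# Demonstration on an in-memory byte stream instead of the real stdout.
demo_stream = io.BytesIO()
writer = make_encoding_writer(demo_stream, encoding="ascii")
writer.write(u"\xa1Hola!\n")  # the unencodable character is escaped, not fatal
demo_output = demo_stream.getvalue()
```

In Python 2 the wrapper would presumably be installed once at startup with something like sys.stdout = make_encoding_writer(sys.stdout, 'utf-8'), after which plain print statements keep working. In Python 3 the byte stream to wrap would be sys.stdout.buffer instead.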

Answer by Triptych

If you're dumping to an ASCII terminal, encode manually using unicode.encode, and specify that errors should be ignored.

u = u'\xa0'
u.encode('ascii') # This fails with UnicodeEncodeError
u.encode('ascii', 'ignore') # This silently drops the characters that cannot be encoded
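
Besides 'ignore', the standard error handlers 'replace', 'backslashreplace', and 'xmlcharrefreplace' keep a visible trace of each character that could not be encoded, which may be closer to what the question asks for. A small comparison, using a sample string chosen only for illustration:

```python
text = u"\xa1Hola!"  # sample string; the first character is not ASCII

ignored = text.encode("ascii", "ignore")              # drops the character
replaced = text.encode("ascii", "replace")            # substitutes '?'
escaped = text.encode("ascii", "backslashreplace")    # substitutes a \x escape
xml_ref = text.encode("ascii", "xmlcharrefreplace")   # substitutes &#161;
```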

If you want to store unicode files, try this:

u = u'\xa0'
print >>open('out', 'w'), u # This fails
print >>open('out', 'w'), u.encode('utf-8') # This is ok
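
An alternative to encoding each string by hand is to open the file with an explicit encoding, so that the file object does the conversion. The sketch below uses io.open, which is available from Python 2.6 on; the helper name write_unicode_file and the temporary path are assumptions made for the example.

```python
import io
import os
import tempfile


def write_unicode_file(path, text, encoding="utf-8"):
    """Write unicode text to path, letting the file object encode it."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Demonstration in a temporary directory so no existing file is touched.
demo_path = os.path.join(tempfile.mkdtemp(), "out")
write_unicode_file(demo_path, u"\xa0")

# Read the file back in binary mode to see the bytes that were stored.
with io.open(demo_path, "rb") as handle:
    raw_bytes = handle.read()
```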

Answer by Andreas Jung

Either route all your print statements through a function that performs the unicode -> utf8 conversion or, as a last resort, change the Python default encoding from ascii to utf-8 inside your site.py. In general it is a bad idea to print unicode strings unfiltered to sys.stdout: when the stream has no encoding of its own (for example when output is redirected), Python will trigger an implicit conversion of the unicode strings to the configured default encoding, which is ascii.
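
The first option could be sketched as a single function that every print call is routed through. The names encode_for_output and safe_print are hypothetical, and the utf-8 fallback and 'replace' error handler are assumptions; the idea is only that the conversion happens explicitly in one place.

```python
import io


def encode_for_output(text, encoding=None, fallback="utf-8", errors="replace"):
    """Encode unicode text, using the fallback when no encoding is known."""
    return text.encode(encoding or fallback, errors)


def safe_print(text, byte_stream, encoding=None):
    """Hypothetical print replacement: encode explicitly, then write bytes."""
    byte_stream.write(encode_for_output(text, encoding) + b"\n")


# Demonstration on an in-memory byte stream instead of the real stdout.
demo_stream = io.BytesIO()
safe_print(u"\xa1Hola!", demo_stream)            # no encoding known: utf-8
safe_print(u"\xa1Hola!", demo_stream, "ascii")   # ASCII target: '?' substituted
printed = demo_stream.getvalue()
```

In Python 2, sys.stdout.encoding is None when output is redirected, so a call such as safe_print(u, sys.stdout, sys.stdout.encoding) would fall back to utf-8 in exactly the case described in the question.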
