
Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/18473794/

Time: 2020-08-19 10:50:43  Source: igfitidea

Can't print character '\u2019' in Python from JSON object

Tags: python, encoding, printing, python-3.x

Asked by N-Saba

As a project to help me learn Python, I'm making a CMD viewer for Reddit using its JSON data (for example www.reddit.com/all/.json). When certain posts show up and I try to print them (which is what I assume is causing the error), I get this error:


Traceback (most recent call last):
  File "C:\Users\nsaba\Desktop\reddit_viewer.py", line 33, in <module>
    print ( "%d. (%d) %s\n" % (i+1, obj['data']['score'], obj['data']['title']))
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 32: character maps to <undefined>

Here is where I handle the data:


import json
import urllib.request

request = urllib.request.urlopen(url)
content = request.read().decode('utf-8')
jstuff = json.loads(content)

The line I use to print the data, as shown in the traceback above:


print ( "%d. (%d) %s\n" % (i+1, obj['data']['score'], obj['data']['title']))

Can anyone suggest where I might be going wrong?
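
To narrow it down, here is a minimal offline sketch that runs the same format string over a hand-written stand-in for one Reddit post. The `sample` JSON, its score of 42, and the `format_post` helper are hypothetical, modeled only on the `obj['data']['score']` / `obj['data']['title']` lookups above, not on a real API response. It suggests that building the string works fine and that the exception only appears once the text is encoded for a cp437 console.

```python
import json

# Hypothetical stand-in for one post; field names mirror the lookups in the question.
sample = '{"data": {"score": 42, "title": "Intel\\u2019s Sharp-Eyed Social Scientist"}}'

def format_post(i, obj):
    """Build the same line the question prints, without printing it."""
    return "%d. (%d) %s\n" % (i + 1, obj['data']['score'], obj['data']['title'])

obj = json.loads(sample)
line = format_post(0, obj)  # succeeds: the JSON escape is now a real U+2019 in a str

# Encoding for a cp437 console is the step that fails, as in the traceback.
try:
    line.encode('cp437')
except UnicodeEncodeError as exc:
    print(exc)
```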


Accepted answer by abarnert

It's almost certain that your problem has nothing to do with the code you've shown, and that it can be reproduced in one line:


print(u'\u2019')

If your terminal's character set can't handle U+2019 (or if Python is confused about what character set your terminal uses), there's no way to print it out. It doesn't matter whether it comes from JSON or anywhere else.
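
One quick way to test the "character set can't handle U+2019" explanation is to ask each codec directly. In this small sketch, `can_encode` is just an illustrative helper name. It shows that cp437, the codec named in the traceback, has no mapping for U+2019, while UTF-8 and cp1252 (which stores it as byte 0x92) do:

```python
def can_encode(text, encoding):
    """Return True if every character of text has a mapping in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

for codec in ('cp437', 'cp1252', 'utf-8'):
    print(codec, can_encode('\u2019', codec))
# cp437 False
# cp1252 True
# utf-8 True
```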


The Windows terminal (aka "DOS prompt" or "cmd window") is usually configured for a legacy 8-bit character set such as cp437 (the one in your traceback) or cp1252, which covers only 256 of the 0x110000 (about 1.1 million) Unicode code points, and there's nothing Python can do about this without a major change to the language implementation.*
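
Because the failure depends on the console's encoding, it can help to check which encoding Python detected and to degrade gracefully instead of crashing. This is a minimal sketch, assuming that replacing unencodable characters with `?` is acceptable for a viewer like this. `to_printable` is a hypothetical helper, not a standard function.

```python
import sys

def to_printable(text, encoding=None):
    """Replace characters the target encoding can't represent with '?'.

    Defaults to the encoding Python detected for sys.stdout.
    """
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, errors='replace').decode(encoding)

print('stdout encoding:', getattr(sys.stdout, 'encoding', None))
print(to_printable('Intel\u2019s Sharp-Eyed Social Scientist'))
```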


See PrintFails on the Python Wiki for details, workarounds, and links to more information. There are also a few hundred duplicates of this problem on SO (although many of them are specific to Python 2.x without saying so).
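
A commonly suggested workaround is the `PYTHONIOENCODING` environment variable, which overrides the encoding (and optionally, after a colon, the error handler) that Python uses for stdin/stdout/stderr. The sketch below starts a child interpreter with that variable set and captures its raw output bytes. `run_with_io_encoding` is a hypothetical helper. With `utf-8`, the bytes are only readable if whatever displays them understands UTF-8. In a cp437 console they avoid the exception but still show up as mojibake.

```python
import os
import subprocess
import sys

def run_with_io_encoding(code, encoding):
    """Run `code` in a fresh interpreter whose stdio uses `encoding`; return raw stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    result = subprocess.run(
        [sys.executable, '-c', code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,  # raise if the child crashes, e.g. with UnicodeEncodeError
    )
    return result.stdout

# Child source is: print('Intel\u2019s')
out = run_with_io_encoding("print('Intel\\u2019s')", 'utf-8')
print(out)  # raw bytes; the line ending may be \r\n on Windows
```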




* Windows has a whole separate set of APIs for printing UTF-16 to the terminal, so Python could detect that stdout is a Windows terminal, and if so encode to UTF-16 and use the special APIs instead of encoding to the terminal's charset and using the standard ones. But this raises a bunch of different problems (e.g., different ways of printing to stdout getting out of sync). There's been discussion about making these changes, but even if everyone were to agree and the patch were written tomorrow, it still wouldn't help you until you upgrade to whatever future version of Python it's added to…
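
For what it's worth, this did eventually happen. Python 3.6 switched the interactive Windows console to the UTF-16 console APIs (PEP 528), so upgrading is often the simplest fix today. Separately, Python 3.7 added `TextIOWrapper.reconfigure()`, which lets a program change an already-open stream's error handler, e.g. `sys.stdout.reconfigure(errors='replace')`. The sketch below applies it to an in-memory stand-in for a cp437 console, so the effect is visible as bytes. The stand-in stream is an assumption for demonstration only.

```python
import io

# In-memory stand-in for a console stream whose encoding is cp437.
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding='cp437', newline='')  # newline='': no \n translation

# Python 3.7+: swap the default 'strict' error handler for 'replace'.
# On the real console this would be sys.stdout.reconfigure(errors='replace').
console.reconfigure(errors='replace')
console.write('Intel\u2019s Sharp-Eyed Social Scientist\n')
console.flush()

print(raw.getvalue())  # b'Intel?s Sharp-Eyed Social Scientist\n'
```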


Answer by blakev

I set the default font of IDLE (Python Shell) and the Windows CMD window to Lucida Console (a font with wider Unicode coverage), and these types of errors went away; you also no longer see boxes like [][][][][][][][]


:)


Answer by Monte Hayward

@N-Saba, what is the string that causes the error to be thrown? In my test case, this looks like a version-specific bug in Python 2.7.3.


In the feed I was parsing, the "title" field had the following value:


u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'

I get the expected right single quotation mark when I run either of these in Python 2.7.6:


python -c "print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title']"
Intel’s Sharp-Eyed Social Scientist

In 2.7.3, I get the error unless I encode the value that I pulled out by key:


print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title']
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 5: ordinal not in range(128)
print {u'title': u'Intel\u2019s Sharp-Eyed Social Scientist'}['title'].encode('utf-8', 'replace')
Intel’s Sharp-Eyed Social Scientist
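
In Python 3 the `.encode('utf-8', ...)` workaround behaves differently. It yields `bytes`, and `print()` on bytes shows their `b'...'` repr rather than the text. The sketch below is a rough Python 3 equivalent, assuming a binary stream such as `sys.stdout.buffer` is available. The helper `write_utf8` is hypothetical. It also shows that U+2019 sits at index 5 of this title, which matches "position 5" in the error above.

```python
import io
import json
import sys

title = json.loads('{"title": "Intel\\u2019s Sharp-Eyed Social Scientist"}')['title']
print(title.index('\u2019'))  # 5

def write_utf8(text, stream=None):
    """Write text as UTF-8 bytes plus a newline to a binary stream (default: sys.stdout.buffer)."""
    if stream is None:
        stream = sys.stdout.buffer
    stream.write(text.encode('utf-8') + b'\n')
    stream.flush()

buf = io.BytesIO()  # stand-in for sys.stdout.buffer
write_utf8(title, buf)
print(buf.getvalue()[:8])  # b'Intel\xe2\x80\x99' (U+2019 is E2 80 99 in UTF-8)
```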

FWIW, the @abarnert command print('\u2019') prints "9". I think the intended code was print(u'\u2019').
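
On the literal itself: in Python 3.3+ (PEP 414) the `u` prefix is accepted and changes nothing, so `'\u2019'` and `u'\u2019'` are the same one-character string. Only a Python 2 byte-string literal leaves `\u2019` as six literal characters. A raw string reproduces that in Python 3:

```python
# Python 3: the u prefix is a no-op; both literals hold the single code point U+2019.
assert '\u2019' == u'\u2019'
print(len('\u2019'))       # 1
print(hex(ord('\u2019')))  # 0x2019

# A raw string keeps the backslash escape as text, as a Python 2 byte-string literal did.
print(len(r'\u2019'))      # 6
```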


Answer by yeliabsalohcin

I came across a similar error when attempting to write API JSON output to a .csv file via pd.DataFrame.to_csv() on a Windows install of Python 2.7.14.


Specifying the encoding as utf-8 fixed it for me:


df.to_csv(filename, encoding='utf-8')  # df: the pd.DataFrame holding the API data
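
The same fix, opening the output file with an explicit encoding instead of relying on the platform default, can be sketched with only the standard library. This swaps pandas for Python 3's built-in `csv` module. The rows and the `posts.csv` file name are made up for illustration, and `write_rows` is a hypothetical helper.

```python
import csv
import os
import tempfile

rows = [['score', 'title'], [42, 'Intel\u2019s Sharp-Eyed Social Scientist']]

def write_rows(path, rows, encoding='utf-8'):
    """Write rows to a CSV file using an explicit text encoding."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', encoding=encoding, newline='') as f:
        csv.writer(f).writerows(rows)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'posts.csv')  # hypothetical file name
    write_rows(path, rows)
    with open(path, 'rb') as f:
        data = f.read()

print(b'\xe2\x80\x99' in data)  # True: U+2019 was stored as UTF-8
```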

Answer by Echelon

For anyone encountering this on macOS: @abarnert's answer is correct, and I was able to fix it by putting this at the top of the offending source file:


# magic to make everything work in Unicode
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

To clarify, this changes Python 2's default encoding for implicit str/unicode conversions to UTF-8, which is what made the terminal output accept Unicode in my case.
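
That snippet only runs on Python 2. Python 3 has no `sys.setdefaultencoding()` (and `reload` moved to `importlib`), and its default string encoding is always UTF-8. What can still vary in Python 3 is the encoding of the output stream itself, which is a separate setting. A quick check:

```python
import sys

# Python 3: str is Unicode and the default codec is fixed.
print(sys.getdefaultencoding())            # utf-8
print(hasattr(sys, 'setdefaultencoding'))  # False

# The console/pipe encoding is separate, and it is what print() encodes with.
print(getattr(sys.stdout, 'encoding', None))
```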
