python 2.7 character \u2013
Original question: http://stackoverflow.com/questions/20329896/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by user2904150
I have the following code:
# -*- coding: utf-8 -*-
print u"William Burges (1827–81) was an English architect and designer."
When I try to run it from cmd, I get the following message:
Traceback (most recent call last):
File "C:\Python27\utf8.py", line 3, in <module>
print u"William Burges (1827ō?ō81) was an English architect and designer."
File "C:\Python27\lib\encodings\cp775.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position
20: character maps to <undefined>
How can I solve this problem and make Python read this \u2013 character? And why doesn't Python read it with the existing code? I thought that utf-8 works for every character.
Thank you
EDIT:
This code prints out the wanted outcome:
# -*- coding: utf-8 -*-
print unicode("William Burges (1827-81) was an English architect and designer.", "utf-8").encode("cp866")
But when I try to print more than one sentence, for example:
# -*- coding: utf-8 -*-
print unicode("William Burges (1827–81) was an English architect and designer. I am here. ", "utf-8").encode("cp866")
I get the same error message:
Traceback (most recent call last):
File "C:\Python27\utf8vs.py", line 3, in <module>
print unicode("William Burges (1827ō?ō81) was an English architect and desig
ner. I am here. ", "utf-8").encode("cp866")
File "C:\Python27\lib\encodings\cp866.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position
20: character maps to <undefined>
Answered by Hyman Aidley
I suspect the problem is down to the print statement rather than anything inherent to Python (it works fine on my Mac). In order to print the string, Python needs to convert it into a displayable format; the longer dash you've used isn't displayable in the default character set on the Windows command line.
The difference between your two sentences is not in the length but in the kind of dash used in "(1827-81)" vs "(1827–81)" - can you see the subtle difference? Try copying and pasting one over the other to check this.
See also Python, Unicode, and the Windows console.
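To make the point about the two dashes concrete, here is a minimal sketch that checks which of them each code page can represent. The helper name can_encode is made up for illustration, and the code pages are simply the ones that appear in the tracebacks above; the sketch is written so it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() syntax on Python 2.7 and 3


def can_encode(text, codepage):
    """Hypothetical helper: True if every character of `text` exists in `codepage`."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


hyphen = u"(1827-81)"        # ASCII hyphen-minus, U+002D
en_dash = u"(1827\u201381)"  # en dash, U+2013

# For each code page, report whether the hyphen and the en dash can be encoded.
for codepage in ("cp775", "cp866", "utf-8"):
    print(codepage, can_encode(hyphen, codepage), can_encode(en_dash, codepage))
```

The hyphen version encodes everywhere, while the en dash only survives in UTF-8, which matches the asker's observation that one sentence prints and the other does not.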
Answered by Kirill Zaitsev
There is actually a wiki article on wiki.python.org about this issue, https://wiki.python.org/moin/PrintFails, that explains why this might happen with the charmap codec.
Setting the PYTHONIOENCODING environment variable as described above can be used to suppress the error messages. Setting to "utf-8" is not recommended as this produces an inaccurate, garbled representation of the output to the console. For best results, use your console's correct default codepage and a suitable error handler other than "strict".
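As a rough illustration of what "a suitable error handler other than strict" means, the sketch below encodes the sentence for cp866 (the code page from the question's second traceback) with two of the standard handlers. The variable names are my own, and this only shows the encoding step, not how your console is actually configured.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() syntax on Python 2.7 and 3

text = u"William Burges (1827\u201381) was an English architect."

# The default handler, "strict", raises UnicodeEncodeError for the en dash.
# These handlers substitute something encodable instead of failing.
replaced = text.encode("cp866", "replace")          # en dash becomes "?"
escaped = text.encode("cp866", "backslashreplace")  # en dash becomes "\u2013" as text

print(replaced.decode("cp866"))
print(escaped.decode("cp866"))
```

If I read the Python documentation correctly, the same handler can be requested for print itself by setting the environment variable in the form encoding:handler, for example PYTHONIOENCODING=cp866:replace, where cp866 should be whatever code page your console really uses.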
Answered by Michael Kazarian
Your string contains an en dash (ndash) symbol. It looks similar to the ASCII minus sign -, which is character number 45 in the ASCII table. Replace the en dash with a minus, because ASCII cannot represent the en dash. A working variant is below:
# -*- coding: utf-8 -*-
my_string = "William Burges (1827–81) was an English architect and designer."
my_string = my_string.replace("–", "-")  # replace the UTF-8 en dash (ndash) with the ASCII minus (-)
print my_string
Output:
William Burges (1827-81) was an English architect and designer.
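The same replacement can also be sketched on a unicode string rather than a byte string, which makes the character numbers easy to check. The constant names below are only for illustration, and the sketch is written so it should run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function  # same print() syntax on Python 2.7 and 3

EN_DASH = u"\u2013"   # the character from the error message
HYPHEN_MINUS = u"-"   # ASCII character number 45

text = u"William Burges (1827\u201381) was an English architect and designer."
ascii_text = text.replace(EN_DASH, HYPHEN_MINUS)

# Show the code points of both characters.
print(ord(HYPHEN_MINUS), ord(EN_DASH))

# After the replacement the text is pure ASCII, so this encode no longer fails.
print(ascii_text.encode("ascii").decode("ascii"))
```

Note that this changes the text itself (the en dash is gone), so it is a workaround for display rather than a way to print the original character.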

