Original URL: http://stackoverflow.com/questions/35083374/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to decode a unicode string Python
Question by mfalade
What is the best way to decode an encoded string that looks like u'u\xf1somestring'?
Background: I have a list that contains random values (strings and integers). I'm trying to convert every item in the list to a string and then process each of them.
It turns out some of the items look like u'u\xf1somestring'.
When I try to convert them to a string, I get the error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 1: ordinal not in range(128)
I have tried:
item = u'u\xf1somestring'
decoded_value = item.decode('utf-8', 'ignore')
However, I keep getting the same error.
I have read up on Unicode and tried a number of suggestions from SO, but none have worked so far. Am I missing something here?
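Why both attempts fail: in Python 2, calling str() on a unicode object, or calling .decode() on it, first encodes it implicitly with the default codec, which is ASCII. That is why the error is a UnicodeEncodeError rather than a decode error, and why the 'ignore' argument does not help: it applies only to the decode step, which never runs. Python 3 removed these implicit conversions (str has no decode method at all), so the Python 3 sketch below emulates the Python 2 steps explicitly. The helper py2_style_decode is hypothetical and exists only for illustration.

```python
# Python 3 sketch that emulates Python 2's implicit conversions.
item = "u\xf1somestring"  # in Python 3, every str is already Unicode text


def py2_style_decode(text, encoding="utf-8", errors="strict"):
    """Hypothetical helper mimicking Python 2's unicode.decode()."""
    raw = text.encode("ascii")          # implicit step: fails on '\xf1' regardless of `errors`
    return raw.decode(encoding, errors)  # `errors` would only matter here


caught = None
try:
    py2_style_decode(item, "utf-8", "ignore")
except UnicodeEncodeError as exc:
    caught = exc
    print("UnicodeEncodeError:", exc)

# Python 3 simply has no str.decode, so the mistake cannot be made implicitly.
print(hasattr(item, "decode"))
```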
Answer by Sameer Mirji
You need to call the encode function and not the decode function, as item is already decoded.
Like this:
encoded_value = item.encode('utf-8')
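A minimal Python 3 sketch of the same idea, assuming UTF-8 is the target encoding: encode() turns text into bytes, and decode() with the same codec turns those bytes back into the original text. The variable names are illustrative.

```python
item = "u\xf1somestring"               # text (a Unicode string)

encoded_value = item.encode("utf-8")   # text -> bytes; '\xf1' becomes two bytes
print(encoded_value)                   # b'u\xc3\xb1somestring'

# Decoding the bytes with the same codec restores the original text.
decoded_again = encoded_value.decode("utf-8")
print(decoded_again == item)           # True
```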
Answer by Tim Pietzcker
That string is already decoded (it's a Unicode object). You need to encode it if you want to store it in a file (or send it to a dumb terminal, etc.).
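For the file case, a Python 3 sketch: opening the file in text mode with an explicit encoding makes Python encode the text on write and decode it on read, so no manual encode() call is needed. The temporary path and the choice of UTF-8 are assumptions made to keep the example self-contained.

```python
import os
import tempfile

item = "u\xf1somestring"

# Hypothetical output location; a temporary directory keeps the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Text mode with an explicit encoding: the str is encoded to UTF-8 on write.
with open(path, "w", encoding="utf-8") as f:
    f.write(item)

# The raw bytes on disk are the UTF-8 encoding of the text.
with open(path, "rb") as f:
    on_disk = f.read()

# Reading back in text mode with the same encoding decodes it again.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(on_disk)
print(round_trip == item)
```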
Generally, when working with Unicode in Python 2, you should decode all your strings early in the workflow (which you already seem to have done; many libraries that handle internet traffic will do that for you), do all your work on Unicode objects, and then, at the very end, when writing them back out, encode them to whatever encoding you're using.
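Applied to the list from the question, the decode-early, encode-late workflow might look like the Python 3 sketch below. It assumes incoming values are bytes (taken to be UTF-8), str, or int; the helpers to_text, process, and to_bytes are hypothetical names, and process() is only a placeholder for whatever work the real program does.

```python
def to_text(value, encoding="utf-8"):
    """Decode early: turn any incoming value into a str (Unicode text)."""
    if isinstance(value, bytes):
        return value.decode(encoding)  # raw bytes from outside: decode once
    return str(value)                  # ints and str: str() is safe in Python 3


def process(text):
    """Hypothetical processing step; it only ever sees Unicode text."""
    return text.upper()


def to_bytes(text, encoding="utf-8"):
    """Encode late: only when the result leaves the program."""
    return text.encode(encoding)


items = ["u\xf1somestring", 42, b"caf\xc3\xa9"]  # mixed input, as in the question

texts = [to_text(v) for v in items]        # decode at the input boundary
results = [process(t) for t in texts]      # all work happens on text
output = [to_bytes(r) for r in results]    # encode at the output boundary
print(results)
```

Keeping the conversions at the two boundaries means the processing code never has to guess whether it holds bytes or text.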