Disclaimer: This page is a translated mirror of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/35083374/

Time: 2020-09-08 16:30:18  Source: igfitidea

How to decode a unicode string in Python

Tags: string, python-2.7, unicode, decode, encode

Asked by mfalade

What is the best way to decode an encoded string that looks like: u'u\xf1somestring'?

Background: I have a list that contains random values (strings and integers). I'm trying to convert every item in the list to a string and then process each of them.

It turns out some of the items are of the format u'u\xf1somestring'. When I tried converting one of them to a string, I got the error: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 1: ordinal not in range(128)

I have tried:

item = u'u\xf1somestring'
decoded_value = item.decode('utf-8', 'ignore')

However, I keep getting the same error.

I have read up about unicode characters and tried a number of suggestions from SO but none have worked so far. Am I missing something here?

Answered by Sameer Mirji

You need to call the encode function, not the decode function, as item is already decoded.

Like this:

decoded_value = item.encode('utf-8')
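
For the mixed list described in the question, the same idea can be applied item by item. The following is only a minimal sketch: the helper name to_utf8_bytes, the text_type alias, and the sample list are all hypothetical, and UTF-8 is assumed as the target encoding. It is written to run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import sys

# The text type is `unicode` on Python 2 and `str` on Python 3.
# The conditional expression only evaluates `unicode` on Python 2.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def to_utf8_bytes(value):
    """Return `value` as UTF-8 encoded bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # Already a byte string (this is plain `str` on Python 2).
        return value
    if not isinstance(value, text_type):
        # Integers and other objects: build their text form first.
        value = text_type(value)
    # Going from text to bytes is *encoding*, never decoding.
    return value.encode('utf-8')


# Sample data, assumed for illustration only.
items = [u'u\xf1somestring', 42, u'plain']
encoded_items = [to_utf8_bytes(item) for item in items]
```

Whether byte strings are the right result depends on what the later processing expects. If it works on text, it may be simpler to keep everything as Unicode objects and skip the encode step until output.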

Answered by Tim Pietzcker

That string already is decoded (it's a Unicode object). You need to encode it if you want to store it in a file (or send it to a dumb terminal, etc.).
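
This also explains the error in the question. In Python 2, calling str() or .decode() on a Unicode object first encodes it implicitly with the default ASCII codec, and that hidden step is what raises UnicodeEncodeError. The sketch below makes the ASCII encode explicit so the failure can be reproduced on either Python 2.7 or Python 3. The variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
item = u'u\xf1somestring'   # a Unicode object containing U+00F1

# Explicit version of the encode that Python 2 performs implicitly.
try:
    item.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    # U+00F1 is outside the ASCII range (0-127).
    ascii_failed = True

# UTF-8 can represent every code point, so this succeeds.
utf8_bytes = item.encode('utf-8')

# An error handler avoids the exception, at the cost of losing data.
ascii_lossy = item.encode('ascii', 'ignore')
```

The 'ignore' handler silently drops the character, so it is only appropriate when that loss is acceptable.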

Generally, when working with Unicode, you should (in Python 2) decode all your strings early in the workflow (which you already seem to have done; many libraries that handle internet traffic will already do that for you), then do all your work on Unicode objects, and then at the very end, when writing them back, encode them to whatever encoding you're using.
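
The decode-early, encode-late workflow described above might look roughly like the following sketch. The input bytes, the upper-casing step, and the temporary output file are stand-ins chosen for illustration, and UTF-8 is assumed on both sides. io.open is used because it accepts an encoding argument on Python 2.7 as well as Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Bytes as they might arrive from a file or a socket (assumed UTF-8).
raw = b'u\xc3\xb1somestring'

# 1. Decode once, at the input boundary.
text = raw.decode('utf-8')

# 2. Do all processing on the Unicode object.
processed = text.upper()

# 3. Encode once, at the output boundary; io.open encodes on write.
path = os.path.join(tempfile.mkdtemp(), 'out.txt')
with io.open(path, 'w', encoding='utf-8') as handle:
    handle.write(processed)

# Read the file back as raw bytes to see what was actually stored.
with io.open(path, 'rb') as handle:
    written = handle.read()
```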
