UnicodeDecodeError when using json.dumps() in Python

Question

Asked by deostroll

I have strings like the following in my Python list (output taken from the command prompt):

>>> o['records'][5790]
(5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ', 60,
 True, '40141613')
>>>

I have tried the suggestions mentioned here: Changing default encoding of Python?

I also changed the default encoding to utf-16, but json.dumps() still threw the following exception:

>>> write(o)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "okapi_create_master.py", line 49, in write
    o = json.dumps(output)
  File "C:\Python27\lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "C:\Python27\lib\json\encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Python27\lib\json\encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 25: invalid
continuation byte

I can't figure out what kind of transformation such strings need so that json.dumps() works.

Answer 1

Accepted answer by falsetru

The byte \xe1 is not decodable using either the utf-8 or the utf-16 encoding.

>>> '\xe1'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data
>>> '\xe1'.decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0xe1 in position 0: truncated data
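
To see why this byte fails, it may help to look at it outside of json. The sketch below assumes Python 3 (the question itself uses Python 2.7), where byte strings carry a b prefix; the sample value is a shortened piece of the string from the question. In UTF-8, 0xe1 announces a three-byte sequence, so it cannot be followed directly by an ASCII letter, whereas latin-1 maps every byte to the code point with the same number.

```python
raw = b"Mdl-\xe1M1"  # shortened sample taken from the question's string

# UTF-8 rejects the byte: 0xe1 must be followed by two continuation bytes.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# latin-1 never fails: each byte becomes the code point with the same value.
text = raw.decode("latin-1")

print(utf8_ok, len(text))
```

That latin-1 always succeeds does not prove the data really is latin-1; it only guarantees that decoding will not raise.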

Try the latin-1 encoding:

>>> record = (5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ',
...           60, True, '40141613')
>>> json.dumps(record, encoding='latin1')
'[5790, "Vlv-Gate-Assy-Mdl-\u00e1M1-2-\u00e19/16-10K-BB Credit Memo            ", 60, true, "40141613"]'
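
The encoding argument exists only in Python 2; in Python 3, json.dumps does not accept it and byte strings have to be decoded before serializing. A minimal sketch of that idea follows, assuming Python 3 and assuming the data really is latin-1. The helper name decode_bytes is hypothetical, not part of the json module, and the trailing spaces of the original string are omitted for brevity.

```python
import json


def decode_bytes(value, encoding="latin-1"):
    """Recursively turn byte strings into text so json can serialize them."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, (list, tuple)):
        return [decode_bytes(item, encoding) for item in value]
    if isinstance(value, dict):
        return {decode_bytes(key, encoding): decode_bytes(item, encoding)
                for key, item in value.items()}
    return value  # numbers, booleans, None and text pass through unchanged


record = (5790, b"Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo",
          60, True, b"40141613")
as_json = json.dumps(decode_bytes(record))
print(as_json)
```

Decoding once at the boundary keeps the rest of the program working with text only, which avoids the implicit decode that raised the error in the question.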

Or specify ensure_ascii=False so that json.dumps does not try to decode the string:

>>> json.dumps(record, ensure_ascii=False)
'[5790, "Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ", 60, true, "40141613"]'
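
Note that the Python 2 result above still contains the raw \xe1 byte, so whoever reads the output has to know it is latin-1 rather than UTF-8. As a rough comparison, here is how ensure_ascii behaves on already decoded text, assuming Python 3 and a shortened sample of the question's string:

```python
import json

text = b"Mdl-\xe1M1".decode("latin-1")  # decode first, as json expects text

escaped = json.dumps(text)                      # non-ASCII written as \u00e1
literal = json.dumps(text, ensure_ascii=False)  # character kept as-is

# When the literal form is written out, choose the byte encoding explicitly.
utf8_bytes = literal.encode("utf-8")

print(escaped)
print(json.loads(escaped) == json.loads(literal))
```

Both forms load back to the same string; the escaped form is pure ASCII, while the literal form is shorter but depends on the output encoding.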

Answer 2

Answer by miraculixx

I had a similar problem and came up with the following approach to guarantee either unicode strings or byte strings, whichever kind the input is. In short, include and use the following lambdas:

# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt) 
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)

Applied to your question:

import json
o = (5790, u"Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ", 60,
 True, '40141613')
as_json = json.dumps(_uu8(*o))
as_obj = json.loads(as_json)
print "object\n ", o
print "json (type %s)\n %s " % (type(as_json), as_json)
print "object again\n ", as_obj

=>

object
  (5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ', 60, True, '40141613')
json (type <type 'str'>)
  [5790, "Vlv-Gate-Assy-Mdl-\u00e1M1-2-\u00e19/16-10K-BB Credit Memo            ", 60, true, "40141613"]
object again
  [5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo            ', 60, True, u'40141613']
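
One caveat worth checking: the example above starts from a unicode literal (u"..."), whereas the question's list holds a byte string. If those bytes are latin-1, decoding them as UTF-8 with 'replace' avoids the exception but substitutes U+FFFD for each offending byte. The sketch below illustrates this under the assumption of Python 3; to_text is a hypothetical counterpart of _u, not code from the answer.

```python
def to_text(value, encoding="utf-8", errors="replace"):
    """Decode byte strings to text; pass every other value through."""
    return value.decode(encoding, errors) if isinstance(value, bytes) else value


raw = b"Mdl-\xe1M1"  # shortened sample taken from the question's string

lossy = to_text(raw)                              # 0xe1 becomes U+FFFD
exact = to_text(raw, encoding="latin-1")          # 0xe1 becomes U+00E1

print(ascii(lossy))
print(ascii(exact))
```

So 'replace' is a safe fallback when the encoding is unknown, but passing the real encoding preserves the original character.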

Here's some more reasoning about this.
