Json.dump failing with 'must be unicode, not str' TypeError

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/36003023/
Asked by IronWaffleMan
I have a json file which happens to have a multitude of Chinese and Japanese (and other language) characters. I'm loading it into my Python 2.7 script using io.open as follows:
with io.open('multiIdName.json', encoding="utf-8") as json_data:
    cards = json.load(json_data)
I add a new property to the json, all good. Then I attempt to write it back out to another file:
with io.open("testJson.json",'w',encoding="utf-8") as outfile:
    json.dump(cards, outfile, ensure_ascii=False)
That's when I get the error TypeError: must be unicode, not str.
I tried writing the outfile as binary (with io.open("testJson.json",'wb') as outfile:), but I end up with stuff like this:
{"multiverseid": 262906, "name": "\u00e6\u00b8\u00b8\u00e9\u009a\u00bc\u00e7\u008b\u00ae\u00e9\u00b9\u00ab", "language": "Chinese Simplified"}
I thought opening and writing it in the same encoding would be enough, as well as the ensure_ascii flag, but clearly not. I just want to preserve the characters that existed in the file before I run my script, without them turning into \u's.
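For reference, the failure does not depend on the non-ASCII data at all. A minimal sketch that reproduces the same error under CPython 2.7 (the file name is just a placeholder): the punctuation chunks that json.dump writes ('{', ': ', and so on) are plain str, while the text file returned by io.open only accepts unicode.

import io
import json

# Even a trivial, ASCII-only object triggers the error, because json.dump
# hands byte-string chunks to a writer that insists on unicode.
with io.open('repro.json', 'w', encoding='utf-8') as outfile:
    json.dump({'a': 'a'}, outfile, ensure_ascii=False)   # TypeError: must be unicode, not str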
Answered by Yaron
Can you try the following?
with io.open("testJson.json",'w',encoding="utf-8") as outfile:
    outfile.write(unicode(json.dumps(cards, ensure_ascii=False)))
Answered by Antti Haapala
The reason for this error is the completely stupid behaviour of json.dumps in Python 2:
>>> json.dumps({'a': 'a'}, ensure_ascii=False)
'{"a": "a"}'
>>> json.dumps({'a': u'a'}, ensure_ascii=False)
u'{"a": "a"}'
>>> json.dumps({'a': 'ä'}, ensure_ascii=False)
'{"a": "\xc3\xa4"}'
>>> json.dumps({u'a': 'ä'}, ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 210, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
This, coupled with the fact that io.open with encoding set only accepts unicode objects (which by itself is right), leads to problems.
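A quick REPL illustration of that restriction (a sketch; the file name is throwaway, and even a pure-ASCII byte string is rejected):

>>> import io
>>> f = io.open('tmp.txt', 'w', encoding='utf-8')
>>> f.write('abc')   # plain str is refused; f.write(u'abc') would be accepted
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be unicode, not str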
If ensure_ascii=False, the return type depends entirely on the types of the keys and values in the dictionary, but str is always returned if ensure_ascii=True. If 8-bit strings can accidentally end up in your dictionaries, you cannot blindly convert this return type to unicode, because you need to set the encoding, presumably UTF-8:
>>> x = json.dumps(obj, ensure_ascii=False)
>>> if isinstance(x, str):
...     x = unicode(x, 'UTF-8')
In this case I believe you can use json.dump to write to an open binary file; however, if you need to do something more complicated with the resulting object, you probably need the above code.
One solution is to end all this encoding/decoding madness by switching to Python 3.
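For comparison, a sketch of what the same round trip looks like on Python 3, where there is only one string type and json.dumps always returns text:

import io
import json

with io.open('multiIdName.json', encoding='utf-8') as json_data:
    cards = json.load(json_data)

with io.open('testJson.json', 'w', encoding='utf-8') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)   # works as-is on Python 3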
Answered by Alastair McCormack
The JSON module handles encoding and decoding for you, so you can simply open the input and output files in binary mode. The JSON module assumes UTF-8 encoding, but this can be changed using the encoding argument of the load() and dump() methods.
with open('multiIdName.json', 'rb') as json_data:
    cards = json.load(json_data)
then:
with open("testJson.json", 'wb') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)
Thanks to @Antti Haapala, the Python 2.x JSON module gives either unicode or str depending on the contents of the object.
You will have to add a sense check to ensure the result is unicode before writing through io:
with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
    my_json_str = json.dumps(my_obj, ensure_ascii=False)
    if isinstance(my_json_str, str):
        my_json_str = my_json_str.decode("utf-8")
    outfile.write(my_json_str)
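If you want to confirm that the characters really were written verbatim, a quick check along these lines can help (a sketch reusing the output file name from the snippet above):

with io.open("testJson.json", encoding="utf-8") as infile:
    contents = infile.read()
# contents is a unicode string containing the original CJK characters,
# not \uXXXX escape sequences, because the dump used ensure_ascii=False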