
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/36003023/

Time: 2020-08-19 17:15:41  Source: igfitidea

Json.dump failing with 'must be unicode, not str' TypeError

python json python-2.7 unicode encoding

Asked by IronWaffleMan

I have a JSON file that contains many Chinese and Japanese characters (and characters from other languages). I'm loading it into my Python 2.7 script using io.open as follows:

with io.open('multiIdName.json', encoding="utf-8") as json_data:
    cards = json.load(json_data)

I add a new property to the JSON, and all is good. Then I attempt to write it back out to another file:

with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
    json.dump(cards, outfile, ensure_ascii=False)

That's when I get the error TypeError: must be unicode, not str.

I tried opening the output file in binary mode (with io.open("testJson.json",'wb') as outfile:), but I end up with output like this:

{"multiverseid": 262906, "name": "\u00e6\u00b8\u00b8\u00e9\u009a\u00bc\u00e7\u008b\u00ae\u00e9\u00b9\u00ab", "language": "Chinese Simplified"}

I thought opening and writing the file with the same encoding, together with the ensure_ascii flag, would be enough, but clearly it is not. I just want to preserve the characters that were in the file before I ran my script, without them turning into \u escapes.
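
The escaped output shown above has a recognizable shape: it is what UTF-8 bytes look like when every byte is treated as a separate character and then escaped by JSON. The sketch below is written for Python 3 and is only an illustration, not a diagnosis of the script in the question. The Latin-1 round trip is an assumption about how the bytes were misread, and the name is the value obtained by decoding the escaped sample.

```python
import json

name = "游隼狮鹫"

# Misread the UTF-8 bytes as Latin-1: every byte becomes its own code point.
mojibake = name.encode("utf-8").decode("latin-1")

# With the default ensure_ascii=True, each of those code points is escaped.
escaped = json.dumps(mojibake)
print(escaped)

# Reverse the damage: back to bytes via Latin-1, then decode as UTF-8.
recovered = json.loads(escaped).encode("latin-1").decode("utf-8")
print(recovered == name)
```

If existing data already contains such sequences, the last step may be enough to repair it, provided the text really was misread as Latin-1.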

Answered by Yaron

Can you try the following?

with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
    outfile.write(unicode(json.dumps(cards, ensure_ascii=False)))

Answered by Antti Haapala

The reason for this error is the completely stupid behaviour of json.dumps in Python 2:

>>> json.dumps({'a': 'a'}, ensure_ascii=False)
'{"a": "a"}'
>>> json.dumps({'a': u'a'}, ensure_ascii=False)
u'{"a": "a"}'
>>> json.dumps({'a': 'ä'}, ensure_ascii=False)
'{"a": "\xc3\xa4"}'
>>> json.dumps({u'a': 'ä'}, ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 210, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

This, coupled with the fact that a file opened by io.open with encoding set only accepts unicode objects (which by itself is right), leads to problems.
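
A minimal sketch of that second point: a file opened in text mode through io.open accepts text and rejects byte strings. The temporary path is only for illustration. On Python 2 the rejected write raises the "must be unicode, not str" TypeError from the question; on Python 3 the wording of the message differs, but the exception type is the same.

```python
import io
import os
import tempfile

# Illustrative temporary location, not a path from the question.
path = os.path.join(tempfile.mkdtemp(), "demo.json")

with io.open(path, "w", encoding="utf-8") as outfile:
    outfile.write(u'{"a": "\u00e4"}')    # text: accepted and encoded as UTF-8
    try:
        outfile.write(b'{"a": "a"}')     # byte string: rejected
        rejected = False
    except TypeError as exc:
        rejected = True
        print("TypeError:", exc)

# Only the text write reached the file.
with io.open(path, encoding="utf-8") as infile:
    content = infile.read()
```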

If ensure_ascii=False, the return type depends entirely on the types of the keys and values in the dictionary, whereas str is always returned if ensure_ascii=True. If 8-bit strings can accidentally end up in your dictionaries, you cannot blindly convert this return value to unicode, because you need to specify the encoding, presumably UTF-8:

>>> x = json.dumps(obj, ensure_ascii=False)
>>> if isinstance(x, str):
...     x = unicode(x, 'UTF-8')
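
The same check can be wrapped in a small helper. This is a sketch under assumptions: dumps_text is a hypothetical name, not part of the json module, and UTF-8 is assumed for any byte strings in the data. Testing against bytes rather than str lets the code run on both Python 2 (where bytes is str) and Python 3 (where json.dumps already returns text and the branch is never taken).

```python
import json


def dumps_text(obj, encoding="utf-8"):
    """Serialize obj to JSON and always return text.

    Hypothetical helper: returns unicode on Python 2 and str on Python 3.
    """
    result = json.dumps(obj, ensure_ascii=False)
    if isinstance(result, bytes):  # only possible on Python 2
        result = result.decode(encoding)
    return result


text = dumps_text({"name": u"\u6e38\u96bc"})
```

The returned value can then be passed straight to the write method of a file opened with io.open and an encoding.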

In this case I believe you can use json.dump to write to a file opened in binary mode; however, if you need to do something more complicated with the resulting object, you probably need the above code.

One solution is to end all this encoding/decoding madness by switching to Python 3.
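
For comparison, here is a sketch of the same round trip on Python 3, where json.dump always produces str and a file opened in text mode accepts it. The sample record mirrors the one in the question (its name is the value decoded from the escaped sample), and the temporary path is only for illustration.

```python
import json
import os
import tempfile

cards = [{"multiverseid": 262906, "name": "游隼狮鹫", "language": "Chinese Simplified"}]

# Illustrative temporary location for the output file.
path = os.path.join(tempfile.mkdtemp(), "testJson.json")

# Text mode with an explicit encoding; ensure_ascii=False keeps the characters as-is.
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(cards, outfile, ensure_ascii=False)

# Reading it back yields the same data.
with open(path, encoding="utf-8") as json_data:
    reloaded = json.load(json_data)

# The raw bytes contain the UTF-8 characters themselves, not \u escapes.
with open(path, "rb") as raw:
    raw_bytes = raw.read()
```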

Answered by Alastair McCormack

The json module handles encoding and decoding for you, so you can simply open the input and output files in binary mode. The json module assumes UTF-8 encoding, but this can be changed using the encoding argument of load() and dump().

with open('multiIdName.json', 'rb') as json_data:
    cards = json.load(json_data)

then:

with open("testJson.json", 'wb') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)

As @Antti Haapala points out, the Python 2.x json module returns either unicode or str depending on the contents of the object.

You will have to add a sanity check to ensure the result is unicode before writing it through io:

with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
    my_json_str = json.dumps(my_obj, ensure_ascii=False)
    if isinstance(my_json_str, str):
        my_json_str = my_json_str.decode("utf-8")

    outfile.write(my_json_str)