Json.dump failing with 'must be unicode, not str' TypeError

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/36003023/
Asked by IronWaffleMan
I have a json file which happens to have a multitude of Chinese and Japanese (and other language) characters. I'm loading it into my Python 2.7 script using io.open as follows:
with io.open('multiIdName.json', encoding="utf-8") as json_data:
    cards = json.load(json_data)
I add a new property to the json, all good. Then I attempt to write it back out to another file:
with io.open("testJson.json",'w',encoding="utf-8") as outfile:
    json.dump(cards, outfile, ensure_ascii=False)
That's when I get the error TypeError: must be unicode, not str.
I tried writing the outfile as binary (with io.open("testJson.json",'wb') as outfile:), but I end up with stuff like this:
{"multiverseid": 262906, "name": "\u00e6\u00b8\u00b8\u00e9\u009a\u00bc\u00e7\u008b\u00ae\u00e9\u00b9\u00ab", "language": "Chinese Simplified"}
I thought opening and writing it in the same encoding would be enough, as well as the ensure_ascii flag, but clearly not. I just want to preserve the characters that existed in the file before I run my script, without them turning into \u's.
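For reference, the failure does not depend on the non-ASCII data at all. A minimal sketch that reproduces the same error under CPython 2.7 (the file name is just a placeholder): the punctuation chunks that json.dump writes ('{', ': ', and so on) are plain str, while the text file returned by io.open only accepts unicode.

import io
import json

# Even a trivial, ASCII-only object triggers the error, because json.dump
# hands byte-string chunks to a writer that insists on unicode.
with io.open('repro.json', 'w', encoding='utf-8') as outfile:
    json.dump({'a': 'a'}, outfile, ensure_ascii=False)   # TypeError: must be unicode, not str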
Answered by Yaron
Can you try the following?
with io.open("testJson.json",'w',encoding="utf-8") as outfile:
    outfile.write(unicode(json.dumps(cards, ensure_ascii=False)))
Answered by Antti Haapala
The reason for this error is the completely stupid behaviour of json.dumps in Python 2:
>>> json.dumps({'a': 'a'}, ensure_ascii=False)
'{"a": "a"}'
>>> json.dumps({'a': u'a'}, ensure_ascii=False)
u'{"a": "a"}'
>>> json.dumps({'a': 'ä'}, ensure_ascii=False)
'{"a": "\xc3\xa4"}'
>>> json.dumps({u'a': 'ä'}, ensure_ascii=False)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 210, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
This, coupled with the fact that io.open with encoding set only accepts unicode objects (which by itself is right), leads to problems.
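A quick REPL illustration of that restriction (a sketch; the file name is throwaway, and even a pure-ASCII byte string is rejected):

>>> import io
>>> f = io.open('tmp.txt', 'w', encoding='utf-8')
>>> f.write('abc')   # plain str is refused; f.write(u'abc') would be accepted
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be unicode, not str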
If ensure_ascii=False, the return type depends entirely on the types of the keys and values in the dictionary, but str is always returned if ensure_ascii=True. If 8-bit strings can accidentally end up in your dictionaries, you cannot blindly convert this return type to unicode, because you need to set the encoding, presumably UTF-8:
>>> x = json.dumps(obj, ensure_ascii=False)
>>> if isinstance(x, str):
...     x = unicode(x, 'UTF-8')
In this case I believe you can use json.dump to write to an open binary file; however, if you need to do something more complicated with the resulting object, you probably need the above code.
One solution is to end all this encoding/decoding madness by switching to Python 3.
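For comparison, a sketch of what the same round trip looks like on Python 3, where there is only one string type and json.dumps always returns text:

import io
import json

with io.open('multiIdName.json', encoding='utf-8') as json_data:
    cards = json.load(json_data)

with io.open('testJson.json', 'w', encoding='utf-8') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)   # works as-is on Python 3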
Answered by Alastair McCormack
The JSON module handles encoding and decoding for you, so you can simply open the input and output files in binary mode. The JSON module assumes UTF-8 encoding, but this can be changed using the encoding argument of the load() and dump() methods.
with open('multiIdName.json', 'rb') as json_data:
    cards = json.load(json_data)
then:
with open("testJson.json", 'wb') as outfile:
    json.dump(cards, outfile, ensure_ascii=False)
Thanks to @Antti Haapala, the Python 2.x JSON module gives either unicode or str depending on the contents of the object.
You will have to add a sense check to ensure the result is unicode before writing through io:
with io.open("testJson.json", 'w', encoding="utf-8") as outfile:
    my_json_str = json.dumps(my_obj, ensure_ascii=False)
    if isinstance(my_json_str, str):
        my_json_str = my_json_str.decode("utf-8")
    outfile.write(my_json_str)
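If you want to confirm that the characters really were written verbatim, a quick check along these lines can help (a sketch reusing the output file name from the snippet above):

with io.open("testJson.json", encoding="utf-8") as infile:
    contents = infile.read()
# contents is a unicode string containing the original CJK characters,
# not \uXXXX escape sequences, because the dump used ensure_ascii=False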