Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence

Note: this page reproduces a popular Stack Overflow question with its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/18337407/

Date: 2020-08-19 10:30:37

Tags: python, json, unicode, utf-8, escaping

Asked by Berry Tsakala

Sample code:

>>> import json
>>> json_string = json.dumps("ברי צקלה")
>>> print json_string
"\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"

The problem: it's not human readable. My (smart) users want to verify or even edit text files with JSON dumps (and I'd rather not use XML).

Is there a way to serialize objects into UTF-8 JSON strings (instead of \uXXXX)?

Accepted answer by Martijn Pieters

Use the ensure_ascii=False switch to json.dumps(), then encode the value to UTF-8 manually:

>>> json_string = json.dumps("ברי צקלה", ensure_ascii=False).encode('utf8')
>>> json_string
b'"\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94"'
>>> print(json_string.decode())
"ברי צקלה"

If you are writing to a file, just use json.dump() and leave it to the file object to encode:

with open('filename', 'w', encoding='utf8') as json_file:
    json.dump("ברי צקלה", json_file, ensure_ascii=False)
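As a quick check of the file-based approach, the following sketch writes a small dictionary to a temporary file and then inspects the raw bytes that landed on disk. It assumes Python 3; the temporary path and the sample dictionary are placeholders for illustration and are not part of the original answer.

```python
import json
import os
import tempfile

data = {"name": "ברי צקלה"}  # sample value, reusing the text from the question

# A temporary path stands in for 'filename'.
fd, path = tempfile.mkstemp(suffix=".json")
os.close(fd)

try:
    # The file object does the UTF-8 encoding; json.dump() only writes str.
    with open(path, "w", encoding="utf8") as json_file:
        json.dump(data, json_file, ensure_ascii=False)

    # Read the raw bytes back to see what was actually written.
    with open(path, "rb") as raw_file:
        raw = raw_file.read()

    # Reading it back as JSON gives the original value.
    with open(path, encoding="utf8") as json_file:
        loaded = json.load(json_file)
finally:
    os.remove(path)

print(raw)
```

The raw bytes contain the UTF-8 encoding of the text and no \u escapes, and loading the file returns the same dictionary.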

Caveats for Python 2

For Python 2, there are some more caveats to take into account. If you are writing this to a file, you can use io.open() instead of open() to produce a file object that encodes Unicode values for you as you write, then use json.dump() to write to that file:

with io.open('filename', 'w', encoding='utf8') as json_file:
    json.dump(u"ברי צקלה", json_file, ensure_ascii=False)

Do note that there is a bug in the json module where the ensure_ascii=False flag can produce a mix of unicode and str objects. The workaround for Python 2 then is:

with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(u"ברי צקלה", ensure_ascii=False)
    # unicode(data) auto-decodes data to unicode if str
    json_file.write(unicode(data))

In Python 2, when using byte strings (type str) encoded to UTF-8, make sure to also set the encoding keyword:

>>> d={ 1: "ברי צקלה", 2: u"ברי צקלה" }
>>> d
{1: '\xd7\x91\xd7\xa8\xd7\x99 \xd7\xa6\xd7\xa7\xd7\x9c\xd7\x94', 2: u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'}

>>> s=json.dumps(d, ensure_ascii=False, encoding='utf8')
>>> s
u'{"1": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4", "2": "\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4"}'
>>> json.loads(s)['1']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> json.loads(s)['2']
u'\u05d1\u05e8\u05d9 \u05e6\u05e7\u05dc\u05d4'
>>> print json.loads(s)['1']
ברי צקלה
>>> print json.loads(s)['2']
ברי צקלה
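On Python 3, json.dumps() has no encoding keyword and refuses bytes values outright, so the equivalent step has to be done by hand before serializing. A minimal sketch, assuming the byte strings are UTF-8 encoded; decode_bytes is a hypothetical helper written for this illustration, not part of the json module:

```python
import json


def decode_bytes(value, encoding="utf8"):
    """Hypothetical helper: recursively turn bytes values into str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {key: decode_bytes(item, encoding) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [decode_bytes(item, encoding) for item in value]
    return value


# Same shape as the Python 2 example: one bytes value, one text value.
d = {1: "ברי צקלה".encode("utf8"), 2: "ברי צקלה"}

s = json.dumps(decode_bytes(d), ensure_ascii=False)
print(s)
```

Only values are decoded here; bytes used as dictionary keys would still need separate handling.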

Answer by Ryan X

As Martijn pointed out, using ensure_ascii=False in json.dumps is the right direction for solving this problem. However, on Python 2 this may raise an exception:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1: ordinal not in range(128)

You need extra settings in either site.py or sitecustomize.py so that sys.getdefaultencoding() returns the correct value. site.py is under lib/python2.7/ and sitecustomize.py is under lib/python2.7/site-packages.

If you want to use site.py, under def setencoding(): change the first if 0: to if 1: so that Python will use your operating system's locale.

If you prefer to use sitecustomize.py, which may not exist if you haven't created it, simply put these lines in it:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Then you can produce Chinese JSON output in UTF-8 format, for example:

name = {"last_name": u"王"}
json.dumps(name, ensure_ascii=False)

You will get a UTF-8 encoded string rather than a \u-escaped JSON string.

To verify your default encoding:

print sys.getdefaultencoding()

If you get "utf-8" or "UTF-8", your site.py or sitecustomize.py settings are in effect.

Please note that you cannot call sys.setdefaultencoding("utf-8") in the interactive Python console.
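For comparison, none of this setup applies to Python 3, where the default encoding is already UTF-8 and sys.setdefaultencoding() no longer exists. A small sketch, assuming it is run under Python 3 and reusing the sample dictionary from this answer:

```python
import json
import sys

# On Python 3 the default encoding is fixed and cannot be changed.
print(sys.getdefaultencoding())
has_setter = hasattr(sys, "setdefaultencoding")
print(has_setter)

name = {"last_name": "王"}
result = json.dumps(name, ensure_ascii=False)
print(result)
```

Here json.dumps() returns a str containing the character itself; encoding to UTF-8 bytes only happens when the string is written to a file or encoded explicitly.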

Answer by monitorius

UPDATE: This is a wrong answer, but it's still useful to understand why it's wrong. See comments.

How about unicode-escape?

>>> d = {1: "ברי צקלה", 2: u"ברי צקלה"}
>>> json_str = json.dumps(d).decode('unicode-escape').encode('utf8')
>>> print json_str
{"1": "ברי צקלה", "2": "ברי צקלה"}
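The comments referred to are not reproduced on this page, so the following is only one plausible illustration of why the approach breaks, not a summary of them. The unicode-escape codec also rewrites JSON's own escapes such as \" and \n, which can leave invalid JSON behind. The sketch is adapted to Python 3 (where a str must be encoded before it can be decoded), and the sample text is made up for the example:

```python
import json

text = 'say "hi"'  # sample value containing characters JSON must escape

dumped = json.dumps(text)  # the inner quotes are escaped as \"

# unicode-escape undoes every backslash escape, not only \uXXXX.
mangled = dumped.encode("ascii").decode("unicode-escape")

try:
    json.loads(mangled)
    still_valid = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    still_valid = False

print(mangled)
print(still_valid)
```

The decoded text no longer has its inner quotes escaped, so it cannot be parsed back as JSON.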

Answer by Jonathan Ray

Peters' Python 2 workaround fails on an edge case:

d = {u'keyword': u'bad credit  \xe7redit cards'}
with io.open('filename', 'w', encoding='utf8') as json_file:
    data = json.dumps(d, ensure_ascii=False).decode('utf8')
    try:
        json_file.write(data)
    except TypeError:
        # Decode data to Unicode first
        json_file.write(data.decode('utf8'))

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe7' in position 25: ordinal not in range(128)

It was crashing on the .decode('utf8') part of line 3. I fixed the problem by making the program much simpler, avoiding that step as well as the special-casing of ASCII:

with io.open('filename', 'w', encoding='utf8') as json_file:
  data = json.dumps(d, ensure_ascii=False, encoding='utf8')
  json_file.write(unicode(data))

cat filename
{"keyword": "bad credit  çredit cards"}

Answer by Neit Sabes

Here's my solution using json.dump():

# Python 2 only: json.dump() does not accept an encoding argument in Python 3.
def jsonWrite(p, pyobj, ensure_ascii=False, encoding=SYSTEM_ENCODING, **kwargs):
    with codecs.open(p, 'wb', 'utf_8') as fileobj:
        json.dump(pyobj, fileobj, ensure_ascii=ensure_ascii, encoding=encoding, **kwargs)

where SYSTEM_ENCODING is set to:

import locale
locale.setlocale(locale.LC_ALL, '')
SYSTEM_ENCODING = locale.getlocale()[1]

Answer by Tr?n Quang Hi?p

To write to a file:

import codecs
import json

with codecs.open('your_file.txt', 'w', encoding='utf-8') as f:
    json.dump({"message":"xin chào việt nam"}, f, ensure_ascii=False)

To print to stdout:

import json
print(json.dumps({"message":"xin chào việt nam"}, ensure_ascii=False))
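Printing this way relies on the console's encoding being able to represent the characters; otherwise print() can raise UnicodeEncodeError. One possible way around that, sketched here for Python 3, is to write UTF-8 bytes to a binary stream. write_json_utf8 is a hypothetical helper for this illustration, and io.BytesIO stands in for a real output stream:

```python
import io
import json


def write_json_utf8(obj, stream):
    """Hypothetical helper: write obj as UTF-8 encoded JSON to a binary stream."""
    stream.write(json.dumps(obj, ensure_ascii=False).encode("utf-8"))
    stream.write(b"\n")


# An in-memory buffer is used here so the result can be inspected.
buffer = io.BytesIO()
write_json_utf8({"message": "xin chào việt nam"}, buffer)
output = buffer.getvalue()
print(output)
```

For real standard output, the same helper could be called with sys.stdout.buffer as the stream.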

Answer by Cheney

The following is my understanding after reading the answers above and searching Google.

# coding:utf-8
r"""
@update: 2017-01-09 14:44:39
@explain: str, unicode, bytes in python2to3
    #python2 UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 7: ordinal not in range(128)
    #1.reload
    #importlib,sys
    #importlib.reload(sys)
    #sys.setdefaultencoding('utf-8') #python3 don't have this attribute.
    #not suggest even in python2 #see:http://stackoverflow.com/questions/3828723/why-should-we-not-use-sys-setdefaultencodingutf-8-in-a-py-script
    #2.overwrite /usr/lib/python2.7/sitecustomize.py or (sitecustomize.py and PYTHONPATH=".:$PYTHONPATH" python)
    #too complex
    #3.control by your own (best)
    #==> all string must be unicode like python3 (u'xx'|b'xx'.encode('utf-8')) (unicode 's disappeared in python3)
    #see: http://blog.ernest.me/post/python-setdefaultencoding-unicode-bytes

    #how to Saving utf-8 texts in json.dumps as UTF8, not as \u escape sequence
    #http://stackoverflow.com/questions/18337407/saving-utf-8-texts-in-json-dumps-as-utf8-not-as-u-escape-sequence
"""

from __future__ import print_function
import json

a = {"b": u"中文"}  # add u for python2 compatibility
print('%r' % a)
print('%r' % json.dumps(a))
print('%r' % (json.dumps(a).encode('utf8')))
a = {"b": u"中文"}
print('%r' % json.dumps(a, ensure_ascii=False))
print('%r' % (json.dumps(a, ensure_ascii=False).encode('utf8')))
# print(a.encode('utf8')) #AttributeError: 'dict' object has no attribute 'encode'
print('')

# python2:bytes=str; python3:bytes
b = a['b'].encode('utf-8')
print('%r' % b)
print('%r' % b.decode("utf-8"))
print('')

# python2:unicode; python3:str=unicode
c = b.decode('utf-8')
print('%r' % c)
print('%r' % c.encode('utf-8'))
"""
#python2
{'b': u'\u4e2d\u6587'}
'{"b": "\\u4e2d\\u6587"}'
'{"b": "\\u4e2d\\u6587"}'
u'{"b": "\u4e2d\u6587"}'
'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'

'\xe4\xb8\xad\xe6\x96\x87'
u'\u4e2d\u6587'

u'\u4e2d\u6587'
'\xe4\xb8\xad\xe6\x96\x87'

#python3
{'b': '中文'}
'{"b": "\\u4e2d\\u6587"}'
b'{"b": "\\u4e2d\\u6587"}'
'{"b": "中文"}'
b'{"b": "\xe4\xb8\xad\xe6\x96\x87"}'

b'\xe4\xb8\xad\xe6\x96\x87'
'中文'

'中文'
b'\xe4\xb8\xad\xe6\x96\x87'
"""

Answer by Yulin GUO

Use codecs if possible:

with codecs.open('file_path', 'a+', 'utf-8') as fp:
    fp.write(json.dumps(res, ensure_ascii=False))

Answer by Nik

As of Python 3.7 the following code works fine:

from json import dumps
result = {"symbol": "?"}
json_string = dumps(result, sort_keys=True, indent=2, ensure_ascii=False)
print(json_string)

Output:

{
  "symbol": "?"
}

Answer by Chandan Sharma

If you are loading a JSON string from a file and the file contains Arabic text, then this will work.

Assume a file like arabic.json:

{ 
"key1" : "?????????",
"key2" : "????? ??????"
}

Get the Arabic contents from the arabic.json file:

import json

with open('arabic.json', encoding='utf-8') as f:
    # deserialize it; the with block closes the file automatically
    json_data = json.load(f)

# JSON-formatted string
json_data2 = json.dumps(json_data, ensure_ascii=False)

To use the JSON data in a Django template, follow the steps below:

# If you need to access the JSON by index in a Django template file, simply decode the encoded string.

json.JSONDecoder().decode(json_data2)

Done! Now we can access the results by JSON index, with Arabic values.
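The steps in this answer can be sketched end to end without a file on disk. The dictionary below is placeholder content standing in for arabic.json (the original file contents are not recoverable here), and Python 3 is assumed. It also shows that json.JSONDecoder().decode() gives the same result as the more common json.loads():

```python
import json

# Placeholder content standing in for arabic.json (not the original file).
json_data = {"key1": "مرحبا", "key2": "أهلا وسهلا"}

# Serialize without \u escapes, as in the answer.
json_data2 = json.dumps(json_data, ensure_ascii=False)

# Decode the string back into Python objects.
decoded = json.JSONDecoder().decode(json_data2)
same_as_loads = decoded == json.loads(json_data2)

print(json_data2)
print(decoded["key1"])
```

Either call returns a regular dictionary, so values can be looked up by key as usual.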