Notice: this page is based on a Chinese-English parallel translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/25242262/

Time: 2020-08-18 19:56:07  Source: igfitidea

Dump to JSON adds additional double quotes and escaping of quotes

python json

Asked by toobee

I am retrieving Twitter data with a Python tool and dumping it to disk in JSON format. I noticed that the entire data string for each tweet is unintentionally enclosed in double quotes. Furthermore, all double quotes that belong to the actual JSON formatting are escaped with a backslash.

They look like this:

"{\"created_at\":\"Fri Aug 08 11:04:40 +0000 2014\",\"id\":497699913925292032,

How do I avoid that? It should be:

{"created_at":"Fri Aug 08 11:04:40 +0000 2014" .....

My file-out code looks like this:

with io.open('data'+self.timestamp+'.txt', 'a', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False)))
    f.write(unicode('\n'))

The unintended escaping causes problems when reading in the JSON file in a later processing step.

Accepted answer by Martijn Pieters

You are double encoding your JSON strings. data is already a JSON string, and doesn't need to be encoded again:

>>> import json
>>> not_encoded = {"created_at":"Fri Aug 08 11:04:40 +0000 2014"}
>>> encoded_data = json.dumps(not_encoded)
>>> print(encoded_data)
{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}
>>> double_encode = json.dumps(encoded_data)
>>> print(double_encode)
"{\"created_at\": \"Fri Aug 08 11:04:40 +0000 2014\"}"

Just write these directly to your file:

with open('data{}.txt'.format(self.timestamp), 'a') as f:
    f.write(data + '\n')
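
For files that were already written with the extra layer of encoding, the later processing step can undo it while reading. The following is only a minimal sketch, assuming Python 3 and one JSON value per line; the function name decode_line is made up for illustration and is not part of the original code.

```python
import json


def decode_line(line):
    """Parse one line of the dump file, undoing one extra layer of JSON encoding."""
    value = json.loads(line)
    # A double-encoded tweet decodes to a string that itself contains JSON text,
    # so decode once more in that case. A line that is legitimately a plain JSON
    # string (not a tweet object) would raise ValueError here.
    if isinstance(value, str):
        value = json.loads(value)
    return value


# Reproduce the problem from the question: the same data encoded twice.
double_encoded = json.dumps(json.dumps({"created_at": "Fri Aug 08 11:04:40 +0000 2014"}))

tweet = decode_line(double_encoded)
print(tweet["created_at"])  # Fri Aug 08 11:04:40 +0000 2014
```

Lines that were encoded only once pass through unchanged, so the same function can read a file that mixes old and new records.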

Answer by Mike Maxwell

Another situation where this unwanted escaping can happen is if you try to use json.dump() on the pre-processed output of json.dumps(). For example:

import json
import sys
json.dump({"foo": json.dumps([{"bar": 1}, {"baz": 2}])}, sys.stdout)

will result in

{"foo": "[{\"bar\": 1}, {\"baz\": 2}]"}

To avoid this, you need to pass plain Python objects (dictionaries and lists) rather than the output of json.dumps(), e.g.

json.dump({"foo": [{"bar": 1}, {"baz": 2}]}, sys.stdout)

which outputs the desired

{"foo": [{"bar": 1}, {"baz": 2}]}

(Why would you pre-process the inner list with json.dumps(), you ask? Well, I had another function that was creating that inner list out of other stuff, and I thought it would make sense to return a json object from that function... Wrong.)
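
One way to structure such a helper is sketched below, assuming Python 3. The function names build_inner and legacy_inner are hypothetical stand-ins for the helper described above. The idea is to return plain Python data from helpers and serialize exactly once at the outermost layer, or to decode first when a helper can only supply JSON text.

```python
import json


def build_inner():
    # Hypothetical helper: returns plain Python data, not JSON text.
    return [{"bar": 1}, {"baz": 2}]


def legacy_inner():
    # Hypothetical helper that cannot be changed and still returns JSON text.
    return json.dumps([{"bar": 1}, {"baz": 2}])


# Serialize exactly once, at the outermost layer.
text = json.dumps({"foo": build_inner()})
print(text)  # {"foo": [{"bar": 1}, {"baz": 2}]}

# If only JSON text is available, decode it before embedding it.
same_text = json.dumps({"foo": json.loads(legacy_inner())})
print(same_text == text)  # True
```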
