Python: dumping to JSON adds extra double quotes and escapes the quotes

Question

Asked by toobee

I am retrieving Twitter data with a Python tool and dumping it to disk in JSON format. I noticed that the entire data string for each tweet is unintentionally enclosed in double quotes. Furthermore, all double quotes that belong to the actual JSON formatting are escaped with a backslash.

They look like this:

"{\"created_at\":\"Fri Aug 08 11:04:40 +0000 2014\",\"id\":497699913925292032,

How do I avoid that? It should be:

{"created_at":"Fri Aug 08 11:04:40 +0000 2014" .....

My file-out code looks like this:

with io.open('data'+self.timestamp+'.txt', 'a', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False)))
    f.write(unicode('\n'))

The unintended escaping causes problems when reading in the JSON file in a later processing step.

Answer 1

Accepted answer by Martijn Pieters

You are double-encoding your JSON strings. data is already a JSON string and doesn't need to be encoded again:

>>> import json
>>> not_encoded = {"created_at":"Fri Aug 08 11:04:40 +0000 2014"}
>>> encoded_data = json.dumps(not_encoded)
>>> print(encoded_data)
{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}
>>> double_encode = json.dumps(encoded_data)
>>> print(double_encode)
"{\"created_at\": \"Fri Aug 08 11:04:40 +0000 2014\"}"

Just write these directly to your file:

with open('data{}.txt'.format(self.timestamp), 'a') as f:
    f.write(data + '\n')
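
If it is not certain whether data arrives as a JSON string or as a Python object, a small guard can make sure it is encoded exactly once, and files that were already written with the extra layer of quoting can still be read back. The following is a minimal sketch, assuming Python 3 and one JSON object per line; the helper names to_json_line and load_json_line are hypothetical and not part of the original code.

```python
import json


def to_json_line(data):
    """Return one newline-terminated JSON document for `data`.

    Hypothetical helper: a str is assumed to already be JSON text and is
    passed through unchanged; anything else is encoded exactly once.
    """
    if isinstance(data, str):
        return data.rstrip("\n") + "\n"
    return json.dumps(data, ensure_ascii=False) + "\n"


def load_json_line(line):
    """Decode one line, tolerating a double-encoded record.

    Assumption: every record is a JSON object, so a decoded str can only
    mean the line was encoded twice and needs a second decode.
    """
    obj = json.loads(line)
    if isinstance(obj, str):
        obj = json.loads(obj)
    return obj
```

With these, f.write(to_json_line(data)) replaces the write call, and load_json_line can be applied to each line of an older, double-encoded file in the later processing step.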

Answer 2

Answered by Mike Maxwell

Another situation where this unwanted escaping can happen is when you call json.dump() on a structure that contains the already-encoded output of json.dumps(). For example:

import json, sys
json.dump({"foo": json.dumps([{"bar": 1}, {"baz": 2}])}, sys.stdout)

will result in

{"foo": "[{\"bar\": 1}, {\"baz\": 2}]"}

To avoid this, you need to pass dictionaries rather than the output of json.dumps(), e.g.

json.dump({"foo": [{"bar": 1}, {"baz": 2}]}, sys.stdout)

which outputs the desired

{"foo": [{"bar": 1}, {"baz": 2}]}

(Why would you pre-process the inner list with json.dumps(), you ask? Well, I had another function that was creating that inner list out of other stuff, and I thought it would make sense to return a json object from that function... Wrong.)
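
One way to structure this is to have helper functions return plain Python objects and serialize only once, at the point of output. The sketch below assumes Python 3; build_inner is a hypothetical stand-in for the function that assembles the inner list. If an upstream function can only hand back a JSON string, decoding it with json.loads() before nesting it avoids the escaping as well.

```python
import io
import json


def build_inner():
    # Hypothetical stand-in for the function that creates the inner list.
    # It returns Python objects, not a JSON string.
    return [{"bar": 1}, {"baz": 2}]


# Serialize once, at the boundary where the data leaves the program.
out = io.StringIO()
json.dump({"foo": build_inner()}, out)

# If the inner value is only available as JSON text, decode it first
# so that it is nested as a list rather than as an escaped string.
inner_json = json.dumps(build_inner())
out_from_text = io.StringIO()
json.dump({"foo": json.loads(inner_json)}, out_from_text)
```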
