Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39450065/
Python 3, read/write compressed json objects from/to gzip file
Asked by Henry Thornton
For Python 3, I followed @Martijn Pieters's code with this:
import gzip
import json

# writing
with gzip.GzipFile(jsonfilename, 'w') as fout:
    for i in range(N):
        uid = "whatever%i" % i
        dv = [1, 2, 3]
        data = json.dumps({
            'what': uid,
            'where': dv})
        fout.write(data + '\n')
but this results in an error:
Traceback (most recent call last):
  ...
  File "C:\Users\Think\my_json.py", line 118, in write_json
    fout.write(data + '\n')
  File "C:\Users\Think\Anaconda3\lib\gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Any thoughts about what is going on?
Answered by Tomalak
You have four steps of transformation here.
- a Python data structure (nested dicts, lists, strings, numbers, booleans)
- a Python string containing a serialized representation of that data structure ("JSON")
- a list of bytes containing a representation of that string ("UTF-8")
- a list of bytes containing a representation of that previous byte list ("gzip")
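The chain of transformations above can be checked step by step with the stdlib's `gzip.compress`/`gzip.decompress` helpers. A minimal sketch (the sample object is illustrative, not from the original post):

```python
import gzip
import json

obj = {"what": "whatever0", "where": [1, 2, 3]}  # 1. Python data structure
json_str = json.dumps(obj)                       # 2. JSON string
json_bytes = json_str.encode('utf-8')            # 3. UTF-8 bytes
gzip_bytes = gzip.compress(json_bytes)           # 4. gzip-compressed bytes

# reversing the chain recovers the original object
assert json.loads(gzip.decompress(gzip_bytes).decode('utf-8')) == obj
```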
So let's take these steps one by one.
import gzip
import json

data = []
for i in range(N):
    uid = "whatever%i" % i
    dv = [1, 2, 3]
    data.append({
        'what': uid,
        'where': dv
    })                                          # 1. data

json_str = json.dumps(data) + "\n"              # 2. string (i.e. JSON)
json_bytes = json_str.encode('utf-8')           # 3. bytes (i.e. UTF-8)

with gzip.GzipFile(jsonfilename, 'w') as fout:  # 4. gzip
    fout.write(json_bytes)
Note that adding "\n" is completely superfluous here. It does not break anything, but beyond that it has no use. I've added it only because you have it in your code sample.
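If the intent of the original loop was one JSON document per line (the "JSON Lines" layout), the newline does matter. A sketch of that variant, using `gzip.open` in text mode so encoding happens on the fly (the records and filename are illustrative):

```python
import gzip
import json

# illustrative records, mirroring the question's loop
records = [{"what": "whatever%i" % i, "where": [1, 2, 3]} for i in range(3)]

# 'wt' opens the gzip stream in text mode, so strings can be written directly
with gzip.open('records.jsonl.gz', 'wt', encoding='utf-8') as fout:
    for rec in records:
        fout.write(json.dumps(rec) + '\n')

# reading back: each line is an independent JSON document
with gzip.open('records.jsonl.gz', 'rt', encoding='utf-8') as fin:
    restored = [json.loads(line) for line in fin]

assert restored == records
```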
Reading works exactly the other way around:
with gzip.GzipFile(jsonfilename, 'r') as fin:   # 4. gzip
    json_bytes = fin.read()                     # 3. bytes (i.e. UTF-8)

json_str = json_bytes.decode('utf-8')           # 2. string (i.e. JSON)
data = json.loads(json_str)                     # 1. data

print(data)
Of course the steps can be combined:
with gzip.GzipFile(jsonfilename, 'w') as fout:
    fout.write(json.dumps(data).encode('utf-8'))
and
with gzip.GzipFile(jsonfilename, 'r') as fin:
    data = json.loads(fin.read().decode('utf-8'))
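The combined write and read can be verified end to end without touching the filesystem by handing `GzipFile` an in-memory buffer via its `fileobj` parameter. A self-contained round-trip sketch (the sample data is illustrative):

```python
import gzip
import io
import json

data = [{"what": "whatever%i" % i, "where": [1, 2, 3]} for i in range(3)]

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='w') as fout:  # write side
    fout.write(json.dumps(data).encode('utf-8'))

buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode='r') as fin:   # read side
    restored = json.loads(fin.read().decode('utf-8'))

assert restored == data
```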
Answered by JanFrederik
The solution mentioned in https://stackoverflow.com/a/49535758/1236083 (thanks, @Rafe) has a big advantage: as encoding is done on the fly, you don't create two complete intermediate string objects of the generated JSON. With big objects, this saves memory.
In addition to the mentioned post, decoding is simple as well:
with gzip.open(filename, 'rt', encoding='ascii') as zipfile:
    my_object = json.load(zipfile)
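For completeness, a sketch of the matching write side in the same on-the-fly style (the filename and object here are hypothetical). `json.dump`'s default `ensure_ascii=True` escapes all non-ASCII characters, which is why `encoding='ascii'` is safe for both directions:

```python
import gzip
import json

# hypothetical sample object and filename
my_object = {"what": "whatever0", "where": [1, 2, 3]}

# text-mode gzip stream: json.dump writes straight into the compressed file
with gzip.open('example.json.gz', 'wt', encoding='ascii') as zipfile:
    json.dump(my_object, zipfile)

with gzip.open('example.json.gz', 'rt', encoding='ascii') as zipfile:
    restored = json.load(zipfile)

assert restored == my_object
```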

