Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/39450065/
Python 3, read/write compressed json objects from/to gzip file
Asked by Henry Thornton
For Python 3, I followed @Martijn Pieters's code with this:
import gzip
import json

# writing
with gzip.GzipFile(jsonfilename, 'w') as fout:
    for i in range(N):
        uid = "whatever%i" % i
        dv = [1, 2, 3]
        data = json.dumps({
            'what': uid,
            'where': dv})
        fout.write(data + '\n')
but this results in an error:
Traceback (most recent call last):
  ...
  File "C:\Users\Think\my_json.py", line 118, in write_json
    fout.write(data + '\n')
  File "C:\Users\Think\Anaconda3\lib\gzip.py", line 258, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'
Any thoughts about what is going on?
Answered by Tomalak
You have four steps of transformation here.
- a Python data structure (nested dicts, lists, strings, numbers, booleans)
- a Python string containing a serialized representation of that data structure ("JSON")
- a list of bytes containing a representation of that string ("UTF-8")
- a list of bytes containing a representation of that previous byte list ("gzip")
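The chain of transformations above can be checked step by step with the stdlib's `gzip.compress`/`gzip.decompress` helpers. A minimal sketch (the sample object is illustrative, not from the original post):

```python
import gzip
import json

obj = {"what": "whatever0", "where": [1, 2, 3]}  # 1. Python data structure
json_str = json.dumps(obj)                       # 2. JSON string
json_bytes = json_str.encode('utf-8')            # 3. UTF-8 bytes
gzip_bytes = gzip.compress(json_bytes)           # 4. gzip-compressed bytes

# reversing the chain recovers the original object
assert json.loads(gzip.decompress(gzip_bytes).decode('utf-8')) == obj
```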
So let's take these steps one by one.
import gzip
import json

data = []
for i in range(N):
    uid = "whatever%i" % i
    dv = [1, 2, 3]
    data.append({
        'what': uid,
        'where': dv
    })                                          # 1. data

json_str = json.dumps(data) + "\n"              # 2. string (i.e. JSON)
json_bytes = json_str.encode('utf-8')           # 3. bytes (i.e. UTF-8)

with gzip.GzipFile(jsonfilename, 'w') as fout:  # 4. gzip
    fout.write(json_bytes)
Note that adding "\n" is completely superfluous here. It does not break anything, but beyond that it has no use. I've added it only because you have it in your code sample.
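If the intent of the original loop was one JSON document per line (the "JSON Lines" layout), the newline does matter. A sketch of that variant, using `gzip.open` in text mode so encoding happens on the fly (the records and filename are illustrative):

```python
import gzip
import json

# illustrative records, mirroring the question's loop
records = [{"what": "whatever%i" % i, "where": [1, 2, 3]} for i in range(3)]

# 'wt' opens the gzip stream in text mode, so strings can be written directly
with gzip.open('records.jsonl.gz', 'wt', encoding='utf-8') as fout:
    for rec in records:
        fout.write(json.dumps(rec) + '\n')

# reading back: each line is an independent JSON document
with gzip.open('records.jsonl.gz', 'rt', encoding='utf-8') as fin:
    restored = [json.loads(line) for line in fin]

assert restored == records
```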
Reading works exactly the other way around:
with gzip.GzipFile(jsonfilename, 'r') as fin:   # 4. gzip
    json_bytes = fin.read()                     # 3. bytes (i.e. UTF-8)

json_str = json_bytes.decode('utf-8')           # 2. string (i.e. JSON)
data = json.loads(json_str)                     # 1. data

print(data)
Of course the steps can be combined:
with gzip.GzipFile(jsonfilename, 'w') as fout:
    fout.write(json.dumps(data).encode('utf-8'))
and
with gzip.GzipFile(jsonfilename, 'r') as fin:
    data = json.loads(fin.read().decode('utf-8'))
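The combined write and read can be verified end to end without touching the filesystem by handing `GzipFile` an in-memory buffer via its `fileobj` parameter. A self-contained round-trip sketch (the sample data is illustrative):

```python
import gzip
import io
import json

data = [{"what": "whatever%i" % i, "where": [1, 2, 3]} for i in range(3)]

buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='w') as fout:  # write side
    fout.write(json.dumps(data).encode('utf-8'))

buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode='r') as fin:   # read side
    restored = json.loads(fin.read().decode('utf-8'))

assert restored == data
```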
Answered by JanFrederik
The solution mentioned in https://stackoverflow.com/a/49535758/1236083 (thanks, @Rafe) has a big advantage: as encoding is done on the fly, you don't create two complete intermediate string objects of the generated JSON. With big objects, this saves memory.
In addition to the mentioned post, decoding is simple as well:
with gzip.open(filename, 'rt', encoding='ascii') as zipfile:
    my_object = json.load(zipfile)
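For completeness, a sketch of the matching write side in the same on-the-fly style (the filename and object here are hypothetical). `json.dump`'s default `ensure_ascii=True` escapes all non-ASCII characters, which is why `encoding='ascii'` is safe for both directions:

```python
import gzip
import json

# hypothetical sample object and filename
my_object = {"what": "whatever0", "where": [1, 2, 3]}

# text-mode gzip stream: json.dump writes straight into the compressed file
with gzip.open('example.json.gz', 'wt', encoding='ascii') as zipfile:
    json.dump(my_object, zipfile)

with gzip.open('example.json.gz', 'rt', encoding='ascii') as zipfile:
    restored = json.load(zipfile)

assert restored == my_object
```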

