Python: Download and decompress a gzip file in memory?

Warning: this is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/15352668/

Time: 2020-08-18 19:53:26  Source: igfitidea

python, file, gzip, urllib2, stringio

Asked by OregonTrail

I would like to download a file using urllib and decompress the file in memory before saving.

This is what I have right now:

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
outfile = open(outFilePath, 'w')
outfile.write(decompressedFile.read())

This ends up writing empty files. How can I achieve what I'm after?

Updated Answer:

#! /usr/bin/env python2
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"        
# check filename: it may change over time, due to new updates
filename = "man-pages-5.00.tar.gz" 
outFilePath = filename[:-3]

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile)

# 'wb': the decompressed data is binary (a .tar archive)
with open(outFilePath, 'wb') as outfile:
    outfile.write(decompressedFile.read())
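
Why this version works: passing the data to the constructor leaves the buffer positioned at offset 0, whereas write() leaves it at the end. A minimal Python 3 sketch of the difference, using io.BytesIO (the Python 3 bytes counterpart of StringIO.StringIO); the function name and sample payload are made up for illustration:

```python
import io

def stream_positions(data):
    """Return (position after constructing from data, position after writing data)."""
    constructed = io.BytesIO(data)  # initial bytes: position starts at 0
    written = io.BytesIO()
    written.write(data)             # write() advances the position past the data
    return constructed.tell(), written.tell()

print(stream_positions(b"example payload"))  # (0, 15)
```

A reader such as gzip.GzipFile starts at the current position, so only the first buffer would be read from the top.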

Accepted answer by crayzeewulf

You need to seek to the beginning of compressedFile after writing to it but before passing it to gzip.GzipFile(). Otherwise gzip.GzipFile starts reading at the current position, which is the end of the buffer, so the data appears to it as an empty file. See below:

#! /usr/bin/env python
import urllib2
import StringIO
import gzip

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-3.34.tar.gz"
outFilePath = "man-pages-3.34.tar"

response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
#
# Set the file's current position to the beginning
# of the file so that gzip.GzipFile can read
# its contents from the top.
#
compressedFile.seek(0)

decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')

with open(outFilePath, 'wb') as outfile:  # binary mode for the .tar data
    outfile.write(decompressedFile.read())
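
The failure mode and the fix can be reproduced side by side without a download. This Python 3 sketch assumes io.BytesIO in place of StringIO.StringIO and builds the gzip payload in memory with gzip.compress(); the helper name is hypothetical:

```python
import gzip
import io

def read_gzip_from_buffer(payload, rewind):
    """Write gzip-compressed bytes into a buffer, then decompress them via GzipFile."""
    buffer = io.BytesIO()
    buffer.write(payload)   # the position is now at the end of the buffer
    if rewind:
        buffer.seek(0)      # move back to the gzip header
    with gzip.GzipFile(fileobj=buffer, mode="rb") as gz:
        return gz.read()

payload = gzip.compress(b"hello, gzip")
print(read_gzip_from_buffer(payload, rewind=False))  # b'' (reading starts at EOF)
print(read_gzip_from_buffer(payload, rewind=True))   # b'hello, gzip'
```

Without the seek(0), GzipFile finds no gzip header at the current position and returns no data, which is exactly the empty output file described in the question.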

Answer by lyschoening

For those using Python 3, the equivalent answer is:

import urllib.request
import io
import gzip

# FILE_URL and OUTFILE_PATH are placeholders for your download URL and output path
response = urllib.request.urlopen(FILE_URL)
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.GzipFile(fileobj=compressed_file)

with open(OUTFILE_PATH, 'wb') as outfile:
    outfile.write(decompressed_file.read())
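
The same Python 3 approach can be wrapped in a function so that the response, the GzipFile and the output file are all closed by with blocks. This is a sketch, not part of the original answer: the function name and the opener parameter are hypothetical, and opener exists only so the function can be exercised without network access (it defaults to urllib.request.urlopen):

```python
import gzip
import io
import urllib.request

def download_and_gunzip(url, out_path, opener=urllib.request.urlopen):
    """Download a gzip file, decompress it in memory and write the result to out_path."""
    with opener(url) as response:
        compressed_file = io.BytesIO(response.read())  # position starts at 0
    with gzip.GzipFile(fileobj=compressed_file) as decompressed_file:
        data = decompressed_file.read()
    with open(out_path, "wb") as outfile:  # bytes in, bytes out
        outfile.write(data)
    return len(data)  # number of decompressed bytes written
```

For example, download_and_gunzip(baseURL + filename, outFilePath) with the kernel.org values from the other answers; as noted there, the file name on the server may change over time.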

Answer by Chih-Hsuan Yen

If you have Python 3.2 or above, this is much simpler, because gzip.decompress() decompresses a bytes object directly:

#!/usr/bin/env python3
import gzip
import urllib.request

baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-4.03.tar.gz"
outFilePath = filename[:-3]

response = urllib.request.urlopen(baseURL + filename)
with open(outFilePath, 'wb') as outfile:
    outfile.write(gzip.decompress(response.read()))
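
Because gzip.decompress() takes the whole compressed bytes object, no file-like wrapper and no seek() are involved. A minimal offline check of that behaviour, assuming io.BytesIO as a stand-in for the HTTP response and gzip.compress() as a stand-in for the downloaded data (the helper name is hypothetical):

```python
import gzip
import io

def gunzip_response(response):
    """Decompress the full body of a file-like response in one call."""
    return gzip.decompress(response.read())

fake_response = io.BytesIO(gzip.compress(b"man-pages tarball bytes"))
print(gunzip_response(fake_response))  # b'man-pages tarball bytes'
```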

For those who are interested in the history, see https://bugs.python.org/issue3488 and https://hg.python.org/cpython/rev/3fa0a9553402.

Answer by BaiJiFeiLong

One line of code to print the decompressed file content (Python 2):

print gzip.GzipFile(fileobj=StringIO.StringIO(urllib2.urlopen(DOWNLOAD_LINK).read()), mode='rb').read()
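
The one-liner above relies on Python 2 (print statement, urllib2, StringIO). A possible Python 3 counterpart is sketched below. It assumes the decompressed content is UTF-8 text, since Python 3 returns bytes that must be decoded before printing as text. DOWNLOAD_LINK stays a placeholder, and the helper name is hypothetical; it takes the response object so it can be tried against an in-memory stand-in:

```python
import gzip
import io

def print_gzipped_text(response, encoding="utf-8"):
    """Print the decompressed body of a file-like response as text (encoding is assumed)."""
    print(gzip.decompress(response.read()).decode(encoding))

# With a real download (DOWNLOAD_LINK is a placeholder):
#   import urllib.request
#   print_gzipped_text(urllib.request.urlopen(DOWNLOAD_LINK))
print_gzipped_text(io.BytesIO(gzip.compress(b"plain text line")))  # plain text line
```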