Python: download and decompress a gzip file in memory?
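Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/15352668/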
Question by OregonTrail
I would like to download a file using urllib and decompress the file in memory before saving.
This is what I have right now:
response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
outfile = open(outFilePath, 'w')
outfile.write(decompressedFile.read())
This ends up writing empty files. How can I achieve what I'm after?
Update (working version):
#! /usr/bin/env python2
import urllib2
import StringIO
import gzip
baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
# Check the filename: it may change over time as new releases are published
filename = "man-pages-5.00.tar.gz"
outFilePath = filename[:-3]
response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO(response.read())
decompressedFile = gzip.GzipFile(fileobj=compressedFile)
with open(outFilePath, 'wb') as outfile:
    outfile.write(decompressedFile.read())
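The update works without an explicit seek() because a buffer initialised from a string starts at position 0, whereas write() leaves the position at the end. A minimal Python 3 sketch with io.BytesIO (the bytes counterpart of StringIO.StringIO) makes the difference visible; the sample payload is made up.

```python
import io

payload = b"some downloaded bytes"  # made-up sample data

# Initialised from the data: the position starts at 0,
# so a reader sees the whole payload (what the update relies on).
from_ctor = io.BytesIO(payload)
print(from_ctor.tell())  # 0

# Empty buffer filled with write(): the position is left at the end,
# so a reader starting from here sees nothing (the original bug).
from_write = io.BytesIO()
from_write.write(payload)
print(from_write.tell() == len(payload))  # True
```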
Accepted answer by crayzeewulf
You need to seek to the beginning of compressedFile after writing to it but before passing it to gzip.GzipFile(). Otherwise the gzip module starts reading at the current position, which is the end of the buffer, so the file appears empty to it. See below:
#! /usr/bin/env python
import urllib2
import StringIO
import gzip
baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-3.34.tar.gz"
outFilePath = "man-pages-3.34.tar"
response = urllib2.urlopen(baseURL + filename)
compressedFile = StringIO.StringIO()
compressedFile.write(response.read())
#
# Set the file's current position to the beginning
# of the file so that gzip.GzipFile can read
# its contents from the top.
#
compressedFile.seek(0)
decompressedFile = gzip.GzipFile(fileobj=compressedFile, mode='rb')
with open(outFilePath, 'wb') as outfile:
    outfile.write(decompressedFile.read())
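The effect of the missing seek() can be reproduced offline in Python 3: build a gzip payload in memory with gzip.compress() as a stand-in for the downloaded bytes, write it into an empty io.BytesIO, and read it through gzip.GzipFile before and after rewinding. This is a sketch; the payload is invented.

```python
import gzip
import io

original = b"line 1\nline 2\n"  # made-up content
compressed = gzip.compress(original)  # stands in for response.read()

buf = io.BytesIO()
buf.write(compressed)  # the position is now at the end of the buffer

# Reading from the current position (the end) finds no gzip data,
# so GzipFile reports an empty stream, as in the question.
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    before_seek = gz.read()
print(before_seek)  # b''

buf.seek(0)  # rewind, as the accepted answer does
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    after_seek = gz.read()
print(after_seek == original)  # True
```

Closing a GzipFile that was given a fileobj does not close that fileobj, which is why buf can be reused for the second read.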
Answer by lyschoening
For those using Python 3, the equivalent answer is:
import urllib.request
import io
import gzip
response = urllib.request.urlopen(FILE_URL)
compressed_file = io.BytesIO(response.read())
decompressed_file = gzip.GzipFile(fileobj=compressed_file)
with open(OUTFILE_PATH, 'wb') as outfile:
    outfile.write(decompressed_file.read())
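A slightly more defensive sketch of the same Python 3 approach closes the response with a with block and takes the URL and output path as parameters. The function name download_gunzip is hypothetical. To run without network access, the usage part assumes a data: URL (handled by urllib.request since Python 3.4) in place of FILE_URL and a temporary file in place of OUTFILE_PATH.

```python
import base64
import gzip
import io
import os
import tempfile
import urllib.request


def download_gunzip(url, out_path):
    """Download a gzip file from url, decompress it in memory, write it to out_path.

    download_gunzip is a hypothetical helper name, not a library function.
    """
    # Read the whole compressed body; the with block closes the response.
    with urllib.request.urlopen(url) as response:
        compressed_file = io.BytesIO(response.read())  # position starts at 0
    # Binary mode for the output: the decompressed data is bytes.
    with gzip.GzipFile(fileobj=compressed_file) as decompressed_file, \
            open(out_path, "wb") as outfile:
        outfile.write(decompressed_file.read())


# Offline usage: a data: URL stands in for FILE_URL, a temp file for OUTFILE_PATH.
data_url = "data:application/gzip;base64," + base64.b64encode(
    gzip.compress(b"hello\n")).decode("ascii")
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
download_gunzip(data_url, out_path)
```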
Answer by Chih-Hsuan Yen
If you have Python 3.2 or later, it is much simpler: gzip.decompress() decompresses a bytes object in a single call:
#!/usr/bin/env python3
import gzip
import urllib.request
baseURL = "https://www.kernel.org/pub/linux/docs/man-pages/"
filename = "man-pages-4.03.tar.gz"
outFilePath = filename[:-3]
response = urllib.request.urlopen(baseURL + filename)
with open(outFilePath, 'wb') as outfile:
    outfile.write(gzip.decompress(response.read()))
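Since gzip.decompress() takes the compressed bytes and returns the decompressed bytes, it replaces the BytesIO/GzipFile pair used in the earlier answers. A quick offline check, with gzip.compress() producing a made-up payload, that both routes give the same result:

```python
import gzip
import io

compressed = gzip.compress(b"man-pages tarball stand-in\n")  # made-up payload

# Python 3.2+: decompress the whole byte string in one call.
via_decompress = gzip.decompress(compressed)

# Older route: wrap the bytes in a file-like object for GzipFile.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
    via_gzipfile = gz.read()

print(via_decompress == via_gzipfile)  # True
```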
For those interested in the history, see https://bugs.python.org/issue3488 and https://hg.python.org/cpython/rev/3fa0a9553402.
Answer by BaiJiFeiLong
A one-liner (Python 2) that prints the decompressed file content:
print gzip.GzipFile(fileobj=StringIO.StringIO(urllib2.urlopen(DOWNLOAD_LINK).read()), mode='rb').read()
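In Python 3, where print is a function and urllib2 and StringIO no longer exist, a comparable one-liner can use gzip.decompress(). To keep this sketch runnable without network access, DOWNLOAD_LINK is assumed to be a data: URL containing gzip-compressed UTF-8 text; substitute the real link, and note that .decode() only makes sense if the content is text.

```python
import base64
import gzip
import urllib.request

# Hypothetical stand-in for the real link: a data: URL carrying gzipped text,
# so the sketch runs without network access.
DOWNLOAD_LINK = "data:application/gzip;base64," + base64.b64encode(
    gzip.compress(b"hello from a gzipped download\n")).decode("ascii")

# Fetch, decompress in memory, decode as UTF-8 text and print, in one line.
print(gzip.decompress(urllib.request.urlopen(DOWNLOAD_LINK).read()).decode("utf-8"))
```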

